
java - Illegal characters in a URI

Reposted. Author: 行者123. Updated: 2023-12-01 18:18:49

The java.net.URI constructor accepts most non-ASCII characters, but it rejects the ideographic space (U+3000). The constructor fails with java.net.URISyntaxException: Illegal character in path ...
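A minimal sketch that should reproduce the behaviour described above. The host example.com, the sample paths, and the helper name parses are placeholders chosen for illustration; they do not come from the original question.

```java
import java.net.URI;
import java.net.URISyntaxException;

class IdeographicSpaceDemo {
    // Hypothetical helper: true if the single-argument URI constructor accepts the string.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            // For the second sample below the message starts with "Illegal character in path".
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // CJK ideographs in the path are non-ASCII but are neither space nor control characters.
        System.out.println(parses("http://example.com/\u65E5\u672C\u8A9E"));
        // The same kind of URI with an ideographic space (U+3000) in the path.
        System.out.println(parses("http://example.com/a\u3000b"));
    }
}
```

The first call is expected to succeed and the second to fail, which matches what the question reports.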

So my questions are:

  • Why does the URI constructor reject U+3000 but accept other non-ASCII characters?
  • Which other characters does it reject?

Best answer

The set of acceptable characters is spelled out in the JavaDoc documentation for java.net.URI:

Character categories

RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:

  • alpha The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
  • digit The US-ASCII decimal digit characters, '0' through '9'
  • alphanum All alpha and digit characters
  • unreserved All alphanum characters together with those in the string "_-!.~'()*"
  • punct The characters in the string ",;:$&+="
  • reserved All punct characters together with those in the string "?/[]@"
  • escaped Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
  • other The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (Deviation from RFC 2396, which is limited to US-ASCII)

The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
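The "other" category quoted above can be sketched as a predicate. This is an approximation written from the Javadoc wording, not the actual code inside java.net.URI, and the class and method names are hypothetical.

```java
class OtherCategory {
    // Sketch of the Javadoc's "other" category: outside US-ASCII,
    // not an ISO control character, and not a space character.
    static boolean isOther(char c) {
        return c > 127
                && !Character.isISOControl(c)
                && !Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        System.out.println(isOther('\u65E5')); // CJK ideograph
        System.out.println(isOther('\u3000')); // ideographic space
        System.out.println(isOther('\u00A0')); // no-break space
        System.out.println(isOther('a'));      // plain ASCII letter, belongs to "alpha" instead
    }
}
```

Under this reading, only the first character falls into "other"; the two space characters fall into none of the legal categories, which is why the constructor rejects them unless they are percent-escaped.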

In particular, "other" does not include space characters, which Character.isSpaceChar defines as those whose Unicode general category is one of:

  • SPACE_SEPARATOR
  • LINE_SEPARATOR
  • PARAGRAPH_SEPARATOR

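According to the page you linked to in your question, the ideographic space is indeed in one of these categories (SPACE_SEPARATOR), so it is excluded from "other" and rejected.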
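To address the second question, one way to list candidates is to scan for non-ASCII characters that Character.isSpaceChar reports as spaces. The sketch below covers only the Basic Multilingual Plane, uses hypothetical class and method names, and does not list the ISO control characters that are also excluded from "other".

```java
import java.util.ArrayList;
import java.util.List;

class SpaceCharScan {
    // Collects non-ASCII BMP code points whose general category is
    // SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR.
    static List<String> nonAsciiSpaceChars() {
        List<String> found = new ArrayList<>();
        for (int cp = 0x80; cp <= 0xFFFF; cp++) {
            if (Character.isSpaceChar(cp)) {
                found.add(String.format("U+%04X", cp));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Confirms the category of the ideographic space.
        System.out.println(Character.getType(0x3000) == Character.SPACE_SEPARATOR);
        // The exact list depends on the Unicode version the running JDK implements.
        System.out.println(nonAsciiSpaceChars());
    }
}
```

The list should include U+00A0 (no-break space), U+2028 (line separator), U+2029 (paragraph separator), and U+3000, among others.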

This question, "java - Illegal characters in a URI", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/28147818/
