
java - Can a Character represent every Unicode code point?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 21:49:54

Since a Java char is 16 bits wide, how can it represent the full range of Unicode code points? It can only represent 65,536 code points, right?

Best answer

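Yes, a Java char is a single UTF-16 code unit. To represent Unicode characters outside the Basic Multilingual Plane, you need to use surrogate pairs in a java.lang.String. The String class provides several methods for working with full Unicode code points, such as codePointAt(index).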
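As a rough illustration of the difference between code units and code points, here is a minimal sketch. The class name CodePointDemo and the helper countCodePoints are made up for this example; the String and Character methods it calls are standard. It uses U+1F600, which lies outside the Basic Multilingual Plane and is therefore stored as two chars.

```java
class CodePointDemo {
    // Hypothetical helper: counts Unicode code points rather than UTF-16 code units.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as its surrogate pair, then 'b'.
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());          // 4 (UTF-16 code units)
        System.out.println(countCodePoints(s));  // 3 (code points)

        // codePointAt combines the surrogate pair starting at index 1.
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600

        // charAt only sees one half of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

        // Walk the string one code point at a time.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(i)).toUpperCase());
        }
    }
}
```

The loop advances by Character.charCount so that a surrogate pair is visited once instead of twice.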

From section 3.1 of the Java Language Specification:

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java platform provides methods to convert between the two representations.
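The quoted passage names the surrogate ranges but not the arithmetic behind them. The sketch below spells out the standard UTF-16 mapping by hand and compares it with the platform's own conversion methods, Character.toChars and Character.toCodePoint. The class SurrogateMath and its encode and decode methods are hypothetical names for this example only.

```java
import java.util.Arrays;

class SurrogateMath {
    // Splits a supplementary code point (U+10000 to U+10FFFF) into a surrogate pair.
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                "not a supplementary code point: " + Integer.toHexString(codePoint));
        }
        int offset = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));    // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));    // bottom 10 bits
        return new char[] { high, low };
    }

    // Recombines a high and low surrogate into the original code point.
    static int decode(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600;

        char[] manual = encode(codePoint);
        char[] builtIn = Character.toChars(codePoint);

        // The hand-written mapping should agree with the platform methods.
        System.out.println(Arrays.equals(manual, builtIn)); // true
        System.out.println(
            decode(manual[0], manual[1]) == Character.toCodePoint(builtIn[0], builtIn[1])); // true
    }
}
```

In real code the Character methods are the better choice; the manual version is only there to show where the two ranges come from.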

See the Character docs for more information.

A similar question about whether a Java Character can represent all Unicode code points can be found on Stack Overflow: https://stackoverflow.com/questions/8768327/
