c# - Can Visual Studio handle Unicode code points from U+20000 upward as characters, and how?

Reposted. Author: 行者123. Updated: 2023-11-30 20:48:49

Some Unicode code points need more than 16 bits. Can Visual Studio handle these characters, and if so, how?

The Unicode Consortium (http://www.unicode.org) has published the CJK blocks listed below. A character can now need more than one 16-bit code unit.

  • CJK Unified Ideographs Extension B (U+20000 to U+2A6D6)
  • CJK Unified Ideographs Extension C (U+2A700 to U+2B734)
  • CJK Unified Ideographs Extension D (U+2B740 to U+2B81D)
  • CJK Compatibility Ideographs Supplement (U+2F800 to U+2FA1D)

The following declaration fails to compile in Visual Studio 2012:

char ch = '\u2A6D6';

I have not tried Visual Studio 2013 or Visual Studio 2015.

Best answer

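This code point does not fit in a char, because a char is only 16 bits wide and can therefore hold only the 65,536 values from U+0000 to U+FFFF. Characters outside the Basic Multilingual Plane (BMP) can be encoded in a string as two UTF-16 code units, using a surrogate pair.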

char.ConvertFromUtf32(0x2A6D6) returns a string containing two chars, "\uD869\uDED6".
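As a minimal sketch of how this looks in practice, the program below builds the same character twice, once with the eight-digit \U escape in a string literal and once with char.ConvertFromUtf32, and then inspects the result. The class name SupplementaryCharDemo and the helper FromCodePoint are made up for illustration. The commented-out line shows why the original declaration is rejected: \u consumes exactly four hex digits, so the literal ends up holding two characters.

```csharp
using System;

static class SupplementaryCharDemo
{
    // Hypothetical helper: returns the UTF-16 string for a Unicode code point.
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }

    static void Main()
    {
        // char ch = '\u2A6D6';  // rejected: parsed as '\u2A6D' followed by '6'

        string viaEscape = "\U0002A6D6";        // \U takes eight hex digits
        string viaApi = FromCodePoint(0x2A6D6);

        Console.WriteLine(viaEscape == viaApi);                          // True
        Console.WriteLine(viaApi.Length);                                // 2
        Console.WriteLine(char.IsSurrogatePair(viaApi[0], viaApi[1]));   // True
        Console.WriteLine(char.ConvertToUtf32(viaApi, 0).ToString("X")); // 2A6D6
    }
}
```

Note that Length counts UTF-16 code units, not characters, so code that indexes or slices such strings has to keep each surrogate pair together.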


Code points U+10000 to U+10FFFF

Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called surrogate pairs, by the following scheme:

  • 0x010000 is subtracted from the code point, leaving a 20 bit number in the range 0..0x0FFFFF.
  • The top ten bits (a number in the range 0..0x03FF) are added to 0xD800 to give the first code unit or lead surrogate, which will be in the range 0xD800..0xDBFF. (Previous versions of the Unicode Standard referred to these as high surrogates.)
  • The low ten bits (also in the range 0..0x03FF) are added to 0xDC00 to give the second code unit or trail surrogate, which will be in the range 0xDC00..0xDFFF. (Previous versions of the Unicode Standard referred to these as low surrogates.)
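The three steps above can be written out by hand, which is a handy way to check the "\uD869\uDED6" result for U+2A6D6. This is only an illustrative sketch; the class SurrogateMath and the method ToSurrogatePair are hypothetical names, and real code would normally call char.ConvertFromUtf32 instead.

```csharp
using System;

static class SurrogateMath
{
    // Splits a supplementary code point (U+10000..U+10FFFF)
    // into its lead and trail surrogate code units.
    public static char[] ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");

        int offset = codePoint - 0x10000;               // 20-bit value
        char lead = (char)(0xD800 + (offset >> 10));    // top ten bits
        char trail = (char)(0xDC00 + (offset & 0x3FF)); // low ten bits
        return new[] { lead, trail };
    }

    static void Main()
    {
        char[] pair = ToSurrogatePair(0x2A6D6);
        Console.WriteLine("{0:X4} {1:X4}", (int)pair[0], (int)pair[1]);        // D869 DED6
        Console.WriteLine(new string(pair) == char.ConvertFromUtf32(0x2A6D6)); // True
    }
}
```

For U+2A6D6 the offset is 0x1A6D6, whose top ten bits are 0x069 and whose low ten bits are 0x2D6, giving 0xD869 and 0xDED6.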

From Wikipedia - UTF-16

This question was originally asked on Stack Overflow: https://stackoverflow.com/questions/24192035/
