
java - Best way to know whether the text in a Java String contains UTF-8 encoded characters

Reposted. Author: 搜寻专家. Updated: 2023-11-01 03:27:25

Is there another way to know whether a Java String contains UTF-8 encoded characters or not, for example an Arabic word?

I tried this code, but is it accurate, and does it do the job?

char c = 'أ';
int num = (int) c;

if (num > 128) {
    // then UTF-8 characters exist
}

Best answer

(Assuming that "UTF-8" here means "non-ASCII".)
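Why that assumption is reasonable can be seen from byte counts: a Java String is stored as UTF-16, and UTF-8 encodes an ASCII character in one byte and every other character in two to four bytes. The following minimal sketch (the class and method names are hypothetical, chosen for illustration) compares one ASCII letter with the Arabic letter U+0623, written as a \u escape so the result does not depend on the source file's encoding:

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes UTF-8 needs to encode the given text.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String latin = "A";          // ASCII letter, U+0041
        String arabic = "\u0623";    // Arabic letter alef with hamza above, U+0623
        // Both strings hold exactly one UTF-16 char...
        System.out.println(latin.length() + " " + arabic.length());        // 1 1
        // ...but UTF-8 needs 1 byte for the ASCII char and 2 for the Arabic one.
        System.out.println(utf8Length(latin) + " " + utf8Length(arabic));  // 1 2
    }
}
```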

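What you can do is encode the string to ASCII, decode it back, and compare the result with the original. Characters that ASCII cannot represent are replaced during encoding, so if the two strings are not equal, the original contains non-ASCII characters.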
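A minimal sketch of that round trip, assuming the standard String.getBytes(Charset) and String(byte[], Charset) APIs; the class and method names are hypothetical. String.getBytes(Charset) always replaces unmappable characters with the charset's replacement bytes ("?" for US-ASCII), which is what makes the comparison work:

```java
import java.nio.charset.StandardCharsets;

class AsciiRoundTrip {
    // Encode to US-ASCII and decode back. getBytes replaces every character
    // US-ASCII cannot represent with '?', so any non-ASCII character makes
    // the round-tripped string differ from the original.
    public static boolean isAsciiOnly(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        String roundTrip = new String(bytes, StandardCharsets.US_ASCII);
        return roundTrip.equals(s);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiOnly("hello?"));        // true: a literal '?' survives
        System.out.println(isAsciiOnly("hello \u0623"));  // false: U+0623 became '?'
    }
}
```

A CharsetEncoder also offers canEncode(CharSequence), so StandardCharsets.US_ASCII.newEncoder().canEncode(s) answers the same question without building the intermediate string.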

However, your own sample works as well (almost: the test should be >= 128 rather than > 128), because the following shows that all chars < 128 are indeed ASCII:

To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards.

The first plane (code points U+0000 to U+FFFF) contains the most frequently used characters and is called the Basic Multilingual Plane or BMP. Both UTF-16 and UCS-2 encode valid code points in this range as single 16-bit code units that are numerically equal to the corresponding code points.

("UTF-16" and "ASCII", Wikipedia)

chars are UTF-16 "code units".
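Putting this together, the question's check can be applied to every char of a string with the corrected >= 128 threshold. This is a minimal sketch; the helper name hasNonAscii is hypothetical. Surrogate code units (U+D800 to U+DFFF) are also >= 128, so characters outside the BMP are reported as non-ASCII too:

```java
class AsciiCharScan {
    // Every char below 128 is an ASCII character, so a string is ASCII-only
    // exactly when none of its UTF-16 code units is >= 128.
    public static boolean hasNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 128) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char c = '\u0623';  // the Arabic letter from the question
        // For a BMP character, the char value equals the code point.
        System.out.println((int) c == "\u0623".codePointAt(0)); // true
        System.out.println(hasNonAscii("abc"));                  // false
        System.out.println(hasNonAscii("abc\u0623"));            // true
    }
}
```

On Java 8 and later the same scan can be written as s.chars().anyMatch(ch -> ch >= 128).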


But judging from the question as a whole, you would do better to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) first.

A similar question, "java - Best way to know whether the text in a Java String contains UTF-8 encoded characters", can be found on Stack Overflow: https://stackoverflow.com/questions/9825579/
