
java - Encodings that differ from ASCII, even for letters

Reposted. Author: 行者123. Updated: 2023-11-29 04:47:11

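Is there any character encoding that is reasonably common on consumer devices (as opposed to mainframes) and that maps the characters A-Za-z0-9 to byte values different from ASCII?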

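I am currently thinking about Java applications, so I wonder whether there is any chance that a casual user of some Java software in some country could end up with a defaultCharset such that "AZaz09".getBytes() returns something different from "AZaz09".getBytes("UTF-8"). I am trying to figure out whether I have to deal with compatibility problems that could result from different behavior in this regard.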
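The comparison described above can be checked directly on a given machine. The following is a minimal sketch, not part of the original question; the class name DefaultCharsetCheck and the helper isAsciiCompatibleForAlnum are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetCheck {
    // Returns true if cs encodes "AZaz09" to the same bytes as UTF-8
    // (which, for these characters, are the ASCII bytes).
    // Charsets that cannot encode at all are reported as incompatible.
    static boolean isAsciiCompatibleForAlnum(Charset cs) {
        if (!cs.canEncode()) {
            return false;
        }
        String s = "AZaz09";
        return Arrays.equals(s.getBytes(cs), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Charset.defaultCharset() is what String.getBytes() without arguments uses.
        Charset def = Charset.defaultCharset();
        System.out.println(def.name() + " ASCII-compatible for A-Za-z0-9: "
                + isAsciiCompatibleForAlnum(def));
    }
}
```

Passing an explicit charset to getBytes, as the helper does, avoids depending on the platform default in the first place.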

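I know that, historically, EBCDIC is the prime example of an ASCII-incompatible encoding. But is it used on any recent consumer device, or only on IBM mainframes and vintage computers? Does the legacy of EBCDIC live on in the common encoding of some country?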

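I also know that UTF-16 is not ASCII-compatible, and that it is quite common to encode files that way on Windows. But as far as I know, that only ever concerns file content, not the default application locale. Could users configure their Windows machine to use UTF-16 as the system code page without breaking at least half of their applications?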
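To make the UTF-16 incompatibility concrete, here is a small sketch (not from the original question) that prints the bytes Java produces for "AZ" under several charsets. The class name Utf16Bytes and the helper hex are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf16Bytes {
    // Formats a byte array as space-separated two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "AZ";
        System.out.println(hex(s.getBytes(StandardCharsets.US_ASCII))); // 41 5A
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 41 00 5A 00
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41 00 5A
        // Java's plain "UTF-16" encoder writes a big-endian byte order mark first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41 00 5A
    }
}
```

Every letter takes two bytes, and the plain UTF-16 variant additionally prepends a byte order mark, so none of the three outputs can equal the ASCII bytes.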

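As far as I know, all the pre-Unicode multi-byte encodings used in Asia still map the ASCII range 00-7F to something that is ASCII-compatible, at least for letters and digits. Is there any Asian encoding still in use that uses more than one byte for all of its code points? Or perhaps one on some other continent?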
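One way to test that assumption for specific encodings is to probe them by name. The sketch below is illustrative only: the class name LegacyCharsetProbe, the helper matchesAscii, and the list of charset names are choices made here, not taken from the question, and which of these charsets exist depends on the JVM installation, so unsupported ones are reported as such.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

class LegacyCharsetProbe {
    static final String ALNUM =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Empty if this JVM does not provide the charset or it cannot encode;
    // otherwise whether A-Za-z0-9 encode to exactly the ASCII bytes.
    static Optional<Boolean> matchesAscii(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        Charset cs = Charset.forName(charsetName);
        if (!cs.canEncode()) {
            return Optional.empty();
        }
        byte[] expected = ALNUM.getBytes(StandardCharsets.US_ASCII);
        return Optional.of(Arrays.equals(ALNUM.getBytes(cs), expected));
    }

    public static void main(String[] args) {
        // Example names of legacy multi-byte encodings; availability varies by JVM.
        String[] names = {"Shift_JIS", "EUC-JP", "GBK", "Big5", "EUC-KR"};
        for (String name : names) {
            String result = matchesAscii(name)
                    .map(ok -> ok ? "matches ASCII" : "differs from ASCII")
                    .orElse("not available on this JVM");
            System.out.println(name + ": " + result);
        }
    }
}
```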

Best answer

Here is a simple program to find out. Whether the charsets that fail are common enough to matter is up to you to decide.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingTest {
    public static void main(String[] args) {
        String s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        // Reference bytes: for these characters UTF-8 is identical to ASCII.
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        // Compare against every charset this JVM knows about.
        for (Charset cs : Charset.availableCharsets().values()) {
            try {
                byte[] b2 = s.getBytes(cs);
                if (!Arrays.equals(b, b2)) {
                    System.out.println(cs.displayName() + " doesn't give the same result");
                }
            } catch (Exception e) {
                // Some charsets are decode-only and do not support encoding.
                System.out.println(cs.displayName() + " throws an exception");
            }
        }
    }
}

The results on my machine are:

IBM-Thai doesn't give the same result
IBM01140 doesn't give the same result
IBM01141 doesn't give the same result
IBM01142 doesn't give the same result
IBM01143 doesn't give the same result
IBM01144 doesn't give the same result
IBM01145 doesn't give the same result
IBM01146 doesn't give the same result
IBM01147 doesn't give the same result
IBM01148 doesn't give the same result
IBM01149 doesn't give the same result
IBM037 doesn't give the same result
IBM1026 doesn't give the same result
IBM1047 doesn't give the same result
IBM273 doesn't give the same result
IBM277 doesn't give the same result
IBM278 doesn't give the same result
IBM280 doesn't give the same result
IBM284 doesn't give the same result
IBM285 doesn't give the same result
IBM290 doesn't give the same result
IBM297 doesn't give the same result
IBM420 doesn't give the same result
IBM424 doesn't give the same result
IBM500 doesn't give the same result
IBM870 doesn't give the same result
IBM871 doesn't give the same result
IBM918 doesn't give the same result
ISO-2022-CN throws an exception
JIS_X0212-1990 doesn't give the same result
UTF-16 doesn't give the same result
UTF-16BE doesn't give the same result
UTF-16LE doesn't give the same result
UTF-32 doesn't give the same result
UTF-32BE doesn't give the same result
UTF-32LE doesn't give the same result
x-IBM1025 doesn't give the same result
x-IBM1097 doesn't give the same result
x-IBM1112 doesn't give the same result
x-IBM1122 doesn't give the same result
x-IBM1123 doesn't give the same result
x-IBM1364 doesn't give the same result
x-IBM300 doesn't give the same result
x-IBM833 doesn't give the same result
x-IBM834 doesn't give the same result
x-IBM875 doesn't give the same result
x-IBM930 doesn't give the same result
x-IBM933 doesn't give the same result
x-IBM935 doesn't give the same result
x-IBM937 doesn't give the same result
x-IBM939 doesn't give the same result
x-JIS0208 doesn't give the same result
x-JISAutoDetect throws an exception
x-MacDingbat doesn't give the same result
x-MacSymbol doesn't give the same result
x-UTF-16LE-BOM doesn't give the same result
X-UTF-32BE-BOM doesn't give the same result
X-UTF-32LE-BOM doesn't give the same result

Regarding "java - Encodings that differ from ASCII, even for letters", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36688767/
