gpt4 book ai didi

java - 问题: length changed when convert between String and byte array?

转载 作者:行者123 更新时间:2023-12-02 11:28:37 29 4
gpt4 key购买 nike

我认为在 byte[] 和 String 之间转换时,输出的长度将始终保持相同。但下面的例子表明这是不正确的。

byte[] b1 = {55, -71, -35, -35, 83, -115, 107, -80, -62, 86, 98, 125, -68, -12, 14, -92, -122, -65, -117, -26, 80, -102, 75, 49, -120, -10, 18, -8, 82, -21, 49, 80, 125, 94, -35, -66, 91, 79, 77, -29, -48, -85, 29, -48, -118, -13, -84, -77, 93, -101, -7, 46, -44, -25, -42, 72, -33, -81, -120, -40, 40, 65, 58, -74, -34, 99, -8, -118, 83, 110, -94, 69, 21, -27, 114, 43, -23, 7, 120, -15, 21, 110, 108, 98, -99, 7, 107, 63, -48, 32, 123, 35, -36, -35, 7, -75, 40, -3, 33, 92, -79, 119, 22, -63, 27, 123, -98, 92, -93, 30, 51, 55, 106, -109, 99, 123, 25, -111, -53, 66, 117, 121, -20, 6, -10, -34, -76, -120, -56, 123, 48, -9, -116, -81, -47, 67, 80, 14, -58, -17, -92, -75, 119, 27, 125, -115, -31, 114, -96, 126, -87, 98, -108, -21, -113, 36, 104, -69, -74, 41, -68, 115, 103, 106, -39, 10, 0, 7, -66, 84, -94, 46, -1, -62, -115, 104, -104, 53, 86, -117, 15, -100, 46, 7, 57, -84, 40, 118, -12, 93, -6, -31, 28, 81, -72, 123, 54, -76, 123, 111, 54, 121, 126, -19, -32, 99, 109, -68, -103, 29, 75, 57, 115, 33, 110, -23, -116, 11, 112, 117, 67, -100, 21, 94, -16, 94, 24, 47, -90, -48, 30, 15, 24, 98, -114, -96, 37, -47, 32, 74, 110, 58, 35, 77, 62, -74, 94, 59, 63, -35, -59, 10, 43, 65, -63, 59, -65, 58, 69, 88, -91, -58, -103, 88, 6, -105, 92, -9, -19, 26, 5, -42, -38, -82, -56, 42, -45, 30, 103, -113, -64, -82, 29, 6, 40, 102, 44, 59, 51, -69, -70, 90, -126, 40, -105, 103, 92, 124, 120, 43, -53, 73, -109, 103, -62, -64, -68, -81, -61, -68, -73, -6, -112, 85, 119, -92, -85, -31, -37, 32, -2, 100, 34, 41, -128, 73, -92, -94, 71, 98, 0, 126, -98, -51, -8, -72, -97, 66, -71, -14, -74, -39, 56, 71, 46, -94, 40, 32, -84, -17, -128, 60, 25, 75, -104, 25, 49, -14, -103, -89, 97, -61, 89, -109, 118, 114, 123, -38, 101, 98, 7, 70, 9, 42, 98, -94, 73, -70, 72, 43, 52, -89, -20, -22, -58, -109, -88, 36, 118, 71, -34, -85, -24, -46, -120, -118, 5, -118, -53, -5, -87, -116, -38, 101, 74, -111, -2, 12, 48, -105, -110, 6, -114, 31, 70, -42, -118, -61, 82, 83, -37, 27, -56, 91, 113, -23, -40, -121, 35, 79, 3, 79, 58, -54, -11, -41, -48, -109, -54, 96, 80, 77, -69, -88, -75, -126, -64, 54, 33, 7, 121, 16, -49, 26, 68, 94, 107, -79, -17, -67, -59, 57, -8, -36, 99, 29, -2, 36, -91, 70, 56, 76, 88, 40, 85, -16, 120, -101, -21, 83, 103, -91, 28, 14, 17, 73, -102, -121, 69, -102, 18, -115, -92, -5, -50, -20};
System.out.println("resultBytes length = " + b1.length);

String s = new String(b1, "utf-8");
System.out.println("cipherText length = " + s.length());

byte[] b2 = s.getBytes("utf-8");
System.out.println("newResultBytes length = " + b2.length);

通过运行这个,我得到了输出:

length of b1 = 496
length of s = 470
length of b2 = 877

为什么它们如此不同?

最佳答案

在 UTF-8 编码中,一个字符可能有超过 1 个字节。

示例:

Character -> Codepoints -> UTF-8 Encoding
ä -> 00E4 -> C3 A4

因此输入中的 2 个字节可以在输出中显示为 1 个字符。

现在,在 Unicode 中您可以分解字符(尤其是外语)。因此,为了保持我的示例,字符 ä 可以分解为

¨a

现在有 2 个字符具有以下编码

Character -> Codepoints -> UTF-8 Encoding
¨a -> 00A4 0061 -> C2 A4 61

特别是如果您使用亚洲语言,这种分解会比本例中更频繁地发生。

因此,对于这个示例(当分解发生时,这在每种语言中都不确定),您的程序将得到以下输出:

length of b1 = 2
length of s = 1
length of b2 = 3

我认为这可以解释你的发现。

关于java - 问题: length changed when convert between String and byte array?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49446187/

29 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com