java - Encoding a variable-length UTF-8 byte array in Java

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:20:21

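Actually, I need to read a string in UTF-8 format, but its characters use variable-length encoding, so I run into problems when converting the bytes to a String: when printed, strange characters appear that look like Korean. This is the code I used, without success: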

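    import java.io.UnsupportedEncodingException;
    import java.nio.charset.Charset;

    public static String byteToUTF8(byte[] bytes) {
        try {
            return (new String(bytes, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        // Fallback: only reached if the charset name lookup above fails
        Charset UTF8_CHARSET = Charset.forName("UTF-8");
        return new String(bytes, UTF8_CHARSET);
    }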

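I also tried UTF-16 and got better results, but it still gave me strange characters, and according to the documentation provided above I should be using UTF-8.

Thanks in advance for your help.

Edit:

Base64 value: S0QtOTI2IEdHMDA2AAAAAA==\n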
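As a sanity check on the sample, the Base64 value above can be decoded with the JDK's `java.util.Base64`. In the sketch below (the class and method names are illustrative, not from the original post), the payload turns out to be 12 ASCII characters followed by 4 NUL padding bytes, so decoding as UTF-8 and dropping the trailing NULs yields readable text, whereas reading the same bytes as UTF-16 pairs them up into unrelated code points. It assumes the trailing `\n` in the quoted value is a line break added by the encoder and not part of the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64NameDemo {

    /** Decodes Base64 to bytes, reads them as UTF-8, and drops trailing NUL padding. */
    static String decodeName(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64.trim());
        int end = raw.length;
        while (end > 0 && raw[end - 1] == 0) {
            end--; // skip zero bytes used to pad a fixed-size field
        }
        return new String(raw, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "S0QtOTI2IEdHMDA2AAAAAA==";
        byte[] raw = Base64.getDecoder().decode(sample);

        System.out.println(raw.length);         // 16
        System.out.println(decodeName(sample)); // KD-926 GG006

        // Misreading the same bytes as UTF-16 fuses 'K' (0x4B) and 'D' (0x44)
        // into the single code unit U+4B44, which is why the output looks CJK.
        String wrong = new String(raw, StandardCharsets.UTF_16BE);
        System.out.printf("U+%04X%n", (int) wrong.charAt(0)); // U+4B44
    }
}
```

If the bytes come from a fixed-width field, trimming the padding before building the String is usually what removes the "strange characters" at the end of the printed value.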

Best answer

Bluetooth name display issue:

If you check the documentation for the Bluetooth adapter's setName(), you will find:

https://developer.android.com/reference/android/bluetooth/BluetoothAdapter.html#setName

Valid Bluetooth names are a maximum of 248 bytes using UTF-8 encoding, although many remote devices can only display the first 40 characters, and some may be limited to just 20.
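Because the 248-byte limit quoted above counts UTF-8 bytes and not characters, a name has to be cut on a character boundary. The following is a minimal sketch of one way to do that in plain Java; `Utf8TruncateDemo`, `utf8Length`, and `truncateToUtf8Bytes` are hypothetical helper names, not part of the Android API.

```java
import java.nio.charset.StandardCharsets;

final class Utf8TruncateDemo {

    /** Number of bytes UTF-8 uses for one code point: 1 to 4. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /** Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes. */
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                break; // the next character would not fit completely
            }
            bytes += len;
            i += Character.charCount(cp); // 2 for surrogate pairs, else 1
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String name = "KD-926 \uD55C\uAE00"; // 7 ASCII chars + 2 Hangul syllables
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 13
        System.out.println(truncateToUtf8Bytes(name, 9).length());        // 7
        System.out.println(truncateToUtf8Bytes(name, 248).equals(name));  // true
    }
}
```

Counting per code point keeps multi-byte characters whole, so the truncated name never ends in a partial byte sequence.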

Versions supported by Android:

If you check the link https://stackoverflow.com/a/7989085/2293534, you will get the list of versions supported by Android.

Supported and unsupported locales are given in the table:

--------------------------------------------------------------------------------------------------------
|               | DEC Korean | Korean EUC | ISO-2022-KR | KSC5601/cp949 | UCS-2/UTF-16 | UCS-4 | UTF-8 |
--------------------------------------------------------------------------------------------------------
| DEC Korean    | -          | Y          | N           | Y             | Y            | Y     | Y     |
| Korean EUC    | Y          | -          | Y           | N             | N            | N     | N     |
| ISO-2022-KR   | N          | Y          | -           | Y             | N            | N     | N     |
| KSC5601/cp949 | Y          | N          | Y           | -             | Y            | Y     | Y     |
| UCS-2/UTF-16  | Y          | N          | N           | Y             | -            | Y     | Y     |
| UCS-4         | Y          | N          | N           | Y             | Y            | -     | Y     |
| UTF-8         | Y          | N          | N           | Y             | Y            | Y     | -     |
--------------------------------------------------------------------------------------------------------

As for solutions:

Solution #1:

Michael gives a good example of the conversion. For more, see https://stackoverflow.com/a/40070761/2293534

When you call getBytes(), you are getting the raw bytes of the string encoded under your system's native character encoding (which may or may not be UTF-8). Then, you are treating those bytes as if they were encoded in UTF-8, which they might not be.

A more reliable approach would be to read the ko_KR-euc file into a Java String. Then, write out the Java String using UTF-8 encoding.

    InputStream in = ...
    // "ko_KR-euc" is a placeholder: pass the charset name Java recognizes, e.g. "EUC-KR"
    Reader reader = new InputStreamReader(in, "ko_KR-euc");
    StringBuilder sb = new StringBuilder();
    int read;
    while ((read = reader.read()) != -1) {
        sb.append((char) read);
    }
    reader.close();

    String string = sb.toString();

    OutputStream out = ...
    Writer writer = new OutputStreamWriter(out, "UTF-8");
    writer.write(string);
    writer.close();

N.B: You should, of course, use the correct encoding name
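The read-then-write approach quoted above can be sketched as a self-contained program. This version assumes the source bytes are in `EUC-KR` (the name Java uses for Korean EUC, available in a standard JDK) and uses in-memory streams in place of the elided `in` and `out`; `TranscodeDemo`, `transcode`, and `eucKrToUtf8` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeDemo {

    /** Decodes bytes from `in` using `from` and re-encodes them to `out` using `to`. */
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        try (Reader reader = new InputStreamReader(in, from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    /** Convenience wrapper for byte arrays; assumes the input really is EUC-KR. */
    static byte[] eucKrToUtf8(byte[] eucKrBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(eucKrBytes), Charset.forName("EUC-KR"),
                      out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00"; // two Hangul syllables
        byte[] eucKr = text.getBytes(Charset.forName("EUC-KR"));
        byte[] utf8 = eucKrToUtf8(eucKr);
        System.out.println(eucKr.length + " -> " + utf8.length);                  // 4 -> 6
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Passing `Charset` objects instead of name strings avoids the checked `UnsupportedEncodingException` and makes a mistyped charset name fail at the `Charset.forName` call.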

Solution #2:

Using StringUtils, you can do it as shown at https://stackoverflow.com/a/30170431/2293534

Solution #3:

You can use Apache Commons IO for the conversion. A good example is given here: http://www.utdallas.edu/~lmorenoc/research/icse2015/commons-io-2.4/examples/toString_49.html

    import org.apache.commons.io.IOUtils;

    String resource;
    // getClass().getResourceAsStream(resource) -> the InputStream to read from
    // "UTF-8" -> the encoding to use; null means platform default
    IOUtils.toString(getClass().getResourceAsStream(resource), "UTF-8");
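If Commons IO is not on the classpath, roughly the same result can be had with the JDK alone. The sketch below is a swapped-in alternative, not the Commons IO implementation; `StreamToStringDemo` and `readAll` are hypothetical names, and an in-memory stream stands in for the resource stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamToStringDemo {

    /** Reads the whole stream into memory, then decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decode once at the end so a multi-byte character that straddles
        // two chunks is never split.
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) {
        byte[] data = "caf\u00E9 \uD55C\uAE00".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(text.length()); // 7
    }
}
```

Decoding each chunk separately would be a bug for variable-length encodings such as UTF-8, which is why the bytes are collected first.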

Resource links:

  1. Korean Codesets and Codeset Conversion
  2. Korean Localization
  3. Changing the Default Locale
  4. Byte Encodings and Strings

Regarding "java - Encoding a variable-length UTF-8 byte array in Java", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40405728/
