
java - Change the InputStream charset after it has been set

Reposted. Author: 行者123. Updated: 2023-12-01 18:31:59

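If a stream of data contains characters in different encodings, is there a way to change the charset after the input stream has been created, or any suggestions on how this could be achieved?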

An example to help explain:

// The first 4 characters need to be read as UTF-8 and the next 4 as ISO-8859-2
String data = "testўёѧẅ";
// Uses the platform default charset; a charset could be passed in instead
try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
    // An InputStreamReader working on chars instead of bytes would probably be clearer,
    // but hopefully the idea comes across
    byte[] bytes = new byte[4];
    while (in.read(bytes) != -1) {
        // TODO: change the charset here to UTF-8, then read values

        // TODO: change the charset here to ISO-8859-2, then read values
    }
}

I have been looking into decoders, which may be the way to go.

An attempt that reuses the same input stream:

String data = "testўёѧẅ";
InputStream inputStream = new ByteArrayInputStream(data.getBytes());
Reader r = new InputStreamReader(inputStream, "UTF-8");
int intch;
int count = 0;
while ((intch = r.read()) != -1) {
    System.out.println((char) intch);
    if ((++count) == 4) {
        // Switch to a second reader on the same stream after 4 characters
        r = new InputStreamReader(inputStream, Charset.forName("ISO-8859-2"));
    }
}

// prints "test" but not the second part

Best answer

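Assuming you know that the stream will contain n UTF-8 characters followed by m ISO 8859-2 characters (n=4, m=4 in your example), you can achieve this by using two different InputStreamReaders on the same InputStream: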

try (InputStream in = new ByteArrayInputStream(data.getBytes())) {
    InputStreamReader inUtf8 = new InputStreamReader(in, StandardCharsets.UTF_8);
    InputStreamReader inIso88592 = new InputStreamReader(in, Charset.forName("ISO-8859-2"));

    // read `n` characters using inUtf8, then read `m` characters using inIso88592
}
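One caveat is worth checking before relying on two readers over one stream. The InputStreamReader documentation states that more bytes may be read ahead from the underlying stream than the current read needs. If that happens, the second reader may find the stream already drained, which would match the output reported in the question. The sketch below is a way to observe this on your own JDK. The class name `ReadAheadCheck`, the helper `bytesConsumedForOneChar`, and the ASCII sample payload are illustrative assumptions, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadAheadCheck {

    // Returns how many bytes left the underlying stream after the reader
    // delivered a single character.
    static int bytesConsumedForOneChar(byte[] bytes) {
        InputStream in = new ByteArrayInputStream(bytes);
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        try {
            reader.read(); // decode one character only
            return bytes.length - in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII-only sample payload (an assumption; any bytes would do).
        byte[] bytes = "testabcd".getBytes(StandardCharsets.UTF_8);
        System.out.println("total bytes: " + bytes.length);
        System.out.println("consumed for one char: " + bytesConsumedForOneChar(bytes));
    }
}
```

If the second number is larger than 1, the first reader has buffered bytes that a second reader on the same stream will never see.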

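Note that you need to read characters rather than bytes (i.e. keep track of how many characters have been read so far, since in UTF-8 a single character can be encoded in 1-4 bytes).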
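The 1-4 byte rule also suggests an alternative that avoids any reader buffering: read bytes directly, use the UTF-8 lead byte to work out how long each character is, and decode each segment with its own charset. The following is a minimal sketch of that idea, not the answer's method. The class `MixedCharsetReader`, its methods, and the sample data (which uses characters that actually exist in ISO-8859-2) are assumptions made for illustration, and "character" here means a Unicode code point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MixedCharsetReader {

    // Length in bytes of a UTF-8 sequence, derived from its lead byte.
    static int utf8SequenceLength(int leadByte) {
        if ((leadByte & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((leadByte & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((leadByte & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((leadByte & 0xF8) == 0xF0) return 4; // 11110xxx
        throw new IllegalArgumentException("Not a UTF-8 lead byte: " + leadByte);
    }

    // Reads exactly n UTF-8 code points and consumes no bytes beyond them.
    static String readUtf8(InputStream in, int n) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            int lead = in.read();
            if (lead == -1) throw new EOFException("Stream ended early");
            buffer.write(lead);
            int length = utf8SequenceLength(lead);
            for (int j = 1; j < length; j++) {
                int next = in.read();
                if (next == -1) throw new EOFException("Truncated UTF-8 sequence");
                buffer.write(next);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads exactly m characters of a single-byte charset such as ISO-8859-2.
    static String readSingleByte(InputStream in, int m, Charset charset) throws IOException {
        byte[] bytes = new byte[m];
        int offset = 0;
        while (offset < m) {
            int read = in.read(bytes, offset, m - offset);
            if (read == -1) throw new EOFException("Stream ended early");
            offset += read;
        }
        return new String(bytes, charset);
    }

    // Decodes n UTF-8 code points followed by m single-byte characters.
    static String decode(byte[] data, int n, int m, Charset second) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return readUtf8(in, n) + readSingleByte(in, m, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        byte[] first = "test".getBytes(StandardCharsets.UTF_8);
        // Four Polish letters (l-stroke, a-ogonek, c-acute, e-ogonek) present in ISO-8859-2.
        byte[] second = "\u0142\u0105\u0107\u0119".getBytes(latin2);
        byte[] data = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, data, first.length, second.length);
        System.out.println(decode(data, 4, 4, latin2));
    }
}
```

Reading byte by byte keeps the stream position exact at the charset boundary. For a multi-byte second charset the same idea would need a `CharsetDecoder` fed one byte at a time, which this sketch does not cover.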

Regarding "java - Change the InputStream charset after it has been set", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60147377/
