
java - How do I check whether a string can be encoded in a given encoding?

Reposted. Author: 搜寻专家. Updated: 2023-10-31 19:28:49

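The following test fails on the Latin1 round trip because characters that Latin1 cannot represent are silently replaced with a byte of value 63 (a question mark). I would much rather have those characters cause an exception...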

    @Test
    public void testEncoding() throws UnsupportedEncodingException {
        final String czech = "Řízeček a šampáňo a žízeň";
        // okay
        final byte[] bytesInLatin2 = czech.getBytes("ISO8859-2");
        // different bytes, but okay
        final byte[] bytesInWin1250 = czech.getBytes("Windows-1250");
        // different bytes, but okay
        final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
        // nonsense; Ř,č,... are not in Latin1 code set!!!
        final byte[] bytesInLatin1 = czech.getBytes("ISO8859-1");

        System.out.println(Arrays.toString(bytesInLatin2));
        System.out.println(Arrays.toString(bytesInWin1250));
        System.out.println(Arrays.toString(bytesInUtf8));
        System.out.println(Arrays.toString(bytesInLatin1));
        System.out.flush();

        final String latin2 = new String(bytesInLatin2, "ISO8859-2");
        final String win1250 = new String(bytesInWin1250, "Windows-1250");
        final String utf8 = new String(bytesInUtf8, "UTF-8");
        final String latin1 = new String(bytesInLatin1, "ISO8859-1");

        Assert.assertEquals("latin2", czech, latin2);
        Assert.assertEquals("win1250", czech, win1250);
        Assert.assertEquals("utf8", czech, utf8);
        Assert.assertEquals("latin1", czech, latin1); // this test will fail!
    }

In many cases this behavior of Java ends up corrupting data. Is there a library that can verify whether a string can be encoded in a given encoding?

Best answer

I suspect you are looking for CharsetEncoder.canEncode(CharSequence).

Charset latin2 = Charset.forName("ISO8859-2");
boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
...
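
As a fuller sketch of the same idea, the class below pairs the canEncode check with a strict encode that throws instead of substituting a question mark. The class name EncodingCheck and the helper names canEncode and encodeStrict are made up for this illustration; the underlying calls (Charset.newEncoder, CharsetEncoder.canEncode, CharsetEncoder.encode, CodingErrorAction.REPORT) are standard java.nio.charset API. The sample string is the first word of the question's text, written with Unicode escapes so the file compiles under any source encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
public class EncodingCheck {

    /** Returns true if every character of text is representable in the named charset. */
    public static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    /** Encodes text, throwing on unmappable characters rather than replacing them. */
    public static byte[] encodeStrict(String text, Charset charset)
            throws CharacterCodingException {
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // R-caron, i-acute, z, e, c-caron, e, k: the first word of the question's string.
        String czech = "\u0158\u00edze\u010dek";

        // Check up front, before any bytes are produced.
        for (String name : new String[] {"UTF-8", "ISO-8859-1"}) {
            System.out.println(name + " can encode: " + canEncode(czech, name));
        }

        // Or encode strictly and let the encoder report the problem.
        try {
            encodeStrict(czech, StandardCharsets.ISO_8859_1);
            System.out.println("encoded without loss");
        } catch (CharacterCodingException e) {
            System.out.println("ISO-8859-1 rejected the string: " + e);
        }
    }
}
```

The silent replacement in the question comes from String.getBytes, which substitutes the charset's default replacement bytes for anything it cannot map. Going through a CharsetEncoder directly lets the caller choose between checking in advance (canEncode) and failing at encode time (REPORT), whichever suits the code path better.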

Regarding "java - How do I check whether a string can be encoded in a given encoding?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16902006/
