
java - Umlauts are lost on another system after Base64 encoding and decoding

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:32:56

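With the implementation below, my problem is that on another system the XML file is missing the umlauts (ä, ü, ö) compared to the original XML file. Instead of the umlauts, the replacement character is inserted into the XML file (0xEF 0xBF 0xBD, efbfbd).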

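  • Get a zip file containing an XML with umlauts
  • Decompress the zip file
  • Encode the XML content as a Base64 payload and save it to the database
  • Query the entity
  • Get the Base64 payload
  • Decode the Base64 content
  • The decoded Base64 content is an XML that should contain the original umlauts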
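The steps above can be sketched as a byte-level round trip. This is a minimal sketch, not the poster's implementation: the class and method names are hypothetical, and the compression step is left out because the poster's ZipUtils helper is not shown. The point it illustrates is that Base64 operates on bytes, so as long as the bytes are never converted to a String with the wrong charset in between, the decoded bytes are identical to the encoded ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical names; compression is omitted in this sketch.
class PayloadRoundTrip {

    // Encode raw bytes for storage. No charset is involved at this step.
    static String toPayload(byte[] xmlBytes) {
        return Base64.getEncoder().encodeToString(xmlBytes);
    }

    // Decode the stored payload back to the exact bytes that were encoded.
    static byte[] fromPayload(String payload) {
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // \u00fc is "ü"; the escape avoids depending on the source file encoding.
        byte[] original = "<name>M\u00fcller</name>".getBytes(StandardCharsets.UTF_8);

        String payload = toPayload(original);
        byte[] restored = fromPayload(payload);

        // Base64 itself is lossless for arbitrary bytes.
        System.out.println(Arrays.equals(original, restored)); // true
    }
}
```

If the round trip is lossless at the byte level, any lost umlaut has to come from a bytes-to-String or String-to-bytes conversion somewhere around it.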

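What drives me crazy is that the decoded Base64 content is missing the umlauts on another system. Instead of the umlauts I get the replacement character. On my system, the same implementation works without any replacement.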
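One common reason for "works on my machine, breaks on another" with text is that calls such as `getBytes()` or `new String(bytes)` without a charset argument use the platform default charset, which can differ between systems. The sketch below simulates such a mismatch explicitly. It assumes ISO-8859-1 as a stand-in for a single-byte Windows default; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that makes the writer and reader charsets explicit.
class DefaultCharsetPitfall {

    // Encode with one charset and decode with another.
    static String encodeThenDecode(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String text = "\u00fc"; // "ü"

        // Same charset on both sides: the text survives.
        String ok = encodeThenDecode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(text)); // true

        // Single-byte bytes read as UTF-8: the byte is malformed UTF-8,
        // so the decoder substitutes U+FFFD.
        String lost = encodeThenDecode(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(lost.contains("\uFFFD")); // true

        // UTF-8 bytes read as ISO-8859-1: two characters instead of one.
        String mojibake = encodeThenDecode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals("\u00c3\u00bc")); // true
    }
}
```

Passing the charset explicitly on both sides removes the dependency on the machine the code happens to run on.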

The following code is just an MCVE to illustrate the problem: it works on my system but loses the umlauts after decoding on the other system (Windows Server 2013).

String requestUrl = "https://myserver/mypath/Message_166741.zip";
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);

String content = null;
content = new String(decompressedInputStream, StandardCharsets.UTF_8);
String originFileName = new SimpleDateFormat("yyyyMMddHHmm'_origin.xml'").format(new Date());
String originFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", originFileName);

// File contains the expected umlauts
FileUtils.writeStringToFile(new File(originFileNameWithPath), content);

String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(content.getBytes("UTF-8")));
String payload = Base64.encodeBase64String(ZipUtils.compress(content.getBytes()));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(ZipUtils.compress(content.getBytes())));

String xmlMessageJavaBase64;
byte[] compressedBinaryJavaBase64 = java.util.Base64.getDecoder().decode(payloadJavaBase64);
byte[] decompressedBinaryJavaBase64= ZipUtils.decompress(compressedBinaryJavaBase64);
xmlMessageJavaBase64 = new String(decompressedBinaryJavaBase64, "UTF-8");

String xmlMessageUTF8;
byte[] compressedBinaryUTF8 = java.util.Base64.getDecoder().decode(payloadUTF8);
byte[] decompressedBinaryUTF8 = ZipUtils.decompress(compressedBinaryUTF8);
xmlMessageUTF8 = new String(decompressedBinaryUTF8, "UTF-8");

String xmlMessage;
byte[] compressedBinary = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
xmlMessage = new String(decompressedBinary, "UTF-8");

String processedFileName = new SimpleDateFormat("yyyyMMddHHmm'_processed.xml'").format(new Date());
String processedFileNameUTF8 = new SimpleDateFormat("yyyyMMddHHmm'_processedUTF8.xml'").format(new Date());
String processedFileNameJavaBase64 = new SimpleDateFormat("yyyyMMddHHmm'_processedJavaBase64.xml'").format(new Date());


// These files do not contain the umlauts anymore.
// Instead of the umlauts a replacement character is inserted (0xEF 0xBF 0xBD (efbfbd))
String processedFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", processedFileName);
String processedFileNameWithPathUTF8 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameUTF8);
String processedFileNameWithPathJavaBase64 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameJavaBase64);
FileUtils.writeStringToFile(new File(processedFileNameWithPath), xmlMessage);
FileUtils.writeStringToFile(new File(processedFileNameWithPathUTF8), xmlMessageUTF8);
FileUtils.writeStringToFile(new File(processedFileNameWithPathJavaBase64), xmlMessageJavaBase64);

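The three files are for testing purposes only, but I hope you get the problem.

Edit

Both approaches create XML files with ü, ö, ä on my machine. Only the WITHOUT implementation creates XML files with ü, ö, ä on the other system. In the WITH UTF-8 variant, the "content" string contains for ü =>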

// WITHOUT UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci);

byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary);


// WITH UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci, StandardCharsets.UTF_8);

byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary, "UTF-8");

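Edit #2

There also seems to be a difference between running the code inside IntelliJ and outside IntelliJ on my machine. I did not know this would make such a big difference. If I run the code outside IntelliJ (java.exe -jar myjarfile), the WITH UTF8 part replaces the Ü with... I don't know. Notepad++ shows xFC. Interestingly, my Raspberry Pi shows both files with Ü, while my Windows/Notepad++ shows xFC.

The whole thing confuses me, and I would like to know where the problem is. Also because the XML file contains UTF8 as the encoding in its header.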
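A possible explanation for the IntelliJ difference, offered as an assumption rather than a diagnosis: the IDE may launch the JVM with a different default charset (for example through a `-Dfile.encoding` option) than a plain `java -jar` call on the same Windows machine. Note also that xFC is the ISO-8859-1 byte for the lowercase ü, whereas UTF-8 encodes ü as the two bytes C3 BC. The small probe below prints the default charset and both encodings; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical probe to compare environments (IDE vs. java -jar).
class CharsetProbe {

    // Render bytes as lowercase hex separated by spaces, e.g. "c3 bc".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The value printed here depends on the JVM version and launch options.
        System.out.println("default charset: " + Charset.defaultCharset());

        String u = "\u00fc"; // "ü"
        System.out.println("ISO-8859-1: " + hex(u.getBytes(StandardCharsets.ISO_8859_1))); // fc
        System.out.println("UTF-8:      " + hex(u.getBytes(StandardCharsets.UTF_8)));      // c3 bc
    }
}
```

Running this once from the IDE and once from the command line shows whether the two environments agree on the default charset.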

Edit #3: final solution

// ## SERVER
// Get ZIP from request URL
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();

byte[] decompressedInputStream = decompress(inputStream);

// Produces an XML string which SHOULD contain ü, ö, ä
String xmlOfZipFileContent = new String(decompressedInputStream, StandardCharsets.UTF_8);

// Just for testing write to file
String xmlOfZipFileSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_original.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfZipFileSavePath), xmlOfZipFileContent, StandardCharsets.UTF_8);

// The payload gets stored into the DB
String payload = java.util.Base64.getEncoder().encodeToString(ZipUtils.compress(xmlOfZipFileContent.getBytes(StandardCharsets.UTF_8)));

// Store payload to db
// Client queries database and gets the payload
// payload = dbEntity.get().payload


// The following three lines run on the client
byte[] compressedBinaryPayload = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinaryPayload = ZipUtils.decompress(compressedBinaryPayload);
String xmlMessageOutOfPayload = new String(decompressedBinaryPayload, StandardCharsets.UTF_8);

String xmlOfPayloadSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_payload.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfPayloadSavePath), xmlMessageOutOfPayload, StandardCharsets.UTF_8);

Best answer

If I understand correctly, your situation appears to be the following:

// Decompress data from the server, it's in ISO-8859-1 or similar 1 byte encoding
byte[] dci = decompress(inputStream);

// Data gets corrupted because of wrong charset
// This is where ü gets converted to unicode replacement character
content = new String(dci, StandardCharsets.UTF_8);

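The rest of the code explicitly uses UTF-8, but that does not matter, because the data is already corrupted at this point. In the end, you expect to get a UTF-8 encoded file.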
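The corruption described above can be reproduced in isolation. The sketch below assumes, as the answer does, that the incoming bytes are in a single-byte encoding such as ISO-8859-1; the class and method names are hypothetical. It shows that decoding the byte FC as UTF-8 produces U+FFFD, and that writing the string out as UTF-8 afterwards yields exactly the bytes EF BF BD reported in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the irreversible step described in the answer.
class ReplacementCharDemo {

    // Render bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode with the wrong charset, then re-encode as UTF-8.
    static byte[] decodeAsUtf8AndReencode(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xFC }; // "ü" in ISO-8859-1

        // Wrong charset: FC is not valid UTF-8, so it becomes U+FFFD.
        System.out.println(hex(decodeAsUtf8AndReencode(latin1))); // ef bf bd

        // Right charset: the character is preserved and can be written as UTF-8.
        String right = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(hex(right.getBytes(StandardCharsets.UTF_8))); // c3 bc
    }
}
```

Once the replacement character is in the string, the original byte cannot be recovered, which is why later explicit UTF-8 handling does not help.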


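"Also because the XML file contains UTF8 as the encoding in its header."

That does not prove anything. If you treat it as just a text file, you can write it out in any number of encodings, and it will still claim to be UTF-8.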
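To check what a file actually contains instead of trusting its XML declaration, the bytes can be decoded strictly, so that malformed input raises an error rather than being silently replaced. This is a minimal sketch using the standard `CharsetDecoder` API; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical validity check; it inspects the bytes, not the XML declaration.
class Utf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>M\u00fcller</name>";

        // The declaration says UTF-8, but these bytes are ISO-8859-1.
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.ISO_8859_1))); // false

        // Same text, actually encoded as UTF-8.
        System.out.println(isValidUtf8(xml.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Running such a check on the decompressed bytes from the server, before any String is created, would show whether the source data is really UTF-8.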

Regarding "java - Umlauts are lost on another system after Base64 encoding and decoding", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49810626/
