
java - Chinese characters in input XML cause XSLT transformation to produce invalid character references in output XML

Reposted. Author: 行者123. Updated: 2023-12-01 21:59:37

I'm trying to figure out why my simple XSLT transformation, which should turn XML into XML, fails to produce valid XML.

The transformation simply copies everything:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" encoding="utf-8" />
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Given an input XML file like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕𠀆𠀅𠀍𠁀</name>
</foo>

the result is:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕&#55360;&#56326;&#55360;&#56325;&#55360;&#56333;&#55360;&#56384;</name>
</foo>
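The numeric references in this output come in pairs, and each pair has the shape of a UTF-16 surrogate pair. As a minimal sketch using only `java.lang.Character` from the JDK, the following recombines each pair into a single code point, which suggests that every pair stands for one of the supplementary characters in the input. The class name `SurrogatePairs` and the method `combine` are made up for illustration.

```java
final class SurrogatePairs {

    /** Combines a high/low surrogate pair (given as decimal values) into one code point. */
    static int combine(int high, int low) {
        if (!Character.isSurrogatePair((char) high, (char) low)) {
            throw new IllegalArgumentException("not a surrogate pair: " + high + ", " + low);
        }
        return Character.toCodePoint((char) high, (char) low);
    }

    public static void main(String[] args) {
        // The four pairs taken from the <name> element of the output above.
        int[][] pairs = { {55360, 56326}, {55360, 56325}, {55360, 56333}, {55360, 56384} };
        for (int[] p : pairs) {
            int cp = combine(p[0], p[1]);
            // Shows the pair, the code point it encodes, and the single reference
            // a serializer could have written instead.
            System.out.printf("&#%d;&#%d; -> U+%X, as one reference: &#%d;%n",
                    p[0], p[1], cp, cp);
        }
    }
}
```

The first pair, for example, combines to U+20006, so a single reference `&#131078;` (or the raw character) would have been a well-formed way to write it.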

The tools I use all rely on the (Java) Apache Xalan 2.7.1 XSLT processor, including Eclipse (Mars) with the XSL Developer Tools plugin, which is where I created this example.

That plugin reports that the input XML is well-formed but the output XML is not (the character reference &#55360; refers to an invalid XML character).
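For context, the XML 1.0 specification requires a character reference to refer to a character matching its `Char` production, and that production excludes the surrogate block U+D800 to U+DFFF. The sketch below encodes the production as a plain range check and applies it to the values seen in the output; `XmlCharCheck` and `isXml10Char` are hypothetical names, not part of any library.

```java
final class XmlCharCheck {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char(55360));   // high surrogate seen in the output
        System.out.println(isXml10Char(56326));   // low surrogate seen in the output
        System.out.println(isXml10Char(0x20006)); // code point the first pair stands for
        System.out.println(isXml10Char(0x4E15));  // the BMP character that was copied intact
    }
}
```

Both surrogate values fall in the excluded block, which is consistent with the plugin rejecting the output, while the combined code point is allowed.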

Why does my XSLT processor generate invalid XML, and how can I prevent it from doing so?

The actual code is similar to this (you need Xalan on your classpath):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XSLTTest {

    private final TransformerFactory xalanTransFact;

    public XSLTTest() {
        xalanTransFact = new org.apache.xalan.processor.TransformerFactoryImpl();
    }

    public Templates createCustomTransformation(
            File transformation
    ) throws TransformerException, IOException {
        InputStreamReader readerTransformation = null;
        try {
            readerTransformation = new InputStreamReader(
                    new FileInputStream(transformation), StandardCharsets.UTF_8);
            Templates transformer = xalanTransFact.newTemplates(
                    new StreamSource(readerTransformation)
            );
            return transformer;
        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {
            try {
                if (readerTransformation != null) {
                    readerTransformation.close();
                }
            } catch (IOException ex) {}
        }
    }

    public File applyCustomTransformation(
            Transformer transformer, Reader transformeeReader, Path out,
            boolean indent
    ) throws TransformerException, IOException {
        Writer writer = null;
        try {

            File file = out.toFile();
            writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8);

            if (indent) {
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                transformer.setOutputProperty(
                        "{http://xml.apache.org/xslt}indent-amount",
                        String.valueOf(2));
            }
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");

            transformer.transform(
                    new StreamSource(transformeeReader),
                    new StreamResult(writer));

            return file;

        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException ex) {}
        }
    }

    private void saveToFile(File selectedFile, String content)
            throws FileNotFoundException, IOException {
        Writer writer = null;
        try {
            writer = new OutputStreamWriter(
                    new FileOutputStream(selectedFile), StandardCharsets.UTF_8);
            writer.write(content);
            writer.flush();
        }
        catch (FileNotFoundException ex) {
            throw ex;
        } catch (IOException ex) {
            throw ex;
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException ex) {
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, TransformerException {
        String xslText = "" +
                "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
                "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n" +
                " version=\"1.0\">\n" +
                " <xsl:output method=\"xml\" encoding=\"utf-8\" />\n" +
                " <xsl:template match=\"*|@*\">\n" +
                " <xsl:copy>\n" +
                " <xsl:apply-templates select=\"*|@*|text()\" />\n" +
                " </xsl:copy>\n" +
                " </xsl:template>\n" +
                "</xsl:stylesheet>";

        String xmlToParse = "" +
                "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
                "<foo xmlns=\"uri:foo\">\n" +
                " <name>丕𠀆𠀅𠀍𠁀</name>\n" +
                "</foo>";

        XSLTTest test = new XSLTTest();

        Path xsl = Files.createTempFile("test", ".xsl");
        test.saveToFile(xsl.toFile(), xslText);
        Templates templates = test.createCustomTransformation(xsl.toFile());
        Transformer transformer = templates.newTransformer();

        Path xml = Files.createTempFile("test-out", ".xml");
        StringReader reader = new StringReader(xmlToParse);
        test.applyCustomTransformation(transformer, reader, xml, true);

        System.out.println("Result is at: " + xml.toString());
    }
}

For various reasons, I cannot switch to another XSLT processor.

Best answer

As @VGR wrote in a comment, this is a manifestation of the bug https://issues.apache.org/jira/browse/XALANJ-2419.

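A comment on that JIRA issue suggests a workaround: use UTF-16 instead of UTF-8 as the output encoding of the transformation, because the bug only affects the latter.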

So, in my example, the line

transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");

needs to be replaced with

// workaround for https://issues.apache.org/jira/browse/XALANJ-2419
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-16");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
writer.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");

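Everything else stays the same. The file is still written as UTF-8, because the OutputStreamWriter does the byte encoding; the transformation merely treats its output as UTF-16. The hand-written XML declaration replaces the omitted one, so the declared encoding matches the bytes on disk.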
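The workaround can also be tried in isolation. The sketch below is an assumption-laden reduction of the question's code, not the original program: it uses whatever `TransformerFactory` the runtime provides (with Xalan on the classpath that would be `org.apache.xalan.processor.TransformerFactoryImpl`, as in the question), relies on the identity transformer from `newTransformer()` instead of the stylesheet, and writes to memory rather than to a file. The class name `Utf16Workaround` and the method `copyAsUtf8` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class Utf16Workaround {

    /** Copies the XML with an identity transform and returns the result as UTF-8 bytes. */
    static byte[] copyAsUtf8(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            // Assumption: the default factory; swap in the Xalan factory to reproduce the bug.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            // Workaround for XALANJ-2419: tell the serializer it is producing UTF-16 ...
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-16");
            // ... suppress its declaration, and write a UTF-8 declaration by hand,
            // since the writer is what actually encodes the characters as UTF-8.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            writer.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(writer));
        } catch (IOException | TransformerException ex) {
            throw new IllegalStateException("transformation failed", ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // The name from the question: one BMP ideograph followed by U+20006, U+20005,
        // U+2000D and U+20040, written as escapes so the source encoding does not matter.
        String name = "\u4E15\uD840\uDC06\uD840\uDC05\uD840\uDC0D\uD840\uDC40";
        String xml = "<foo xmlns=\"uri:foo\"><name>" + name + "</name></foo>";
        String out = new String(copyAsUtf8(xml), StandardCharsets.UTF_8);
        System.out.println(out);
        System.out.println("surrogate references present: " + out.contains("&#55360;"));
    }
}
```

The hand-written declaration has to be written before `transform` is called, which is why it sits directly after the output properties, just as in the replacement lines above.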

Source: this question on Stack Overflow, "java - Chinese characters in input XML cause XSLT transformation to produce invalid character references in output XML": https://stackoverflow.com/questions/58713964/
