
java - Declaring an ENTITY that defines nbsp as the string "&nbsp;"

Reposted | Author: 行者123 | Updated: 2023-12-01 09:57:59

I have an HTML document that I need to transform via XSL. The HTML document contains uses of &nbsp;, i.e.,

ation.</span>&nbsp;</p><br/>All ...

At first I ran into trouble because nbsp was not defined. So I defined it:

<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&#160;">
]>

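I do this by prepending that code to the HTML string before sending it to the transformation. After the transformation the ENTITY declaration is gone and, yes, great, the transformation actually succeeds.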

But! Because nbsp is defined as a space, in the resulting HTML/XML the string "&nbsp;" has actually been replaced by a space character.
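A quick way to see what the &#160; definition really produces is to parse a tiny document with the JDK's built-in DOM parser and look at the code point the reference expands to. This is a minimal sketch using only standard JAXP classes; the class name NbspCharRefDemo and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class NbspCharRefDemo {
    // Parses xml with the JDK's default DOM parser and returns the root element's text
    static String parseText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // First attempt from the question: nbsp defined as the character reference &#160;
        String xml = "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>";
        String text = parseText(xml);
        // Code point of the character between 'a' and 'b'
        System.out.println((int) text.charAt(1)); // prints 160 (U+00A0), not 32
    }
}
```

A value of 160 means the parser substituted U+00A0 NO-BREAK SPACE rather than an ordinary space (32); once rendered, the two usually look identical.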

That is not what I want. I need this part of the result to be identical to the source.

So I tried redefining nbsp as follows:

<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&amp;nbsp;">
]>

But now, instead of a space, the result contains the literal characters "&nbsp;" (serialized as "&amp;nbsp;").
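The sketch below reproduces this second attempt in isolation. Under XML's entity rules, the &amp; inside an entity value is left as-is when the declaration is read and only expanded when &nbsp; is referenced, so the text node ends up holding a literal ampersand, which the xml output method then has to escape again. The class name AmpNbspDemo is hypothetical, and the stylesheet is a stripped-down version of the copy-all template from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class AmpNbspDemo {
    // Second attempt from the question: nbsp's replacement text is "&amp;nbsp;"
    static final String XML =
            "<!DOCTYPE p [<!ENTITY nbsp \"&amp;nbsp;\">]><p>a&nbsp;b</p>";

    // Copy-everything template, same idea as the question's stylesheet
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"node()|@*\">"
            + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Character data after the parser has expanded &nbsp; (and the &amp; inside it)
    static String parsedText() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement()
                .getTextContent();
    }

    // What the xml output method writes for that character data
    static String serialized() throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parsedText()); // a&nbsp;b  (a literal '&' character)
        System.out.println(serialized()); // contains a&amp;nbsp;b  (the '&' escaped again)
    }
}
```

In other words, any ampersand that reaches the text node through an entity is ordinary character data, so a conforming XML serializer will always escape it rather than write it back out as an entity reference.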

If I try this instead:

<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&nbsp;">
]>

I get a recursive entity reference exception.
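Why the parser complains: an entity's replacement text may not refer to the entity itself (the "No Recursion" well-formedness constraint in XML 1.0). Character references such as &#38; inside an entity value are expanded as soon as the declaration is read, so a value of "&#38;nbsp;" also leaves "&nbsp;" as the replacement text and recurses in exactly the same way. The hypothetical helper RecursiveEntityDemo.failsToParse below is a minimal sketch that checks this with the JDK's DOM parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RecursiveEntityDemo {
    // Declares nbsp with the given entity value, references it once, and reports
    // whether the parser rejects the document as not well-formed.
    static boolean failsToParse(String entityValue) {
        String xml = "<!DOCTYPE p [<!ENTITY nbsp \"" + entityValue + "\">]><p>a&nbsp;b</p>";
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // e.g. Recursive entity reference "nbsp"
        } catch (Exception e) {
            // Parser configuration or I/O problem; not expected for an in-memory string
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToParse("&nbsp;"));     // true: refers to itself
        System.out.println(failsToParse("&#38;nbsp;")); // true: &#38; is expanded first, leaving "&nbsp;"
        System.out.println(failsToParse("&#160;"));     // false: plain character reference
    }
}
```

With no error handler installed, the JDK parser typically also prints a "[Fatal Error]" line to standard error for the two failing cases.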

How can I include the special character "&" in the definition?

P.S. I run this transformation in Java 8 with the default engine (I guess that is Xalan?).
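One way to confirm which engine is actually in use is to print the class that TransformerFactory.newInstance() returns. On an Oracle or OpenJDK 8 runtime with no other XSLT processor configured, this typically reports com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl, the JDK's internal copy of Xalan's XSLTC compiler, which fits the com.sun.org.apache... package names in the error output further down. The class name WhichXsltEngine is made up for this sketch.

```java
import javax.xml.transform.TransformerFactory;

public class WhichXsltEngine {
    // Class chosen by JAXP: the javax.xml.transform.TransformerFactory system property,
    // lib/jaxp.properties, a service provider on the classpath, or else the JDK default
    static String implementationName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(implementationName());
    }
}
```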

Thanks, everyone!

Below is a short example that reproduces the problem. Sorry for not providing it earlier.

package com.astraia.app.mainframe;

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ShortExample
{
    public static void main(String[] args)
    {
        StringBuffer htmlMain = new StringBuffer(500);
        htmlMain.append("<html><head></head>")
                .append(" <body>)")
                .append(" <p data-tags=\"personal\"><strong>name: Nerea Morry, Id: 5678</strong><br/></p>")
                .append(" <p><span>some text</span>&nbsp;</p><br/>some more text")
                .append(" </body>")
                .append("</html>");

        StringBuffer xsl = new StringBuffer(500);
        xsl.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
           .append("<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">")
           .append(" <xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" omit-xml-declaration=\"yes\" />")
           .append(" <xsl:template match=\"node()|@*\" >")
           .append(" <!-- Copy all nodes -->")
           .append(" <xsl:copy>")
           .append(" <xsl:apply-templates select=\"node()|@*\" />")
           .append(" </xsl:copy>")
           .append(" </xsl:template>")
           .append(" <!-- Anonymize all text within tags indicated as personal -->")
           .append(" <xsl:template match=\"*[@data-tags = 'personal' ]//text()[normalize-space(.) != '']\">ANONYMIZED TEXT</xsl:template>")
           .append(" </xsl:stylesheet>");

        String plainHtml = htmlMain.toString();
        String transformation = xsl.toString();

        // results in &nbsp being replaced by a space
        printResult("results in &nbsp being replaced by a space", plainHtml, "&#160;", transformation);
        // results in seemingly non-replaced escape code &amp;
        printResult("results in seemingly non-replaced escape code &amp;", plainHtml, "&amp;nbsp", transformation);
        // results in recursion exception
        printResult("results in recursion exception", plainHtml, "&nbsp;", transformation);
        // also results in recursion exception
        printResult("also results in recursion exception", plainHtml, "&#038;nbsp;", transformation);

        // but what will result in:
        // <html><head/> <body>) <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p> <p><span>some text</span>&nbsp</p><br/>some more text </body></html>
        // ?
    }

    public static void printResult(String message, String plainHtml, String definition, String transformation) {
        System.out.print(message);
        System.out.println(performTransformation(plainHtml, definition, transformation));
        System.out.println("\n-----");
    }

    public static String performTransformation(String plainHtml, String definition, String transformation)
    {
        String retval = null;

        try {
            StringWriter result = new StringWriter();
            StringBuffer header = new StringBuffer(100);
            header.append("<?xml version=\"1.0\"?>")
                  .append("<!DOCTYPE html [")
                  .append(" <!ENTITY nbsp REPLACE_ME>")
                  .append("]>\n");

            String headerText = header.toString().replace("REPLACE_ME", "\"" + definition + "\"");
            String wholeText = new StringBuffer(headerText).append(plainHtml).toString();

            TransformerFactory factory = TransformerFactory.newInstance();
            Source xslt = new StreamSource(new StringReader(transformation));
            Transformer transformer = factory.newTransformer(xslt);
            Source text = new StreamSource(new StringReader(wholeText));
            transformer.transform(text, new StreamResult(result));
            retval = result.toString();
        }
        catch (Exception e) {
            System.out.println(e.getMessage());
        }

        return retval;
    }
}

Here is the output of running my small sample application:

results in &nbsp being replaced by a space<html><head/> <body>)     <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p>       <p><span>some text</span> </p><br/>some more text   </body></html>

-----
results in seemingly non-replaced escape code &amp;<html><head/> <body>) <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p> <p><span>some text</span>&amp;nbsp</p><br/>some more text </body></html>

-----
results in recursion exceptionjavax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),
null
ERROR: 'Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
-----
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'

also results in recursion exceptionERROR: 'Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),
null

-----

The differences between the 4 attempts are:

  1. </span> </p><br/>some more text

  2. </span>&amp;nbsp</p><br/>some more text

  3. exception

  4. exception

Best answer

I believe you have two options:

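  1. Change the output method to html;
    this will output any non-breaking space as &nbsp;

  2. Change the output encoding to ASCII;
    this will output any non-breaking space as &#160;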
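Both options can be tried without editing the stylesheet by overriding its xsl:output settings with Transformer.setOutputProperty. This is a sketch, not the answerer's code: it uses an identity transform (no stylesheet) on input that already contains a U+00A0 character, and the class and method names are hypothetical. "US-ASCII" is assumed as the encoding name for option 2; in the question's own stylesheet the equivalent edits would be method="html" or encoding="US-ASCII" on the xsl:output element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NbspOutputOptions {
    // Identity transform whose output method and encoding are set explicitly
    static String serialize(String xml, String method, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Input already holds a real U+00A0, as it does after the &#160; definition
        String xml = "<p><span>some text</span>\u00A0</p>";
        // Option 1: html output method (the answer says NBSP is written as &nbsp;)
        System.out.println(serialize(xml, "html", "UTF-8"));
        // Option 2: xml output method with an ASCII-only encoding (NBSP written as &#160;)
        System.out.println(serialize(xml, "xml", "US-ASCII"));
    }
}
```

The xml/US-ASCII call should emit &#160; in place of the no-break space, because that character cannot be represented in the target encoding. The exact layout of the html output (for example any added line breaks) may vary between serializers.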


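Note: if you leave the output method as xml and the encoding as UTF-8, the serialized result should still contain an unescaped non-breaking space. Something else in your processing chain may be preventing this, or you may be mistaking the character for a regular space (after all, in most cases the two render identically).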
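To rule out the second possibility, the output string can be checked for U+00A0 directly instead of judging by eye. A minimal sketch; NbspDetector and the sample string are illustrative, and in the question's code one would pass the value returned by performTransformation.

```java
public class NbspDetector {
    // Counts U+00A0 (NO-BREAK SPACE) characters in s
    static long countNbsp(String s) {
        return s.chars().filter(c -> c == '\u00A0').count();
    }

    // Counts ordinary U+0020 spaces in s
    static long countSpaces(String s) {
        return s.chars().filter(c -> c == ' ').count();
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice pass the transformation result here
        String out = "<span>some text</span>\u00A0</p>";
        System.out.println("NBSP: " + countNbsp(out) + ", spaces: " + countSpaces(out));
        // prints NBSP: 1, spaces: 1
    }
}
```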

A similar question, "java - Declaring an ENTITY that defines nbsp as the string "&nbsp;"", can be found on Stack Overflow: https://stackoverflow.com/questions/37032787/
