
java - TransformerFactory breaks <input> and <br> tags inside an <html> tag

Reposted. Author: 行者123. Last updated: 2023-11-29 04:49:37

When I parse a simple XML document and write it back out with straightforward code, something strange happens.

Input:

<html>
<input></input>
</html>

produces this output, which is not well-formed XML:

<html>
<input>
</html>

The same thing happens with <input/> or <br>.

It does not happen inside a root element other than <html>, nor with other tags.

The code is standard:

// READ XML
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();

// PARSE
Document document = builder.parse(new InputSource(new StringReader(_xml_source)));

// WRITE XML

TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(document), new StreamResult(buffer));
String output = buffer.toString();

Is this a known bug?

Best answer

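XSLT defines an output method, which can be xml, html, or text.

The specification says the default output method is html if the document element is <html> (in any letter case, with no namespace), and xml otherwise.

With the xml method, you get <input/>.

With the html method, you get <input>, because the HTML specification says that empty elements never have end tags.

If you need to, you can set the output method explicitly: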

transformer.setOutputProperty(OutputKeys.METHOD, "xml");

That way, a document whose root element is <html> is still serialized as XML, i.e. with <input/>.
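To compare the two behaviours side by side, the question's code can be wrapped in a small helper. This is only a sketch: the class name OutputMethodDemo and the method roundTrip are made up for illustration, and what the "default" line prints depends on the TransformerFactory implementation on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class OutputMethodDemo {

    // Parses xmlSource and serializes it again with an identity transformer.
    // If method is null, the transformer picks its default output method.
    static String roundTrip(String xmlSource, String method) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xmlSource)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            if (method != null) {
                // Explicit choice: "xml", "html" or "text".
                transformer.setOutputProperty(OutputKeys.METHOD, method);
            }
            StringWriter buffer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(buffer));
            return buffer.toString();
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html><input></input></html>";
        System.out.println("default: " + roundTrip(source, null));
        System.out.println("xml:     " + roundTrip(source, "xml"));
        System.out.println("html:    " + roundTrip(source, "html"));
    }
}
```

Passing "xml" is the fix the answer suggests; passing "html" reproduces the unclosed <input> from the question on purpose.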

Quotes

XSLT output method :

The default for the method attribute is chosen as follows. If

  • the root node of the result tree has an element child,
  • the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and
  • any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,

then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.
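The rule quoted above can be sketched as a small function over a DOM tree. This is an illustration of the three conditions, not the code any real transformer runs; the class DefaultOutputMethod and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of the XSLT default-output-method rule.
class DefaultOutputMethod {

    // root is the root of the result tree (a Document or DocumentFragment).
    static String defaultMethodFor(Node root) {
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                // First element child: html only if the local part is "html"
                // (any letter case) and the namespace URI is null.
                String local = child.getLocalName() != null
                        ? child.getLocalName() : child.getNodeName();
                boolean noNamespace = child.getNamespaceURI() == null;
                return (noNamespace && "html".equalsIgnoreCase(local)) ? "html" : "xml";
            }
            if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !isWhitespaceOnly(child.getNodeValue())) {
                // Non-whitespace text before the first element rules out html.
                return "xml";
            }
        }
        return "xml"; // no element child at all
    }

    private static boolean isWhitespaceOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    // Convenience parser for trying the rule out (namespace-aware).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, defaultMethodFor(parse("<HTML><input/></HTML>")) selects html, while a root element in the XHTML namespace selects xml, because its namespace URI is not null.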

HTML empty tags :

Some HTML element types have no content. For example, the line break element BR has no content; its only role is to terminate a line of text. Such empty elements never have end tags. The document type definition and the text of the specification indicate whether an element type is empty (has no content) or, if it can have content, what is considered legal content.
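This also explains why only some tags are affected. The sketch below assumes the list of HTML 4.0 empty elements that the XSLT 1.0 specification gives for its html output method; the class HtmlEmptyElements and its methods are hypothetical, and a real serializer may use a different list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; real serializers keep their own element tables.
class HtmlEmptyElements {

    // Empty elements of HTML 4.0 as listed by XSLT 1.0 for the html output method.
    private static final Set<String> EMPTY = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "br", "col", "frame", "hr",
            "img", "input", "isindex", "link", "meta", "param"));

    // HTML element names are case-insensitive, so compare in lower case.
    static boolean isEmpty(String elementName) {
        return EMPTY.contains(elementName.toLowerCase(Locale.ROOT));
    }

    // How a childless element would be written under the html output method.
    static String serializeChildless(String elementName) {
        return isEmpty(elementName)
                ? "<" + elementName + ">"
                : "<" + elementName + "></" + elementName + ">";
    }
}
```

Under this assumption, input and br lose their end tags, while an element such as div keeps both its start and end tag.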

Regarding "java - TransformerFactory breaks <input> and <br> tags inside an <html> tag", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35898677/
