java - Node.getTextContent(): is there a way to get the text content of the current node only, not the text of its descendants?

Reposted. Author: 搜寻专家. Updated: 2023-10-30 19:41:43

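Node.getTextContent() returns the text content of the current node and of all its descendants.

Is there a way to get the text content of the current node only, without the text of its descendants?

Example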

<paragraph>
<link>XML</link>
is a
<strong>browser based XML editor</strong>
editor allows users to edit XML data in an intuitive word processor.
</paragraph>

Expected output

paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor

I tried the following code

String str = "<paragraph>" +
        "<link>XML</link>" +
        " is a " +
        "<strong>browser based XML editor</strong>" +
        "editor allows users to edit XML data in an intuitive word processor." +
        "</paragraph>";

org.w3c.dom.Document domDoc = null;
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;

try {
    docBuilder = docFactory.newDocumentBuilder();
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
    domDoc = docBuilder.parse(bis);
} catch (ParserConfigurationException e1) {
    e1.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(
        domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + ((Element) n).getTextContent());
}

But it produces this output

paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

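Note that the paragraph element includes the text of the link and strong tags, which is not what I want. Any suggestions?

Best answer

What you want is to filter the children of your <paragraph> node and keep only those whose node type is Node.TEXT_NODE.

Here is an example of a method that returns the content you are after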

public static String getFirstLevelTextContent(Node node) {
    NodeList list = node.getChildNodes();
    StringBuilder textContent = new StringBuilder();
    for (int i = 0; i < list.getLength(); ++i) {
        Node child = list.item(i);
        // Keep direct text children only; element children (and their text) are skipped.
        if (child.getNodeType() == Node.TEXT_NODE)
            textContent.append(child.getTextContent());
    }
    return textContent.toString();
}

Applied to your example, that means:

String str = "<paragraph>" + //
        "<link>XML</link>" + //
        " is a " + //
        "<strong>browser based XML editor</strong>" + //
        "editor allows users to edit XML data in an intuitive word processor." + //
        "</paragraph>";
Document domDoc = null;
try {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
    domDoc = docBuilder.parse(bis);
} catch (Exception e) {
    e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}

Output:

paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

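What it does is iterate over all the children of a node, keep only the text nodes (thereby excluding comments, elements and the like), and accumulate their respective text content.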
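
The filtering described above can be tried in isolation with a small self-contained program. This is only a sketch: the class name FirstLevelTextDemo, the helper names and the sample XML are made up for illustration. It also treats CDATA sections as text, which is an assumption that goes beyond the answer. With a default (non-coalescing) parser, CDATA sections report Node.CDATA_SECTION_NODE rather than Node.TEXT_NODE, so a check for TEXT_NODE alone would skip them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstLevelTextDemo {

    // Concatenates the text held by direct children only; text inside child elements is skipped.
    public static String firstLevelText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            // Assumption beyond the answer: CDATA sections are counted as text too.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string with the default JAXP parser.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<p><b>bold</b> plain <![CDATA[raw]]><!-- note --></p>");
        Node p = doc.getDocumentElement();
        System.out.println("[" + p.getTextContent() + "]"); // [bold plain raw]
        System.out.println("[" + firstLevelText(p) + "]");  // [ plain raw]
    }
}
```

The comment node is ignored by both calls; the difference is that getTextContent() also pulls in the text of the nested b element.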

Neither Node nor Element has a direct method that returns only the first-level text content.
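
Since no built-in method exists, one alternative to a hand-written loop is the XPath expression text(), which selects the text-node children of the context node. This is a different technique from the one in the answer, shown here only as a sketch using the JDK's javax.xml.xpath API; the class name XPathOwnText and the method ownText are hypothetical. The sample avoids adjacent text and CDATA nodes, because XPath implementations may treat such runs as a single text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOwnText {

    // Selects the direct text children of the node via XPath and concatenates their values.
    public static String ownText(Node node) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList texts = (NodeList) xpath.evaluate("text()", node, XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<paragraph><link>XML</link> is a "
                + "<strong>browser based XML editor</strong>editor</paragraph>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println("[" + ownText(doc.getDocumentElement()) + "]"); // [ is a editor]
    }
}
```

For a one-off lookup this is compact, but compiling and evaluating an XPath expression per node is likely heavier than the simple child loop, so the loop is probably the better fit inside a NodeIterator traversal.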

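This post concerns "java - Node.getTextContent(): is there a way to get the text content of the current node only, not the text of its descendants?"; the corresponding question can be found on Stack Overflow: https://stackoverflow.com/questions/12191414/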
