
java - How can I iterate over the nodes of a huge XML file in a streaming fashion?

Reposted. Author: 行者123. Updated: 2023-11-30 03:54:10

I have a huge XML file that looks like this:

<?xml version="1.0"?>
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
    </book>
    [... one gazillion more entries ...]
</catalog>

I want to iterate over this file as a stream, so that I never have to load the whole file into memory. For example:

InputStream stream = new FileInputStream("gigantic-book-list.xml");
String nodeName = "book";
Iterator<Document> it = new StreamingXmlIterator(stream, nodeName);
Document bk101 = it.next();
Document bk102 = it.next();

I would also like it to work with different XML input files without having to create specific classes (e.g. Book.java).

@McDowell has a promising approach using XMLStreamReader and StreamFilter at https://stackoverflow.com/a/16799693/13365 , but it only extracts a single node.

Also, Camel's .tokenizeXML does exactly what I want, so I suppose I should look at its source code.

Best answer

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Book {
    // TODO: getters/setters
    public String author;
    public String title;
}

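Assuming you want to process the data as strongly typed objects (such as the Book class above), you can combine StAX and JAXB using two utility types: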

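import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamReader;

// Accepts only the events from <book> through the matching </book>.
class ContentFinder implements StreamFilter {
    private boolean capture = false;

    @Override
    public boolean accept(XMLStreamReader xml) {
        if (xml.isStartElement() && "book".equals(xml.getLocalName())) {
            capture = true;
        } else if (xml.isEndElement() && "book".equals(xml.getLocalName())) {
            capture = false;
            return true; // still let the closing tag through
        }
        return capture;
    }
}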

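import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Reports end-of-stream once the wrapped reader sits on </book>.
class Limiter extends StreamReaderDelegate {
    Limiter(XMLStreamReader xml) {
        super(xml);
    }

    @Override
    public boolean hasNext() throws XMLStreamException {
        return !(getParent().isEndElement()
                && "book".equals(getParent().getLocalName()));
    }
}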

Usage:

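import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;

// inputStream is an InputStream over the XML file
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Unmarshaller unmar = JAXBContext.newInstance(Book.class)
        .createUnmarshaller();
Transformer tformer = TransformerFactory.newInstance().newTransformer();
while (reader.hasNext()) {
    XMLStreamReader limiter = new Limiter(reader);
    Source src = new StAXSource(limiter);
    DOMResult res = new DOMResult();
    // Copy one <book> subtree into a small DOM, then bind it with JAXB
    tformer.transform(src, res);
    Book book = (Book) unmar.unmarshal(res.getNode());
    System.out.println(book.title);
}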
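
The answer above binds every node to a Book class, while the question also asks for something that works without input-specific classes. The following is a minimal sketch of that idea, not the answer author's code: it reuses the question's hypothetical StreamingXmlIterator name, matches elements by local name only (namespaces are ignored), and relies on the JDK's identity Transformer to copy one subtree at a time from a StAXSource into a fresh DOM Document. Wrapping checked exceptions in IllegalStateException is a choice made here for brevity.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

/** Hypothetical sketch: yields one DOM Document per matching element. */
final class StreamingXmlIterator implements Iterator<Document> {
    private final XMLStreamReader reader;
    private final Transformer transformer;
    private final String nodeName;

    StreamingXmlIterator(InputStream stream, String nodeName) {
        try {
            this.reader = XMLInputFactory.newFactory().createXMLStreamReader(stream);
            // A Transformer created without a stylesheet performs an identity copy.
            this.transformer = TransformerFactory.newInstance().newTransformer();
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException("Could not set up XML streaming", e);
        }
        this.nodeName = nodeName;
    }

    /** True when the reader is positioned on a start tag with the wanted name. */
    private boolean atTarget() {
        return reader.isStartElement() && nodeName.equals(reader.getLocalName());
    }

    /**
     * Moves forward until a matching start tag is found. The current event is
     * checked before advancing, so repeated calls do not skip elements.
     */
    @Override
    public boolean hasNext() {
        try {
            while (!atTarget()) {
                if (!reader.hasNext()) {
                    return false;
                }
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    /** Copies the current element and its subtree into a new Document. */
    @Override
    public Document next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            DOMResult result = new DOMResult();
            transformer.transform(new StAXSource(reader), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not copy <" + nodeName + ">", e);
        }
    }
}
```

With this sketch, the snippet from the question compiles as written: each call to it.next() returns a small Document whose root element is one book, and only that subtree is held in memory at a time.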

Regarding java - How can I iterate over the nodes of a huge XML file in a streaming fashion?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23676373/
