gpt4 book ai didi

java - XPath 代码创建 IOException

转载 作者:行者123 更新时间:2023-12-01 11:07:26 24 4
gpt4 key购买 nike

我有一个文本文档,其中每一行都是完整的美国专利 XML 文档。我正在尝试解析它以删除某些功能,例如专利号等。我以前没有使用过 XPath,所以我借用了从 Ravi Thapliyal 找到的一些代码在Parse XML Simple String using Java XPath 。但是,显然初始 !DOCTYPE 标记导致 DocumentBuilder 尝试在某处查找实际文档?

这是我第一次尝试编写代码:

//convert entire file to ArrayList of strings
ArrayList<String> doc = new ArrayList<>();
while(input.hasNext()){
doc.add(input.nextLine().trim());
}

int index = 0;
while(index < doc.size()){
String xml = doc.get(index);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));

db.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, java.io.IOException {
if (systemId.contains("us-patent-grant-v40-2004-12-02.dtd")) {
return new InputSource(new StringReader(""));
} else {
return null;
}
}
});

String orgName = "";
try {
orgName = (String) xPath.evaluate("/agents/adressbook/orgname", source,XPathConstants.STRING);
} catch (Exception e) {
e.printStackTrace();
}

System.out.println("Document #" + index + " Company: " + orgName);
}//end while loop that goes through each line (patent document) in file

输入文件中每行的开头在 DOCTYPE 声明之后包含以下内容:us-patent-grant 系统“us-patent-grant-v40-2004-12-02.dtd”[ ]>

导致问题 (91) 的行是:

orgName = (String) xPath.evaluate("/agents/adressbook/orgname", 
source,XPathConstants.STRING);

堆栈跟踪是:

java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:131)
at java.io.FileInputStream.<init>(FileInputStream.java:87)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
Document #0 Company:
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1293)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1260)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:263)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1164)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:466)
at Parser.main(Parser.java:102)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:473)
at Parser.main(Parser.java:102)
Caused by: java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:131)
at java.io.FileInputStream.<init>(FileInputStream.java:87)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1293)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1260)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:263)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1164)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:466)

有人可以帮我弄清楚我应该做什么来解析字符串中的文档吗?

最佳答案

尝试设置功能或提供空的 EntityResolver

对于功能,您需要找到您使用的解析器实现(它们是特定于实现的)

Make DocumentBuilder.parse ignore DTD references

关于java - XPath 代码创建 IOException,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/32801929/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com