
java - JTidy upgrade broke document XPaths

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 08:29:30

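I just updated to the latest jtidy release, published in October, and it seems to break my Document object for reasons I can't identify. Here is my code: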

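import java.net.URL;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// Configure JTidy to run quietly and clean up presentational markup
Tidy tidy = new Tidy();
tidy.setShowWarnings(false);
tidy.setShowErrors(0);
tidy.setQuiet(true);
tidy.setMakeClean(true);

// Fetch the page and let JTidy build a DOM from it
URL url = new URL(url_string);
Document doc = tidy.parseDOM(url.openStream(), null);

// Select every link inside the table with id "links"
String xpath_string = "//table[@id='links']//a";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(xpath_string);
NodeList n = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);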

This is the error I get:

javax.xml.transform.TransformerException: -1
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
at IndoorClimbing.main(IndoorClimbing.java:55)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
... 6 more
---------
java.lang.ArrayIndexOutOfBoundsException: -1
at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
at IndoorClimbing.main(IndoorClimbing.java:55)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
at IndoorClimbing.main(IndoorClimbing.java:55)
Caused by: javax.xml.transform.TransformerException: -1
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
... 2 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
... 6 more

The error occurs on the last line of the code, when it tries to build the node list. Has anyone else run into this with the new version of JTidy?

Best answer

I ran into a similar problem. I found a rather silly workaround (re-parsing the jtidy output), which suggests the problem lies in jTidy itself.

// Parse with JTidy as before; this is the DOM that fails under XPath
Document document = tidy.parseDOM(rstream, null);

// Serialize the JTidy DOM into an in-memory byte buffer
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Source xmlSource = new DOMSource(document);
Result outputTarget = new StreamResult(outputStream);
TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);

// Re-parse those bytes with the standard parser to get a regular DOM
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
Document doc = db.parse(is);
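
The round trip above can be tried on its own. What follows is a minimal sketch, not the answerer's exact code. The class `DomRoundTrip` and the helpers `reparse`, `parse`, and `selectLinks` are hypothetical names. `parse` stands in for `tidy.parseDOM(...)` so the sketch runs without JTidy on the classpath. It also forces XML output on the transformer, which the snippet above does not do. That is an assumption, made because the JDK serializer may otherwise choose HTML output for an `html` root and emit markup that is not well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomRoundTrip {

    // Hypothetical helper: serialize a DOM to bytes, then parse the bytes
    // again with the JDK parser so XPath sees a standard Document.
    public static Document reparse(Document source) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Assumption (not in the original answer): force XML output so the
        // serialized bytes stay well-formed even when the root is <html>.
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.transform(new DOMSource(source), new StreamResult(out));
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(out.toByteArray()));
    }

    // Stand-in for tidy.parseDOM(...): builds a DOM from a well-formed string.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Same XPath expression as in the question.
    public static NodeList selectLinks(Document doc) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .compile("//table[@id='links']//a")
                .evaluate(doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        Document original = parse("<html><body><table id='links'><tr><td>"
                + "<a href='/a'>A</a><a href='/b'>B</a></td></tr></table></body></html>");
        NodeList links = selectLinks(reparse(original));
        for (int i = 0; i < links.getLength(); i++) {
            // Print the href attribute of each matched link
            System.out.println(links.item(i).getAttributes().getNamedItem("href").getNodeValue());
        }
    }
}
```

In real use, the argument to `reparse` would be the Document returned by `tidy.parseDOM(...)`. The extra serialize-and-parse pass costs time and memory on large pages, so it is worth keeping only while the JTidy DOM itself cannot be queried.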

This took me several hours; I hope it helps.

For more on java - JTidy upgrade broke document XPaths, see the similar question on Stack Overflow: https://stackoverflow.com/questions/1530154/
