
java - How do I fix the unrecognized cyberneko self-closing iframe feature in HtmlUnit?

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 09:40:07

I am currently trying to build a web scraping program with HtmlUnit. However, when I run it, I get this error:

Exception in thread "main" com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:418)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:342)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:203)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:179)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:221)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:106)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:433)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)
at ReviewScrapping.getCOntentData(ReviewScrapping.java:28)
at ReviewScrapping.main(ReviewScrapping.java:34)
Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.
at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(HTMLParser.java:411)
... 11 more

I have already tried following this solution: When using HtmlUnit, how can I configure the underlying NekoHtml parser?

However, I still get the same error.

This is my current program, where I connect to the website:

    public static HtmlPage getCOntentData(String url) throws IOException {
        BrowserVersionFeatures[] bvf = new BrowserVersionFeatures[1];
        bvf[0] = BrowserVersionFeatures.HTMLIFRAME_IGNORE_SELFCLOSING;
        BrowserVersion bv = new BrowserVersion(
                BrowserVersion.NETSCAPE, "5.0 (Windows; en-US)",
                "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
                (float) 3.6, bvf);

        WebClient webClient = new WebClient(bv);
        webClient.setJavaScriptEnabled(true);

        return webClient.getPage(url);
    }

In my main method:

    HtmlPage site = getCOntentData("https://www.tokopedia.com/p/handphone-tablet");
    List<?> date = site.getByXPath("//div[@class='V4CqgZIv']");
    System.out.println(date.get(0));

This is what I have so far, and I am stuck on how to fix it.

What I want now is to make this error go away.

Best answer

Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.

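It looks like you have the wrong version of the neko parser on your classpath. Please use the latest version (2.35.0 at the time of writing). If you use Maven, make sure that no other part of your application overrides the neko-htmlunit dependency (which should also be at version 2.35.0). If you do not use Maven, download the file htmlunit-2.35.0-bin.zip and make sure your classpath contains only the correct versions of all dependencies.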
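
One way to check for the kind of version mix-up described above is to ask the JVM where it actually loads the relevant classes from. The following is only a diagnostic sketch, not part of the original answer: the class name ClasspathCheck and the helper locationOf are hypothetical, the first two class names are taken from the stack trace, and the two HTMLConfiguration class names are assumptions about where the neko-htmlunit and older NekoHTML parsers live.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Diagnostic sketch: report which JAR (or directory) each class is loaded from.
final class ClasspathCheck {

    // Hypothetical helper: returns the code source location of a class,
    // or a placeholder string if the class cannot be found or has no location.
    static String locationOf(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> type = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            ProtectionDomain domain = type.getProtectionDomain();
            CodeSource source = (domain == null) ? null : domain.getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(JDK or unknown location)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] classNames = {
            // Taken from the stack trace in the question.
            "com.gargoylesoftware.htmlunit.WebClient",
            "org.apache.xerces.parsers.AbstractSAXParser",
            // Assumed class names for the neko-htmlunit and older NekoHTML parsers.
            "net.sourceforge.htmlunit.cyberneko.HTMLConfiguration",
            "org.cyberneko.html.HTMLConfiguration"
        };
        for (String name : classNames) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Run it with the same classpath as the scraper. If the printed locations do not all point at JARs from the same HtmlUnit release, that mismatch is a likely candidate for the SAXNotRecognizedException.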

Regarding "java - How do I fix the unrecognized cyberneko self-closing iframe feature in HtmlUnit?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56088522/
