gpt4 book ai didi

java - HtmlUnit 没有获取页面

转载 作者:搜寻专家 更新时间:2023-11-01 07:49:29 24 4
gpt4 key购买 nike

使用HtmlUnit从互联网上抓取数据,我需要登录以下页面https://accounts.google.com/login .

当使用“getPage()”方法时,我不断得到这个异常,我该如何解决?提前谢谢你

  Exception in thread "main" ======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: AssertionError: Assertion failed: No element found with className: signin-card (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:894)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:776)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:752)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:740)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:916)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:307)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:368)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:238)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:257)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:773)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:730)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1209)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1111)
at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:207)
at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:337)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3137)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2100)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:927)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:506)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:459)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:980)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:269)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:157)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:512)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:386)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:304)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:451)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:436)
at prog.htmlUnit.Scrapeo.iniciaSesion(Scrapeo.java:74)
at prog.htmlUnit.ProgramaPruebas.main(ProgramaPruebas.java:24)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: [object Object] (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1006)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:767)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:879)
... 35 more
JavaScriptException value = [object Object]
======= EXCEPTION END ========

抛出异常的部分就这么简单:

public HtmlPage iniciaSesion(String correo, String pass) throws FailingHttpStatusCodeException, MalformedURLException, IOException{
HtmlPage pagActual;
HtmlTextInput cajaTexto;
HtmlButton boton;

pagActual= cliente.getPage("https://accounts.google.com/login?hl=es#identifier");

return pagActual;

主程序只是调用此方法并使用 .asXml() 方法,但在使用之前抛出异常。

最佳答案

您需要在客户端上启用 Javascript。此代码应该有效:

LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

String url = "https://accounts.google.com/login";
final HtmlPage page = client.getPage(url);

System.out.println(page.asText());

关于java - HtmlUnit 没有获取页面,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37173811/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com