
java - Logging in and fetching a web page in Java with the HtmlUnit API

Reposted. Author: 行者123. Updated: 2023-12-01 11:03:00

I'm trying to fetch a web page. I get hold of the form, the text input, the checkbox and the submit button so that I can fill them in from Java code.

First, I get these warnings (I think the ScriptEngine can't load some of the scripts):

oct 18, 2015 9:45:01 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
oct 18, 2015 9:45:01 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
oct 18, 2015 9:45:01 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
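These warnings are written through java.util.logging by HtmlUnit's IncorrectnessListenerImpl. They report that the server labels its scripts with the obsolete MIME type application/x-javascript (the standard type is text/javascript); HtmlUnit most likely still runs those scripts, so the warnings are probably not what breaks the submission. If they are just noise, the logger line that the question's code leaves commented out turns them off. The sketch below wraps that line in a small class (QuietHtmlUnit is a hypothetical name) and keeps a strong reference to the logger, because java.util.logging only holds loggers weakly and a level set on an unreferenced logger can be lost after garbage collection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietHtmlUnit {
    // Strong reference: LogManager keeps loggers only weakly, so without this
    // field the OFF level could disappear once the logger is collected.
    static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware");

    // Child loggers such as "com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl"
    // have no level of their own, so they inherit OFF from this parent.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        Logger listener = Logger.getLogger("com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl");
        System.out.println(listener.isLoggable(Level.WARNING)); // false
    }
}
```

Call QuietHtmlUnit.silence() once, before creating the WebClient.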

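Anyway, after I fill in the inputs correctly from Java and call click() on the submit button, I don't get the page that should load after the submission. So, what am I missing?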

This is the HTML code:

<form name="form" method="post" action="Login.aspx?test=1" onsubmit="javascript:return doSomething_OnSubmit();" id="form">
<!-- then there are some hidden inputs -->
<!-- ... -->
<input name="tax_code" type="text" maxlength="10" id="tax_code" style="color:Red;width:120px;" />
<input id="privacy" type="checkbox" name="privacy" onclick="activeConfirmButton()" />
<!-- Initially the Confirm button is disabled; once the checkbox is checked, the button is enabled and gets its onclick handler. -->
<input type="submit" name="Confirm" value="Confirm" onclick="javascript:Form_DoPostBack(new Form_DoPostBack())" id="Confirm" style="color:Blue;font-family:calibri;width:150px;Z-INDEX: 0" />
</form>
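For reference, when Confirm is the button that submits this form, a browser (and HtmlUnit, which emulates one) sends an application/x-www-form-urlencoded POST to Login.aspx?test=1. Ignoring whatever the page's onsubmit and Form_DoPostBack scripts add, a checked checkbox with no value attribute is submitted as privacy=on, an unchecked one is left out entirely, and only the clicked submit button contributes its name=value pair. The sketch below builds that body with java.net.URLEncoder; LoginFormBody is a hypothetical name, the hidden inputs elided in the question are omitted, and TAX_CODE is a placeholder value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class LoginFormBody {
    // Encodes fields (in insertion order) as an application/x-www-form-urlencoded body.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Visible fields of the form above, in document order (hidden inputs omitted).
    static Map<String, String> fields(String taxCode, boolean privacyChecked) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("tax_code", taxCode);
        if (privacyChecked) {
            fields.put("privacy", "on"); // default value of a checkbox without a value attribute
        }
        fields.put("Confirm", "Confirm"); // only the submit button that was clicked
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(encode(fields("TAX_CODE", true)));
        // tax_code=TAX_CODE&privacy=on&Confirm=Confirm
    }
}
```

If the checkbox ends up unchecked, the privacy field simply never reaches the server.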

This is the Java code:

try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) 
{
/* turn off htmlunit warnings */
//java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

//webClient.getOptions().setActiveXNative(true);
//webClient.waitForBackgroundJavaScript(50000);

// Get the first page
final HtmlPage page1 = webClient.getPage("http://example.com/examples/Login.aspx?test=1");

final HtmlForm form = page1.getFormByName("form");

final HtmlTextInput taxCodeTextField = form.getInputByName("tax_code");
final HtmlCheckBoxInput checkboxInput = form.getInputByName("privacy");
final HtmlSubmitInput confirmButton = form.getInputByName("Confirm");

//Setting textfield and checkbox
taxCodeTextField.setValueAttribute("TAX_CODE");
checkboxInput.setChecked(true);
//onclick of the checkbox, to activate the confirm button
checkboxInput.click();

// onclick of the confirm button
final HtmlPage page2 = confirmButton.click();

WebResponse response = page2.getWebResponse();
String content = response.getContentAsString();
System.out.println("HTML SOURCE: "+content);

}
catch(Exception e){
    e.printStackTrace(); // don't swallow failures silently
}

Best answer

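There are a few points to consider.

  • After the checkbox is clicked, the site redirects to the same page, so the HtmlUnit cache must be disabled.
  • Click the checkbox only once, instead of calling .setChecked(true) and then .click(): a click toggles the checkbox, so that sequence leaves it unchecked again.
  • As the JavaScript setTimeout() in the checkbox's onclick handler shows, the reload triggered by the click happens in the background, so a new page has to be obtained from the window instead of reusing the old one.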
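The second point is easy to miss. The class below is a hypothetical stand-in, not HtmlUnit's HtmlCheckBoxInput: it models only the rule that setChecked changes the state without firing onclick, while a click toggles the state and then runs the onclick handler (activeConfirmButton() on the real page). It shows why the question's setChecked(true) followed by click() ends with the box unchecked, while a single click() ends with it checked.

```java
final class CheckBoxModel {
    private boolean checked;

    // Changes the state directly; no onclick handler runs.
    void setChecked(boolean value) {
        checked = value;
    }

    boolean isChecked() {
        return checked;
    }

    // A click toggles the state, then the page's onclick handler runs.
    void click(Runnable onclick) {
        checked = !checked;
        onclick.run();
    }

    public static void main(String[] args) {
        CheckBoxModel questionCode = new CheckBoxModel();
        questionCode.setChecked(true);
        questionCode.click(() -> { });
        System.out.println(questionCode.isChecked()); // false

        CheckBoxModel answerCode = new CheckBoxModel();
        answerCode.click(() -> { });
        System.out.println(answerCode.isChecked()); // true
    }
}
```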

The following code reloads the page and prints the result:

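import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlCheckBoxInput;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public static void login(String url, String formName, String taxCodeTextFieldName,
        String checkboxInputName, String confirmButtonName, String taxCode) throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {

        // disable caching: the checkbox click reloads the same URL
        webClient.getCache().setMaxSize(0);

        // Get the first page
        final HtmlPage page1 = webClient.getPage(url);

        final HtmlForm form = page1.getFormByName(formName);

        final HtmlTextInput taxCodeTextField = form.getInputByName(taxCodeTextFieldName);
        final HtmlCheckBoxInput checkboxInput = form.getInputByName(checkboxInputName);

        taxCodeTextField.type(taxCode);
        // a single click checks the box and fires its onclick handler
        checkboxInput.click();

        // wait a little for the background JavaScript to reload the page
        Thread.sleep(2000);

        // get the main page: the window now holds the reloaded page, not page1
        final HtmlPage page2 = (HtmlPage) webClient.getTopLevelWindows().get(0).getEnclosedPage();

        final HtmlSubmitInput confirmButton = page2.getFormByName(formName).getInputByName(confirmButtonName);

        final HtmlPage page3 = confirmButton.click();

        System.out.println(page3.asText());
    }
}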
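The fixed Thread.sleep(2000) only works if the background script finishes within two seconds. HtmlUnit also provides WebClient.waitForBackgroundJavaScript(long timeoutMillis), which waits up to the timeout and returns the number of background jobs still pending, and could stand in for the sleep. The general idea, polling a condition until it holds or a deadline passes, is sketched below with plain JDK classes; Polling.waitUntil is a hypothetical helper, and in the code above the condition might be "the window's enclosed page is no longer page1".

```java
import java.util.function.BooleanSupplier;

final class Polling {
    // Checks condition every pollMillis until it holds or timeoutMillis elapses.
    // Returns true if the condition became true in time, false otherwise.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out
            }
            try {
                // never sleep past the deadline, and always at least 1 ms
                Thread.sleep(Math.min(pollMillis, Math.max(1L, remainingNanos / 1_000_000L)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the page has been reloaded": becomes true after about 300 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2000, 50);
        System.out.println(ready); // true
    }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds and reports a timeout explicitly instead of continuing with a stale page.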

Regarding "java - Logging in and fetching a web page in Java with the HtmlUnit API", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33196528/
