
java - HtmlUnit HtmlSubmitInput.click() results in "Incorrect URL" being corrected to "cgi-bin", which then causes an UnknownHostException

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:43:59

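I am trying to write a small bot that visits this website, http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html, enters some text into the text area, and then presses the submit button to get the result page. This is for a linguistics project. However, when I click the HtmlSubmitInput button, the URL that comes back seems to be malformed, because IncorrectnessListenerImpl notifies me: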

Apr 10, 2016 2:38:35 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Incorrect URL "http:/cgi-bin/LSA-pairwise-x.html" has been corrected

The URL should be

http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html

This leads to the following stack trace (shortened because of its length):

Exception in thread "main" java.lang.RuntimeException: java.net.UnknownHostException: cgi-bin: unknown error
at com.gargoylesoftware.htmlunit.WebClient.download(WebClient.java:2078)
at com.gargoylesoftware.htmlunit.html.HtmlForm.submit(HtmlForm.java:141)
at com.gargoylesoftware.htmlunit.html.HtmlSubmitInput.doClickStateUpdate(HtmlSubmitInput.java:90)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:795)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:742)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:689)
at LSABot.submitInput(LSABot.java:30)
at Start.main(Start.java:8)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
[...]

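My guess is that HtmlUnit tries to fix the URL, but the fix only yields "cgi-bin" as the host, which is of course wrong. I have searched over and over but have not found anything related to my problem.

My LSABot class: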

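import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;

public class LSABot {
    final WebClient webClient;
    private HtmlPage mainPg, rsltPg;
    private HtmlForm htmlForm;
    private HtmlTextArea txtA;
    private HtmlSubmitInput submitBt;

    public LSABot() throws Exception {
        this.webClient = new WebClient(BrowserVersion.CHROME);
        this.webClient.getOptions().setJavaScriptEnabled(true);
        // Load the page and look up the first form, its text area, and its submit button
        this.mainPg = this.webClient.getPage("http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html");
        this.htmlForm = this.mainPg.getForms().get(0);
        this.txtA = this.htmlForm.getTextAreaByName("txt1");
        this.submitBt = this.htmlForm.getInputByValue("Submit Texts");
    }

    public void submitInput(String input) {
        this.txtA.setText(input);
        try {
            // This click is where the UnknownHostException is thrown
            this.rsltPg = this.submitBt.click();
            this.webClient.waitForBackgroundJavaScript(30 * 1000);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}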

Best answer

The error comes from the form's HTML. Its action attribute should be http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html rather than http:/cgi-bin/LSA-pairwise-x.html (note the single slash after "http:").
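To see why the single slash matters, the following minimal sketch uses only `java.net.URI` from the standard library. The `naiveCorrect` helper is hypothetical: it merely imitates the observable effect of the correction reported in the log (a missing slash being inserted) and is not HtmlUnit's actual code. The class and method names are illustrative assumptions as well.

```java
import java.net.URI;

class MalformedActionDemo {

    // Hypothetical stand-in for the "correction" seen in the log:
    // insert the missing slash after "http:/". Not HtmlUnit's real code.
    static String naiveCorrect(String url) {
        if (url.startsWith("http:/") && !url.startsWith("http://")) {
            return "http://" + url.substring("http:/".length());
        }
        return url;
    }

    // Returns the host component of the URL, or null if it has none.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        String action = "http:/cgi-bin/LSA-pairwise-x.html";
        String corrected = naiveCorrect(action);

        // With a single slash there is no authority part, so no host at all.
        System.out.println("host before: " + hostOf(action));
        // After inserting the slash, the first path segment is read as the host.
        System.out.println("corrected:   " + corrected);
        System.out.println("host after:  " + hostOf(corrected));
    }
}
```

Under this assumption, the first path segment ends up in the host position, which would match the `UnknownHostException: cgi-bin` in the stack trace.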

Try this code; it should work:

LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

String url = "http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html";
final HtmlPage page = client.getPage(url);

HtmlForm htmlForm = page.getForms().get(0);
HtmlTextArea txtA = htmlForm.getTextAreaByName("txt1");
txtA.setText("hello");
HtmlSubmitInput submitBt = htmlForm.getInputByValue("Submit Texts");

// change the form action attribute to the correct one
htmlForm.setAttribute("action", "http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html");

HtmlPage page2 = submitBt.click();
System.out.println(page2.asText());
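If hard-coding the corrected action feels brittle, one possible variation is to derive it from the page URL. The sketch below is an assumption-laden illustration, not part of the original answer: `ActionFixer` and `repairAction` are hypothetical names, and the rule it applies (treat an action that has a scheme but no host as a path on the page's own host) is a guess at the intended behavior. It relies only on `java.net.URI`.

```java
import java.net.URI;

class ActionFixer {

    // Hypothetical helper: resolve a form action against the page URL.
    // If the action has a scheme but no host (e.g. "http:/cgi-bin/x.html"),
    // keep only its path and query and resolve them against the page URL.
    static String repairAction(String pageUrl, String action) {
        URI base = URI.create(pageUrl);
        URI raw = URI.create(action);

        boolean schemeWithoutHost = raw.getScheme() != null
                && raw.getHost() == null
                && raw.getRawPath() != null;

        if (schemeWithoutHost) {
            String relative = raw.getRawPath();
            if (raw.getRawQuery() != null) {
                relative += "?" + raw.getRawQuery();
            }
            return base.resolve(relative).toString();
        }
        // Well-formed absolute or relative actions resolve normally.
        return base.resolve(raw).toString();
    }

    public static void main(String[] args) {
        String page = "http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html";
        System.out.println(repairAction(page, "http:/cgi-bin/LSA-pairwise-x.html"));
    }
}
```

The result could then be passed to `htmlForm.setAttribute("action", ...)` as in the snippet above, with the page URL taken from `page.getUrl().toString()` and the raw action from `htmlForm.getActionAttribute()`.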

Regarding "java - HtmlUnit HtmlSubmitInput.click() results in "Incorrect URL" being corrected to "cgi-bin", which then causes an UnknownHostException", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36530196/
