java - OutOfMemoryError when scraping with HtmlUnit

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:14:01

I am using HtmlUnit to log in to a website and then download data from a table.

When I run my code, it throws java.lang.OutOfMemoryError and cannot proceed any further.

Here is my code:

WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_6);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setRedirectEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setTimeout(50000);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(true);

HtmlPage htmlPage=webClient.getPage(url);
Thread.sleep(200);
//~~~~~~~Log-In
HtmlTextInput uname=(HtmlTextInput)htmlPage.getFirstByXPath("//*[@id=\"username\"]");
uname.setValueAttribute("xxx");
HtmlPasswordInput upass=(HtmlPasswordInput)htmlPage.getFirstByXPath("//*[@id=\"password\"]");
upass.setValueAttribute("xxx");
HtmlSubmitInput submit=(HtmlSubmitInput)htmlPage.getFirstByXPath("//*[@id=\"login-button\"]/input");
htmlPage=(HtmlPage) submit.click();
Thread.sleep(200);
webClient.waitForBackgroundJavaScript(10000);
for (int i = 0; i < 250; i++) {
    if (!htmlPage.asText().contains("Loading...")) {
        break;
    }
    synchronized (htmlPage) {
        htmlPage.wait(500);
    }
}

System.out.println(htmlPage.asText());

Here is the stack trace:

java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:155)
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:151)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.createPropertyGet(IRFactory.java:1990)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:968)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunctionCall(IRFactory.java:595)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:86)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformInfix(IRFactory.java:775)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:161)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformAssignment(IRFactory.java:368)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:152)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformExprStmt(IRFactory.java:488)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:149)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:768)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunction(IRFactory.java:560)

I added the following lines to catalina.sh to allocate heap memory, but I still get the same error (my machine has 2 GB of RAM).

if [ -z "$LOGGING_MANAGER" ]; then
JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
JAVA_OPTS="$JAVA_OPTS $LOGGING_MANAGER"
fi

# Uncomment the following line to make the umask available when using the
# org.apache.catalina.security.SecurityListener
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"
JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2048m -XX:MaxPermSize=512m"
JAVA_OPTS="-server -XX:+UseConcMarkSweepGC"

Best answer

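The last line of your snippet assigns JAVA_OPTS without including $JAVA_OPTS, so it overwrites the -Xms/-Xmx settings from the line before it. Include $JAVA_OPTS in that last line, and hopefully your code will then work: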

JAVA_OPTS="$JAVA_OPTS -server -XX:+UseConcMarkSweepGC"
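To check whether the heap settings actually reach the JVM, one option is to print the maximum heap size and the startup flags from inside the running process. The following is a minimal sketch using only standard JDK calls; the class name HeapCheck and its two helper methods are hypothetical names chosen for illustration, not part of the original question or answer.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {

    /** Returns the maximum amount of heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Returns the flags the JVM was started with (for example -Xmx2048m, if it was passed). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the -Xmx flag is missing from the printed arguments, the setting was probably lost before the JVM started, which is what an overwritten JAVA_OPTS would produce. Inside Tomcat, the same two calls could be logged from application code instead of a main method.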

Regarding "java - OutOfMemoryError when scraping with HtmlUnit", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15830227/
