
javascript - Executing JavaScript in Java to get an HTML file

Reposted. Author: 行者123. Updated: 2023-12-02 04:16:13

I recently found out how to fetch HTML code using Java.

So I wrote the following method:

import java.io.IOException;
import java.net.URL;
import java.util.NoSuchElementException;
import java.util.Scanner;

// Returns the HTML source of the given website as a string.
// If something doesn't work, "fail" is returned.
public String htmlToString(String urlString) {
    // Convert the String to a URL and read it with a Scanner;
    // try-with-resources closes the stream even if reading fails.
    try (Scanner s = new Scanner(new URL(urlString).openStream())) {
        // Append line after line so the whitespace between tokens is kept.
        StringBuilder read = new StringBuilder();
        while (s.hasNextLine()) {
            read.append(s.nextLine()).append('\n');
        }
        return read.toString();
    } catch (IOException iOEx) {
        // There was some connection problem, or the file did not exist on the server,
        // or the URL was not in the right format.
        // Think about what to do now, and put it here.
        iOEx.printStackTrace(); // for now, simply output it
        return "fail";
    } catch (NoSuchElementException elEx) {
        // Couldn't find a next line; similar problems as described above.
        elEx.printStackTrace();
        return "fail";
    }
}

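The problem I have is that the HTML I'm looking at contains a lot of JavaScript. I could use the content if the JavaScript had been executed, that is, if I had the source you see after the page has been opened in a browser.

Is there any way to get this code?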

====================================================================================

Edit: I have now tried HtmlUnit, which I had never used before, and came up with the following code:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;

public class Converter2 {

    public String htmlToString(String url) {
        try {
            WebClient webClient = new WebClient();
            HtmlPage page = webClient.getPage(url);
            String pageAsText = page.asText();
            webClient.close();
            return pageAsText;
        } catch (IOException ioEx) {
            return "fail";
        }
    }
}

When I try to run it, I get a lot of errors. Trying it on Amazon, I got these:

    WARNUNG: CSS error: 'http://z-ecx.images-amazon.com/images/G/01/AUIClients/AmazonUI-2215197d18a3d0e321eb1a67a8b9e87ba4b4ab20._V2_.css#AUIClients/AmazonUI.rendering_engine-trident.min' [1:125781] Error in declaration. '*' ist als erstes Zeichen einer Property nicht erlaubt.
Okt 22, 2015 10:23:38 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNUNG: CSS error: 'http://z-ecx.images-amazon.com/images/G/01/AUIClients/AmazonUI-2215197d18a3d0e321eb1a67a8b9e87ba4b4ab20._V2_.css#AUIClients/AmazonUI.rendering_engine-trident.min' [1:125797] Error in declaration. '*' ist als erstes Zeichen einer Property nicht erlaubt.
Okt 22, 2015 10:23:38 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNUNG: CSS error: 'http://z-ecx.images-amazon.com/images/G/01/AUIClients/AmazonGatewayAuiAssets-3d5b6f366e05fa5c0b2f38dca7366948b0599a7b._V2_.css#AUIClients/AmazonGatewayAuiAssets.weblab-GW_NOT_INTERESTED_48787-C.min' [1:8806] Fehler in Ausdruck; ':' nach dem identifier "progid" gefunden.
Okt 22, 2015 10:23:38 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNUNG: CSS error: 'http://z-ecx.images-amazon.com/images/G/01/AUIClients/AmazonGatewayAuiAssets-3d5b6f366e05fa5c0b2f38dca7366948b0599a7b._V2_.css#AUIClients/AmazonGatewayAuiAssets.weblab-GW_NOT_INTERESTED_48787-C.min' [1:8942] Fehler in Ausdruck; ':' nach dem identifier "progid" gefunden.
Okt 22, 2015 10:23:38 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'application/x-javascript'.
Okt 22, 2015 10:23:38 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'application/x-javascript'

Trying it on a site called "csgolounge.com" works somewhat better:

    Okt 22, 2015 10:32:46 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'text/javascript'.
Okt 22, 2015 10:32:47 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'application/x-javascript'.
Okt 22, 2015 10:32:47 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SCHWERWIEGEND: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://csgolounge.com/script/jquery.min.js?1423740933] line=[2] lineSource=[null] lineOffset=[0]
Okt 22, 2015 10:32:47 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'application/x-javascript'.
Okt 22, 2015 10:32:48 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'text/javascript'.
Exception in thread "main" ======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: TagError: adsbygoogle.push() error: No slot size for availableWidth=0 (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js#4)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:865)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:747)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:722)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:945)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:270)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:800)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:757)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1040)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:253)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:199)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:272)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:160)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:476)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:350)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:415)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:400)
at Internet.Converter2.htmlToString(Converter2.java:13)
at main.mainMethod.main(mainMethod.java:8)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: [object Object] (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js#4)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1006)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:310)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:738)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:850)
... 33 more
JavaScriptException value = [object Object]
======= EXCEPTION END ========

I really don't understand what it is trying to tell me. I'm lost. Can anyone help me?
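A note on reading the output above: the `WARNUNG` lines (German for WARNING) are log messages, not failures. The run on csgolounge.com stopped because of the uncaught `ScriptException` raised by the `adsbygoogle.js` script, which `htmlToString` does not catch. HtmlUnit's `WebClientOptions` has a `setThrowExceptionOnScriptError(boolean)` switch for that case. The log format suggests the messages go through `java.util.logging`; assuming that is so in your setup, the sketch below quiets them using only the JDK. The class name `HtmlUnitLogQuietener` is hypothetical, and the logger name is taken from the package shown in the log lines.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of HtmlUnit.
final class HtmlUnitLogQuietener {

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost to garbage collection.
    static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turns off all log output for the HtmlUnit package and its sub-packages.
    // Assumption: HtmlUnit's messages reach java.util.logging in this setup.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Loggers below the package (one per class in the log above) inherit the level.
        Logger child = Logger.getLogger(
                "com.gargoylesoftware.htmlunit.DefaultCssErrorHandler");
        System.out.println(HTMLUNIT_LOGGER.getLevel());       // prints OFF
        System.out.println(child.isLoggable(Level.WARNING));  // prints false
    }
}
```

Call `silence()` once before creating the `WebClient`. `Level.OFF` also hides the `SCHWERWIEGEND` (SEVERE) script messages, so `Level.SEVERE` may be the better choice while debugging.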

Best answer

You can't execute JavaScript just by fetching a URL. JavaScript is run by the browser, not by the server itself.
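To illustrate the point, here is a minimal sketch using only the JDK. A temporary local file stands in for the web server, and the class name `RawFetchDemo` and the page content are made up for the example. Reading it through `URL.openStream()`, as the question's first method does, returns the bytes exactly as stored, so the script arrives as text and its result never appears.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names and page content are illustrative only.
final class RawFetchDemo {

    // A tiny page whose visible text only exists after a browser runs the script.
    static final String PAGE =
            "<html><body><div id=\"out\"></div>"
            + "<script>document.getElementById('out').textContent = "
            + "'rendered ' + (6 * 7);</script>"
            + "</body></html>";

    // Writes PAGE to a temporary file and reads it back through URL.openStream(),
    // the same mechanism used for a plain fetch of a remote page.
    static String fetchRawSource() throws IOException {
        Path file = Files.createTempFile("page", ".html");
        try {
            Files.write(file, PAGE.getBytes(StandardCharsets.UTF_8));
            URL url = file.toUri().toURL();
            try (InputStream in = url.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String source = fetchRawSource();
        System.out.println(source.contains("<script>"));    // prints true
        System.out.println(source.contains("rendered 42")); // prints false
    }
}
```

The script source is in the result, but the text it would write into the page is not, because nothing in a plain fetch evaluates it. Getting the rendered markup needs something that actually runs the script on the client side, such as a browser or a browser emulator.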

Regarding "javascript - Executing JavaScript in Java to get an HTML file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33287799/
