
java - Problem connecting to a web page with Jsoup

Reposted. Author: 行者123. Updated: 2023-12-02 03:14:26

This is my first time using Jsoup, and I'm having trouble connecting to the URL I want to parse information from.

URL: http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0

I first tried this, but got a timeout exception:

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").get();

Here is the exception:

    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:575)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224)
        at ParseData.main(ParseData.java:18)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I did some research online and found the method .timeout(0), which sets the Jsoup timeout to infinite.
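The first stack trace fails inside HttpClient.parseHTTPHeader, i.e. the connection was opened but no response header arrived before the read timeout expired. The sketch below reproduces that same SocketTimeoutException locally with plain JDK sockets (no Jsoup involved; the class and method names are hypothetical): a loopback server accepts the TCP connection but never writes, and the client's read gives up after a positive timeout. As with Jsoup's .timeout(0), Socket.setSoTimeout treats 0 as "wait forever", which is why the demo rejects it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that never sends a byte, then blocks in read()
     * with the given read timeout in milliseconds.
     * Returns true if the read timed out, false if data or end-of-stream arrived first.
     */
    static boolean readTimesOut(int readTimeoutMillis) {
        if (readTimeoutMillis <= 0) {
            // setSoTimeout(0) means "wait forever", so this demo would never return.
            throw new IllegalArgumentException("use a positive timeout; 0 blocks forever");
        }
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port. The server never calls accept() or write();
        // the OS completes the TCP handshake via the listen backlog, so the client connects and waits.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(readTimeoutMillis); // read timeout for every blocking read
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks: no byte will ever arrive
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception type as in the question's first stack trace
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out after 200 ms: " + readTimesOut(200));
    }
}
```

A finite timeout turns a silent server into a prompt, catchable exception; an infinite one (0) simply waits until the server answers, if it ever does.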

Now when I try this:

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").timeout(0).get();

I get the following exception:

    org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:598)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224)
        at ParseData.main(ParseData.java:18)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Can someone point me in the right direction for loading this URL into Jsoup?

Best answer

A 403 (Forbidden) status means the server is refusing access. You just need to set the User-Agent header on the HTTP request, like this:

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0")
            .userAgent("Mozilla/5.0")
            .timeout(0)
            .get();
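For comparison, here is a minimal sketch of the same fix written against the JDK's own java.net.http client (Java 11+) instead of Jsoup; it is not the answerer's code, and the class and method names are hypothetical. It sets the same browser-like User-Agent and, unlike the answer, a bounded timeout rather than 0. One behavioral difference worth knowing: HttpClient.send returns a 403 as an ordinary status code, whereas Jsoup's get() throws HttpStatusException.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class UserAgentRequest {

    // Browser-like value taken from the answer.
    static final String USER_AGENT = "Mozilla/5.0";

    /** Builds a GET request with an explicit User-Agent header and a bounded response timeout. */
    static HttpRequest buildRequest(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT) // used instead of the client's default User-Agent
                .timeout(timeout)                 // finite, unlike Jsoup's timeout(0)
                .GET()
                .build();
    }

    /** Sends the request over the network; a 403 comes back as a status code, not an exception. */
    static HttpResponse<String> fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        return client.send(buildRequest(url, Duration.ofSeconds(30)),
                HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) {
        String url = "http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0";
        HttpRequest request = buildRequest(url, Duration.ofSeconds(30));
        // Offline check: print what would be sent. Call fetch(url) to actually send it.
        System.out.println(request.method() + " " + request.uri());
        System.out.println("User-Agent: " + request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```

If you go this route, check response.statusCode() before parsing, then hand response.body() to Jsoup.parse(body, url) to get a Document.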

This question was taken from Stack Overflow (java - Problem connecting to a web page with Jsoup): https://stackoverflow.com/questions/40516614/
