
java - Jsoup connect(): bypass google captcha

Reposted. Author: 行者123. Updated: 2023-12-01 06:15:02

I wrote a small application that has to retrieve URLs based on a keyword. Here is the code:

    // Fetch the Google results page and select the result links.
    Elements doc = Jsoup
            .connect(request)
            .userAgent("Mozilla 5.0 (Windows NT 6.1)")
            .timeout(5000)
            .get()
            .select("li.g>h3>a");

    for (Element link : doc) {
        String url = link.absUrl("href");
        try {
            // Take the value between the first '=' and the first '&'.
            // Note: substring throws if the URL contains no '&'.
            url = URLDecoder.decode(
                    url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }

        if (!url.startsWith("http"))
            continue; // Ads/news/etc.
        else if (url.contains("/pdf/"))
            continue;
        else if (url.contains("//github.com/"))
            continue;

        res.add(url);
    }

I just got the following error:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=http://ipv4.google.com/sorry/IndexRedirect?continue=http://www.google.com/search%3Flr%3Dlang_en....
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:435)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:446)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
    at sperimentazioni.Main.getDataFromGoogle(Main.java:327)
    at sperimentazioni.Main.getURLs(Main.java:164)
    at sperimentazioni.Main.main(Main.java:485)

This is evidently the Google captcha. How can I bypass it?

Best answer

The following logic worked for me:

    // Request the page with a Googlebot user agent string.
    Document doc = Jsoup.connect(request)
            .userAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
            .timeout(5000)
            .get();

    // Keep only the links that look like Google result redirects.
    Elements links = doc.select("a[href]");
    for (Element link : links) {
        String temp = link.attr("href");
        if (temp.startsWith("/url?q="))
            result.add(temp);
    }
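
The hrefs collected above are still in the raw "/url?q=..." form, so they probably need the same decoding step the question applies before they can be used as target URLs. Below is a minimal sketch of that step using only the standard library. The class name GoogleRedirectUrls and the method extractTarget are hypothetical helpers, not part of the original answer, and the sample hrefs are made up for illustration. It assumes the target is the value of the q parameter and that any further parameters follow after an '&'.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper (an assumption for illustration, not from the original answer).
final class GoogleRedirectUrls {

    private static final String PREFIX = "/url?q=";

    // Returns the decoded value of the q parameter for hrefs shaped like
    // "/url?q=<encoded-target>&...", or an empty Optional for anything else.
    static Optional<String> extractTarget(String href) {
        if (href == null || !href.startsWith(PREFIX)) {
            return Optional.empty();
        }
        int start = PREFIX.length();
        int end = href.indexOf('&', start);
        // indexOf returns -1 when no further parameter follows; take the rest.
        String encoded = (end == -1) ? href.substring(start) : href.substring(start, end);
        try {
            // The Charset overload (Java 10+) avoids the checked
            // UnsupportedEncodingException of the String overload.
            return Optional.of(URLDecoder.decode(encoded, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException e) {
            // Thrown by URLDecoder for malformed percent-escapes.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Sample hrefs are invented for the demo.
        System.out.println(extractTarget("/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=U"));
        System.out.println(extractTarget("/search?q=jsoup"));
    }
}
```

Unlike the substring call in the question, this version does not throw when the href has no '&'. The decoded URL could then go through the same startsWith("http") filter used in the question.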

Regarding java - Jsoup connect(): bypass google captcha, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26896046/
