gpt4 book ai didi

java - 403禁止使用Java但不是Web浏览器?

转载 作者:IT老高 更新时间:2023-10-28 20:43:36 24 4
gpt4 key购买 nike

我正在编写一个小型 Java 程序来获取给定 Google 搜索词的结果量。出于某种原因,在 Java 中我得到了 403 Forbidden,但我在 Web 浏览器中得到了正确的结果。代码:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;


public class DataGetter {

public static void main(String[] args) throws IOException {
getResultAmount("test");
}

private static int getResultAmount(String query) throws IOException {
BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
.getInputStream()));
String line;
String src = "";
while ((line = r.readLine()) != null) {
src += line;
}
System.out.println(src);
return 1;
}

}

还有错误:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at DataGetter.getResultAmount(DataGetter.java:15)
at DataGetter.main(DataGetter.java:10)

为什么要这样做?

最佳答案

您只需要设置用户代理 header 即可使其工作:

URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();

BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));

StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());

从您的异常堆栈跟踪中可以看出,SSL 已为您透明地处理。

虽然获得结果量并不是那么简单,但在此之后,您必须通过获取 cookie 并解析重定向 token 链接来假装自己是浏览器。

String cookie = connection.getHeaderField( "Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if( m.find() ) {
String url = m.group(1);
connection = new URL(url).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.setRequestProperty("Cookie", cookie );
connection.connect();
r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
sb = new StringBuilder();
while ((line = r.readLine()) != null) {
sb.append(line);
}
response = sb.toString();
pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
m = pattern.matcher(response);
if( m.find() ) {
long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
return amount;
}

}

正在运行 the full code结果我得到 2930000000L

关于java - 403禁止使用Java但不是Web浏览器?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13670692/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com