
java - Garbled InputStream when using HttpURLConnection with https

Reposted. Author: 行者123. Updated: 2023-12-02 03:09:02

I am using HttpURLConnection to crawl https://translate.google.com/.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.IOUtils;

// Route the request through a local HTTP proxy
InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 1082);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("https://translate.google.com/");
HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
conn.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch");
conn.setRequestProperty("Connection", "keep-alive");
conn.setRequestProperty("User-Agent",
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36");
conn.setRequestProperty("Accept", "*/*");

// Note: getHeaderFields() returns the response headers (it sends the request),
// so despite the variable name these are not the request headers
Map<String, List<String>> reqHeaders = conn.getHeaderFields();
List<String> reqTypes = reqHeaders.get("Content-Type");
for (String ss : reqTypes) {
    System.out.println(ss);
}

// Read the body as UTF-8 text and print the first 100 characters
InputStream in = conn.getInputStream();
String s = IOUtils.toString(in, "UTF-8");
System.out.println(s.substring(0, 100));

Map<String, List<String>> resHeader = conn.getHeaderFields();
List<String> resTypes = resHeader.get("Content-Type");
for (String ss : resTypes) {
    System.out.println(ss);
}

The console output is:

[image: console output]

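But when I change the URL to http://translate.google.com/, it works fine.

I know that when crawling https://translate.google.com/ the HttpURLConnection is actually an HttpsURLConnection. I tried using HttpsURLConnection directly, but the output is still garbled.

Any suggestions?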

Best answer

conn.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch");

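The response is compressed, because the line above tells the server that the client understands the encodings listed in Accept-Encoding. Reading the compressed bytes directly as UTF-8 text is what produces the garbled output.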

Try commenting out this line, or handle the compressed response by decompressing it before decoding it as text.
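One way to handle the compressed case is to wrap the body stream according to the Content-Encoding response header before decoding it. The following is only a minimal sketch: the class name ContentEncodingDemo and the helpers wrap, readAll and gzip are hypothetical names made up for illustration, and the demo uses in-memory data instead of a real connection so that it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

class ContentEncodingDemo {

    // Hypothetical helper: wraps the raw body stream according to the
    // Content-Encoding response header (null means "not compressed").
    static InputStream wrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw;
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        if (enc.equals("gzip") || enc.equals("x-gzip")) {
            return new GZIPInputStream(raw);
        }
        if (enc.equals("deflate")) {
            // Assumes the zlib-wrapped form of deflate
            return new InflaterInputStream(raw);
        }
        // Anything else (for example sdch) is passed through undecoded
        return raw;
    }

    // Reads the whole stream and decodes it as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper: gzip-compresses a string, standing in for a server response.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String page = "<!DOCTYPE html><html>hello</html>";
        byte[] compressed = gzip(page);

        // Decoding the compressed bytes directly reproduces the garbling
        String garbled = readAll(wrap(new ByteArrayInputStream(compressed), null));
        System.out.println("raw decode matches original: " + garbled.equals(page));

        // Decompressing first recovers the text
        String decoded = readAll(wrap(new ByteArrayInputStream(compressed), "gzip"));
        System.out.println(decoded);
    }
}
```

With the connection from the question, the same idea would be wrap(conn.getInputStream(), conn.getContentEncoding()). Since this sketch does not decode sdch, it would also make sense to advertise only "gzip, deflate" in Accept-Encoding.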

HTTPS also has a more specific implementation, HttpsURLConnection, in case you are interested in https-specific features, for example:

import javax.net.ssl.HttpsURLConnection;

....

URL url = new URL("https://www.google.com/");
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
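To illustrate the point about the more specific class, here is a small sketch showing that URL.openConnection() already returns an HttpsURLConnection for an https URL, and how an https-specific call such as getCipherSuite() could be used. The class name HttpsConnectionDemo and the helpers kindOf and cipherSuite are hypothetical names chosen for illustration. The cipherSuite method needs network access, so main only calls it when a URL is passed as an argument.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

class HttpsConnectionDemo {

    // Opens a connection object (without connecting) and reports its runtime kind.
    static String kindOf(String spec) throws IOException {
        URLConnection conn = new URL(spec).openConnection();
        if (conn instanceof HttpsURLConnection) {
            return "https";
        }
        if (conn instanceof HttpURLConnection) {
            return "http";
        }
        return "other";
    }

    // Needs network access: performs the TLS handshake, then returns the
    // negotiated cipher suite, which is only available on HttpsURLConnection.
    static String cipherSuite(String spec) throws IOException {
        HttpsURLConnection conn = (HttpsURLConnection) new URL(spec).openConnection();
        conn.connect();
        try {
            return conn.getCipherSuite();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(kindOf("https://www.google.com/")); // prints "https"
        System.out.println(kindOf("http://www.google.com/"));  // prints "http"
        if (args.length > 0) {
            System.out.println(cipherSuite(args[0]));
        }
    }
}
```

Because the cast to HttpsURLConnection changes only the static type, it gives access to TLS details but does not affect how the response body is encoded, which is consistent with the output staying garbled in the question.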

Regarding "java - Garbled InputStream when using HttpURLConnection with https", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41340287/
