java - How to read a compressed HTML page with Content-Encoding: gzip

Reposted. Author: 搜寻专家. Updated: 2023-11-01 01:04:08

I am requesting a web page that is sent with a Content-Encoding: gzip header, but I can't read it.

My code:

    try {
        URLConnection connection = new URL("http://jquery.org").openConnection();
        String html = "";
        BufferedReader in = null;
        connection.setReadTimeout(10000);
        in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            html += inputLine + "\n";
        }
        in.close();
        System.out.println(html);
        System.exit(0);
    } catch (IOException ex) {
        Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
    }

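The output looks garbled (I can't paste it here; it is just a jumble of symbols).

I think the content is compressed. How do I parse it?

Note: if I change jquery.org to jquery.com (which doesn't send that header), my code works fine.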
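
One way to check the suspicion that the bytes are gzip-compressed is to look at the first two bytes of the body: a gzip stream always starts with the magic number 0x1f 0x8b. The sketch below is only an illustration; the class name `GzipSniffer` and both helper methods are hypothetical, and it compresses a string in memory instead of contacting jquery.org.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    /** Returns true when the data starts with the two-byte gzip magic number (0x1f, 0x8b). */
    public static boolean looksGzipped(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        // GZIP_MAGIC is 0x8b1f, so the low byte (0x1f) comes first in the stream.
        int header = (data[0] & 0xff) | ((data[1] & 0xff) << 8);
        return header == GZIPInputStream.GZIP_MAGIC;
    }

    /** Hypothetical demo helper: gzip-compresses a string in memory. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into the buffer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "<html></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksGzipped(plain));                 // false
        System.out.println(looksGzipped(gzip("<html></html>"))); // true
    }
}
```

Decoding compressed bytes directly as text is what produces the jumble of symbols described above.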

Best answer

This is really pb2q's answer, but I'm posting the complete code for future readers.

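    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import java.util.zip.GZIPInputStream;

    public class Crawler {
        public static void main(String[] args) {
            try {
                URLConnection connection = new URL("http://jquery.org").openConnection();
                connection.setReadTimeout(10000);

                // The changed part: unwrap the body when the server reports gzip compression.
                InputStream body = connection.getInputStream();
                if ("gzip".equalsIgnoreCase(connection.getContentEncoding())) {
                    body = new GZIPInputStream(body);
                }
                // End of the changed part

                StringBuilder html = new StringBuilder();
                // try-with-resources closes the reader and the wrapped streams, even on error.
                try (BufferedReader in = new BufferedReader(new InputStreamReader(body))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        html.append(inputLine).append("\n");
                    }
                }
                System.out.println(html);
            } catch (IOException ex) {
                Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }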
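
The Content-Encoding branch in the answer can also be tried without any network access. The following is a minimal sketch, not part of the original answer: the class `GzipBodyReader` and its methods `readBody` and `gzip` are hypothetical names, the "response" is a string compressed in memory, and UTF-8 is an assumed charset (the original code relies on the platform default).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBodyReader {

    /**
     * Hypothetical helper: reads a body into a String, unwrapping gzip when the
     * Content-Encoding value says so. A null encoding is treated as "not compressed".
     */
    public static String readBody(InputStream raw, String contentEncoding, Charset charset) {
        StringBuilder html = new StringBuilder();
        try (InputStream body = "gzip".equalsIgnoreCase(contentEncoding)
                     ? new GZIPInputStream(raw) : raw;
             BufferedReader in = new BufferedReader(new InputStreamReader(body, charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append("\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return html.toString();
    }

    /** Demo-only helper: gzip-compresses text in memory to stand in for a server response. */
    public static byte[] gzip(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>hello</body>\n</html>";
        byte[] compressed = gzip(page, StandardCharsets.UTF_8);
        // Same branch as in the answer: "gzip" selects the GZIPInputStream wrapper.
        String decoded = readBody(new ByteArrayInputStream(compressed), "gzip", StandardCharsets.UTF_8);
        System.out.print(decoded);
    }
}
```

With a real connection, the same helper could be called as readBody(connection.getInputStream(), connection.getContentEncoding(), StandardCharsets.UTF_8), assuming the page is actually UTF-8.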

This post about java - How to read a compressed HTML page with Content-Encoding: gzip is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/11093153/
