
java - Why can't Apache HttpClient 4.2 retrieve this page?

Reposted. Author: 行者123. Updated: 2023-12-02 07:43:16

I am trying to retrieve this page with Apache HttpClient: http://quick-dish.tablespoon.com/

Unfortunately, when I try this, all I get back is the following (as rendered by JSoup, so it is probably just the HTTP... string itself that comes back):

<html>
<head></head>
<body>
HTTP/1.1 200 OK [Server: nginx/1.0.11, Content-Type: text/html;charset=UTF-8, Last-Modified: Mon, 02 Jul 2012 15:30:40 GMT, Vary: Accept-Encoding, Cookie,Accept-Encoding, X-Powered-By: PHP/5.3.6, X-Pingback: http://quick-dish.tablespoon.com/xmlrpc.php, X-Powered-By: ASP.NET, Content-Encoding: gzip, X-Blz: lb1.blaze.io, Date: Mon, 02 Jul 2012 16:06:21 GMT, Content-Length: 11723, Connection: keep-alive]
</body>
</html>

Here is my code (note that I am impersonating Googlebot, because I have found that web servers tend to behave better that way):

URL sourceURL = new URL("http://quick-dish.tablespoon.com/");
HttpClient httpClient = new ContentEncodingHttpClient();
httpClient.getParams().setBooleanParameter("http.protocol.handle-redirects", true);

final HttpGet httpget = new HttpGet(sourceURL.toURI());
httpget.setHeader("User-Agent", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
httpget.setHeader("Accept", "text/html");
httpget.setHeader("Accept-Charset", "utf-8");

final HttpResponse response = httpClient.execute(httpget);
return Jsoup.parse(response.toString());

Needless to say, the page loads fine in my web browser. Any ideas?

Best answer

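You need to get the response entity rather than calling toString() on the response. response.toString() only renders the status line and the headers, which is exactly the text JSoup wrapped in the output shown in the question.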

// Get hold of the response entity
HttpEntity entity = response.getEntity();

Then you can read the content from it.
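As a minimal sketch of that last step, the helper below reads the entity body into a string and hands it to JSoup. The class name ResponseBodyParser, the method parseBody, and the UTF-8 fallback charset are assumptions made for illustration and do not come from the original answer. It relies on EntityUtils.toString from HttpCore and Jsoup.parse(String, String) from jsoup.

```java
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper (not part of the original answer): parses the body of
// an HttpResponse instead of the response's toString() representation.
public final class ResponseBodyParser {

    private ResponseBodyParser() {
        // static utility, no instances
    }

    public static Document parseBody(HttpResponse response, String baseUri) throws IOException {
        // The entity wraps the response body; it is null for body-less responses.
        HttpEntity entity = response.getEntity();
        if (entity == null) {
            throw new IOException("Response has no body: " + response.getStatusLine());
        }
        // Reads the whole body and closes the underlying stream. The charset
        // comes from the Content-Type header when present; "UTF-8" is only
        // the fallback (an assumption that matches the question's server).
        String html = EntityUtils.toString(entity, "UTF-8");
        // baseUri lets JSoup resolve relative links found in the page.
        return Jsoup.parse(html, baseUri);
    }
}
```

In the question's code, this would replace the last line with something like `return ResponseBodyParser.parseBody(response, sourceURL.toString());`.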

Regarding "java - Why can't Apache HttpClient 4.2 retrieve this page?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/11297387/
