java - How to get the "Title" from a web page using HttpClient

Reposted. Author: 行者123. Updated: 2023-12-01 18:18:58

I am trying to get the "title" from a web page using Apache HttpClient 4.

Edit: My first approach was to try to get it from the headers (using HttpHead). If that is not possible, how can I get it from the response body, as @Todd suggests?

Edit 2:

<head>
[...]
<title>This is what I need to get!</title>
[...]
</head>

Best answer

Thank you all for your comments. With jsoup, the solution turned out to be very simple.

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

Since I do need to use HttpClient for the connection, this is what I ended up with:

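// Imports needed by this snippet (they go at the top of the source file)
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.http.HttpHost;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.jsoup.Jsoup;

org.jsoup.nodes.Document doc = null;
String title = "";

System.out.println("Getting content... ");

// host and path come from the surrounding code (not shown)
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpHost target = new HttpHost(host);
HttpGet httpget = new HttpGet(path);
CloseableHttpResponse response = httpclient.execute(target, httpget);

System.out.println("Parsing content... ");

try {
    String line = null;
    StringBuilder tmp = new StringBuilder();
    // Decode the body as UTF-8 while reading; re-encoding each line afterwards
    // with line.getBytes() would go through the platform default charset
    BufferedReader in = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent(), StandardCharsets.UTF_8));
    while ((line = in.readLine()) != null) {
        tmp.append(" ").append(line);
    }

    // Let jsoup parse the collected HTML and read the <title> element
    doc = Jsoup.parse(tmp.toString());

    title = doc.title();
    System.out.println("Title=" + title); //<== ^_^

    //[...]

} finally {
    // Release the connection and the client once parsing is finished
    response.close();
    httpclient.close();
}

System.out.println("Done.");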
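
For situations where jsoup cannot be added as a dependency, the same idea (read the response body, then pull out the `<title>` element) can be roughly sketched with the standard library alone. This swaps jsoup's parser for a regular expression, which is not what the answer above does and is far less robust: it does not decode HTML entities and can be confused by unusual markup. `TitleSketch`, `readUtf8`, and `extractTitle` are hypothetical names used only for illustration, UTF-8 is assumed as the page encoding, and `readAllBytes` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HttpClient or jsoup.
class TitleSketch {

    // Matches the first <title>...</title> pair, ignoring case and spanning line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Reads the whole stream and decodes it once as UTF-8 (an assumed encoding).
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Returns the trimmed title text, or an empty string when no <title> element is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body; the markup mirrors the question's example.
        String page = "<html><head>\n<title>This is what I need to get!</title>\n</head><body></body></html>";
        InputStream body = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println("Title=" + extractTitle(readUtf8(body)));
    }
}
```

With HttpClient, the stream passed to `readUtf8` would be `response.getEntity().getContent()`, as in the answer's code. For anything beyond a quick check, jsoup's `doc.title()` remains the safer choice.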

Regarding java - How to get the "Title" from a web page using HttpClient, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/28111976/
