
java - How to create a web crawler in Java?

Reposted. Author: 行者123. Updated: 2023-12-01 19:21:17

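Hi, I want to create a web crawler in Java that retrieves some data from web pages, such as the title and description, and stores that data in a database.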

Best answer

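If you want to do it yourself, use the HttpClient included in the Android API (on desktop Java, the same org.apache.http classes come from the Apache HttpClient 4.x library).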
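The DefaultHttpClient class used in the answer's example is deprecated as of Apache HttpClient 4.3, and Android 6.0 dropped the bundled Apache HTTP client. On Java 11 or later, the JDK's own java.net.http.HttpClient can download pages instead. A minimal sketch, assuming Java 11+; the class name PageFetcher, the User-Agent string and the 10-second timeouts are illustrative choices, not part of the original answer:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PageFetcher {
    // One shared client; Redirect.NORMAL follows redirects except HTTPS -> HTTP
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Builds the GET request for one page; the User-Agent value is a hypothetical placeholder
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "ExampleCrawler/0.1")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Downloads the page body as a String and fails on any non-2xx status
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2)
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("https://www.google.com/");
        System.out.println("Fetched " + html.length() + " characters");
    }
}
```

Sharing a single client lets it reuse connections across the many requests a crawler makes, and setting an identifying User-Agent is the polite thing to do when crawling other people's sites.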

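Example usage of HttpClient (you just need to parse out the links). The HTML parsing uses the JDK's javax.swing.text.html classes, so this example runs on desktop Java: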

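import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {
    // Safety cap so the recursive crawl terminates (an added, arbitrary limit)
    private static final int MAX_PAGES = 50;

    // URLs that have already been visited
    static Set<String> checked = new HashSet<String>();

    public static void main(String... args)
            throws ClientProtocolException, IOException {
        crawlPage("http://www.google.com/");
    }

    private static void crawlPage(String url) throws ClientProtocolException, IOException {
        // Stop at the cap; Set.add returns false if the URL was already visited
        if (checked.size() >= MAX_PAGES || !checked.add(url))
            return;

        System.out.println("Crawling: " + url);

        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url); // fetch the page being crawled, not a fixed URL
        HttpResponse response = client.execute(request);

        // try-with-resources closes the reader (and the response stream) automatically
        try (Reader reader = new InputStreamReader(response.getEntity().getContent())) {
            Links links = new Links();
            new ParserDelegator().parse(reader, links, true);

            // Follow absolute links only; relative links are skipped
            for (String link : links.list)
                if (link.startsWith("http://") || link.startsWith("https://"))
                    crawlPage(link);
        }
    }

    // Collects the href of every <a> tag reported by the Swing HTML parser
    static class Links extends HTMLEditorKit.ParserCallback {

        List<String> list = new LinkedList<String>();

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null) // <a> tags without href would otherwise throw a NullPointerException
                    list.add(href.toString());
            }
        }
    }
}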
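The answer's code only collects links, while the question also asks for each page's title and description and for storing them in a database. A minimal sketch of those two steps, assuming the same javax.swing parser and plain JDBC; the class PageMetadata, its methods extract and save, and the table pages(url, title, description) are hypothetical names, and the Connection would come from whatever JDBC driver your database provides:

```java
import java.io.IOException;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class PageMetadata {
    final String url;
    final String title;
    final String description;

    PageMetadata(String url, String title, String description) {
        this.url = url;
        this.title = title;
        this.description = description;
    }

    // Reads the <title> text and <meta name="description" content="..."> from raw HTML
    static PageMetadata extract(String url, String html) throws IOException {
        StringBuilder title = new StringBuilder();
        String[] description = {""};

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = true;
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = false;
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inTitle) title.append(data);
            }

            // <meta> is an empty element, so the Swing parser reports it here
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                Object name = a.getAttribute(HTML.Attribute.NAME);
                if (t == HTML.Tag.META && "description".equalsIgnoreCase(String.valueOf(name))) {
                    Object content = a.getAttribute(HTML.Attribute.CONTENT);
                    if (content != null) description[0] = content.toString();
                }
            }
        };

        new ParserDelegator().parse(new StringReader(html), callback, true);
        return new PageMetadata(url, title.toString().trim(), description[0].trim());
    }

    // Inserts one row into the hypothetical table pages(url, title, description)
    static int save(Connection conn, PageMetadata page) throws SQLException {
        String sql = "INSERT INTO pages (url, title, description) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, page.url);
            ps.setString(2, page.title);
            ps.setString(3, page.description);
            return ps.executeUpdate();
        }
    }
}
```

In crawlPage you could read the response body into a String once, pass it both to the link parser and to PageMetadata.extract(url, html), and then call PageMetadata.save(conn, page). Using a PreparedStatement with ? placeholders keeps titles that contain quotes from breaking the SQL or injecting into it.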

For more on java - How to create a web crawler in Java?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/4131153/
