
java - Scraping multiple pages with jsoup

Reposted. Author: 行者123. Updated: 2023-12-02 02:25:04

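I am trying to scrape the pagination links of a GitHub repository. I have already scraped them one at a time, but now I would like to optimize this with a loop. Any idea how I can do that? Here is the code: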

String ComitUrl = "http://github.com/apple/turicreate/commits/master";

Document document2 = Jsoup.connect(ComitUrl).get();

// first page: the only pagination link is the one to the next (older) page
Element pagination = document2.select("div.pagination a").get(0);
String Url1 = pagination.attr("href");
System.out.println("pagination-link1 = " + Url1);

// later pages: the second pagination link is the one to the next page
Document document3 = Jsoup.connect(Url1).get();
Element pagination2 = document3.select("div.pagination a").get(1);
String Url2 = pagination2.attr("href");
System.out.println("pagination-link2 = " + Url2);

Document document4 = Jsoup.connect(Url2).get();
Element check = document4.select("span.disabled").first();

// first() returns null when nothing matches, so guard before calling text()
if (check != null && check.text().equals("Older")) {
    System.out.println("No more pagination links");
} else {
    Element pagination3 = document4.select("div.pagination a").get(1);
    String Url3 = pagination3.attr("href");
    System.out.println("pagination-link3 = " + Url3);
}

Best answer

Try something like this:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class PaginationLinks {
    public static void main(String[] args) throws IOException {
        String url = "http://github.com/apple/turicreate/commits/master";
        // get the first link (the first page has only one pagination link)
        String link = Jsoup.connect(url).get().select("div.pagination a").get(0).attr("href");
        // an int just to count up the links
        int i = 1;
        System.out.println("pagination-link_" + i + "\t" + link);
        // parse the next page using the link, fetching each page only once
        Elements links = Jsoup.connect(link).get().select("div.pagination a");
        // continue while the div on the current page has more than one link in it
        while (links.size() > 1) {
            link = links.get(1).attr("href");
            System.out.println("pagination-link_" + (++i) + "\t" + link);
            links = Jsoup.connect(link).get().select("div.pagination a");
        }
    }
}
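
The loop in the answer can also be separated from the page fetching, which makes it possible to try out the traversal without touching the network. The following is only a sketch under some assumptions: the names PaginationWalker, collectPageLinks, and maxPages are hypothetical and do not come from the answer, and the "site" is a plain in-memory map standing in for real pages. With jsoup, the nextLink function would wrap the Jsoup.connect(...).get().select("div.pagination a") lookup from the answer and return null once no next-page link is present.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PaginationWalker {

    /**
     * Follows "next page" links starting from startUrl.
     * nextLink maps a page URL to the URL of the following page,
     * or returns null when the page is the last one.
     * maxPages is a safety cap so a cyclic link chain cannot loop forever.
     */
    static List<String> collectPageLinks(String startUrl,
                                         Function<String, String> nextLink,
                                         int maxPages) {
        List<String> links = new ArrayList<>();
        String next = nextLink.apply(startUrl);
        while (next != null && links.size() < maxPages) {
            links.add(next);
            // each page is looked up exactly once per iteration
            next = nextLink.apply(next);
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory site: each key links to its next page
        Map<String, String> fakeSite = new HashMap<>();
        fakeSite.put("page1", "page2");
        fakeSite.put("page2", "page3");

        List<String> links = collectPageLinks("page1", fakeSite::get, 100);
        for (int i = 0; i < links.size(); i++) {
            System.out.println("pagination-link_" + (i + 1) + "\t" + links.get(i));
        }
    }
}
```

Returning null for "no next page" keeps the stop condition in one place, so the loop does not depend on how many links a particular page layout happens to contain.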

Regarding "java - Scraping multiple pages with jsoup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47912794/
