
java - Jsoup: getting elements within an element

Reposted. Author: 行者123. Updated: 2023-12-02 10:56:12

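I'm trying to scrape the following page: https://icobench.com/icos. I've been trying to pull some information out of the elements with the class ico_data. The markup looks like this: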

<td class="ico_data">
<div class="image_box"><a href="/ico/gcox" class="image" style="background-image:url('/images/icos/icons/gcox.jpg');"></a></div>
<div class="content">
<a class="name" href="/ico/gcox"><object><a href="/premium" title="Premium" class="premium">&nbsp;</a></object>GCOX</a>
<p>GCOX is the world's first blockchain-powered platform that allows the popularity of celebrities to be tokenised and listed.<br><br><b>Restrictions KYC:</b> Yes <span class="line">|</span> <b>Whitelist:</b> Yes <span class="line">|</span> <b>Countries:</b> USA, Singapore</p>
</div>
<div class="shw">
<div class="row"><b>Start:</b> 08 Aug 2018</div>
<div class="row"><b>End:</b> 31 Aug 2018</div>
<div class="row"><b>Rate:</b>
<div class="rate color4">3.9</div>
</div>
</div>
</td>

I want to extract the name, description, start date, and end date. How should I go about it?

Here's the code I have so far:

Document document = Jsoup.connect("https://icobench.com/icos").userAgent("Mozilla").get();    
Elements companyElements = document.getElementsByClass("ico_data");
for (Element companyElement : companyElements) {
// do stuff here
}

Thanks.

Best answer

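You can pick out the start and end dates by filtering the <b> tags with the :contains selector: find the <b> whose text contains "Start" or "End", take its parent's text, and strip the label from it. The name is in the element with the class "name", and the description is in the p tag inside the content div.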

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class IcoScraper {

    public void extract() {
        try {
            Document document = Jsoup.connect("https://icobench.com/icos").get();
            Elements companyElements = document.select(".ico_data");
            for (Element companyElement : companyElements) {
                // select() never returns null; first() returns null when nothing matched.
                Element content = companyElement.select(".content").first();
                if (content == null) {
                    continue;
                }
                String name = content.select(".name").text();
                String description = content.select("p").text();
                String start = valueAfterLabel(companyElement, "Start");
                String end = valueAfterLabel(companyElement, "End");

                System.out.println(name + " | " + start + " | " + end);
                System.out.println(description);
                // do stuff here
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Finds the first <b> whose text contains the label and returns the rest of its parent's text.
    private static String valueAfterLabel(Element root, String label) {
        Element bold = root.select("b:contains(" + label + ")").first();
        if (bold == null || bold.parent() == null) {
            return "";
        }
        // Strip the label itself ("Start:") and the space that follows it.
        return bold.parent().text().replace(bold.text(), "").trim();
    }
}
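
To try these selectors without hitting the network, here is a minimal sketch that parses a trimmed copy of the markup from the question out of a string. The class name IcoRowSketch, the constant SAMPLE_HTML, and the helpers rowValue and parseRow are hypothetical names made up for this illustration. It also assumes the live page still uses the same class names as the snippet. Instead of stripping the label with replace, it reads the parent's ownText(), which leaves out the text of child elements such as the <b> label.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical class and method names, for illustration only.
class IcoRowSketch {

    // Trimmed copy of the question's markup. It is wrapped in <table><tr> because
    // the HTML parser drops a <td> that appears outside a table.
    static final String SAMPLE_HTML =
        "<table><tr><td class=\"ico_data\">"
        + "<div class=\"content\">"
        + "<a class=\"name\" href=\"/ico/gcox\">GCOX</a>"
        + "<p>GCOX is the world's first blockchain-powered platform that allows "
        + "the popularity of celebrities to be tokenised and listed.</p>"
        + "</div>"
        + "<div class=\"shw\">"
        + "<div class=\"row\"><b>Start:</b> 08 Aug 2018</div>"
        + "<div class=\"row\"><b>End:</b> 31 Aug 2018</div>"
        + "</div>"
        + "</td></tr></table>";

    // Returns the text next to a <b> label inside .shw, or "" if the label is missing.
    static String rowValue(Element cell, String label) {
        Element bold = cell.select(".shw b:contains(" + label + ")").first();
        if (bold == null || bold.parent() == null) {
            return "";
        }
        // ownText() covers only the parent's direct text nodes, so the label is excluded.
        return bold.parent().ownText().trim();
    }

    // Returns {name, description, start, end} for the first td.ico_data in the HTML.
    static String[] parseRow(String html) {
        Document document = Jsoup.parse(html);
        Element cell = document.select("td.ico_data").first();
        if (cell == null) {
            return new String[] {"", "", "", ""};
        }
        return new String[] {
            cell.select(".content .name").text(),
            cell.select(".content p").text(),
            rowValue(cell, "Start"),
            rowValue(cell, "End")
        };
    }

    public static void main(String[] args) {
        for (String field : parseRow(SAMPLE_HTML)) {
            System.out.println(field);
        }
    }
}
```

Scoping the selector to .shw keeps the match away from the <b> labels inside the description paragraph, and the null check matters because first() returns null when a row has no such label. The :contains match is a case-insensitive substring test, so a stricter selector may be worth considering if other labels on the real page happen to contain "start" or "end".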

For "java - Jsoup: getting elements within an element", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51716328/
