
java - Jsoup: getting text from a website

Reposted. Author: 行者123. Updated: 2023-12-01 10:43:58

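I can already navigate the website and collect all the links I want, but my main goal is to get the hotel reviews. The site I am working with is this one: http://www.booking.com/hotel/pt/park-italia-flat.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=637e7af0c3009aa9ea132a960e2d2d40;dcid=4;ucfs=1;room1=A,A;srfid=b8260a1c264a3873291a9061733a43536a4d35c2X979#tab-reviews

I can reach that page with jsoup without any problem, but I don't know how to get the review text. I have already tried getElementsByTag, getText and other approaches. Can this be done with jsoup, or do I need another library? This is how I am currently trying to get the text, but the text that comes out is not what I want.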

Document doc;
try {
    doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
    // Note: getElementsMatchingText treats its argument as a regex matched
    // against element text, so this does not select <span> tags by name.
    Elements scriptElements = doc.getElementsMatchingText("span");
    for (Element link : scriptElements) {
        System.out.printf(" Text: <%s> \n", link.text());
    }
} catch (IOException ex) {
    Logger.getLogger(GetComentsThread.class.getName()).log(Level.SEVERE, null, ex);
}

To get the URLs, I use something like this.

Pattern pattern = Pattern.compile("src=destinationfinder");
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    Matcher matcher = pattern.matcher(link.attr("abs:href"));
    if (matcher.find()) {
        dest = link.attr("abs:href");
        break;
    }
}

Now I can get some of the reviews, but only the positive ones, and I don't know why.

doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
//doc = Jsoup.connect("http://www.booking.com/hotel/pt/pestanaportohotel.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=cff2dddd95e71c0768847a554584c888;dcid=4;dist=0;group_adults=2;room1=A%2CA;sb_price_type=total;srfid=798bd6b01ea1dba53ee6b6b945dda1f623859730X2;type=total;ucfs=1&#tab-reviews").get();
String teste = "p.trackit";

Elements scriptElements = doc.select(teste);
for (Element link : scriptElements) {
    //System.out.printf(" Text: <%s> ...%s\n", link.text(),link.attr("class=\"review_pos\""));
    System.out.printf(" Text: <> ...%s\n", link.text());
}

Best answer

The reviews are loaded by an AJAX request to another URL.

There you can get all the information you need.
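This also helps explain why appending "#tab-reviews" to the hotel URL makes no difference: the part after '#' is a URL fragment, which is interpreted on the client side and is not part of what an HTTP client requests from the server. The sketch below uses only java.net.URI to show the split. The class name FragmentDemo, the helper withoutFragment and the shortened URL (query string dropped) are illustrative assumptions, not part of the original question.

```java
import java.net.URI;

class FragmentDemo {
    /** Returns the URL without its fragment, i.e. the part a server is actually asked for. */
    static String withoutFragment(String url) {
        // The fragment starts at the first '#' and runs to the end of the URL.
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        // Shortened form of the hotel URL from the question (query string omitted).
        String url = "http://www.booking.com/hotel/pt/park-italia-flat.pt-pt.html#tab-reviews";

        URI uri = URI.create(url);
        System.out.println("fragment:  " + uri.getFragment());
        System.out.println("requested: " + withoutFragment(url));
    }
}
```

So the page fetched with or without the fragment should be the same document, and the review list has to come from the separate AJAX URL instead.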

The response from that URL:

<li class="review_item clearfix">
  <p class="review_item_date">
    16 de Setembro de 2015
  </p>
  <div class="review_item_reviewer">
    <h4>
      Beatriz
    </h4>
    <span class="reviewer_country">
      <span class="reviewer_country_flag sflag slang-br">
      </span>
      Brasil
    </span>
  </div>
  <!-- .review_item_reviewer -->
  <div class="review_item_review">
    <div class="review_item_review_container lang_ltr seo_reviews_item">
      <div class="review_item_review_header">
        <div class="review_item_header_score_container">
          <div class="review_item_review_score jq_tooltip high_score_tooltip" title="Excepcional">
            9,6
          </div>
        </div>
        <div class="review_item_header_content_container">
          <div class="review_item_header_content seo_review_title">
            Excepcional
          </div>
        </div>
      </div>
      <ul class="review_item_info_tags">
        <li class="review_info_tag"><span class="bullet">&bull;</span> Viagem de lazer</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Família</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Apartamento com Varanda</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Ficou 5 noites</li>
        <li class="review_info_tag"><span class="bullet">&bull;</span> Submetido através de dispositivo móvel</li>
      </ul>
      <div class="review_item_review_content">
        <p class="review_pos"><i class="review_item_icon">&#45575;</i>Conforto, perto do centro, perto de um lindo mercado, bem decorado, com todo material necessário para fazer as refeições, Wi-Fi excelente</p>
      </div>
    </div>
  </div>
</li>
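
Once a fragment like this has been fetched, jsoup can pull the fields out with CSS selectors. The following is a minimal sketch, not the answerer's code: it parses an HTML string rather than calling the AJAX URL (which is not given here), the class name ReviewParser and its helpers are made up for illustration, and the selector p.review_neg is only a guess by analogy with p.review_pos, since no negative comment appears in the response above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ReviewParser {
    /** Returns one "reviewer | score | + positive | - negative" line per review item. */
    static List<String> parseReviews(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> reviews = new ArrayList<>();
        for (Element item : doc.select("li.review_item")) {
            String reviewer = item.select("div.review_item_reviewer h4").text();
            String score = item.select("div.review_item_review_score").text();
            String positive = ownTextOf(item, "p.review_pos");
            // Assumption: negative comments use a class named review_neg.
            String negative = ownTextOf(item, "p.review_neg");
            reviews.add(reviewer + " | " + score + " | + " + positive + " | - " + negative);
        }
        return reviews;
    }

    /** Text of the first match, excluding child elements such as the icon <i>; "" if absent. */
    private static String ownTextOf(Element root, String cssQuery) {
        Element match = root.select(cssQuery).first();
        return match == null ? "" : match.ownText();
    }

    public static void main(String[] args) {
        // Trimmed copy of the response shown above.
        String html =
              "<li class=\"review_item clearfix\">"
            + "<div class=\"review_item_reviewer\"><h4>Beatriz</h4></div>"
            + "<div class=\"review_item_review_score\">9,6</div>"
            + "<p class=\"review_pos\"><i class=\"review_item_icon\">*</i>"
            + "Conforto, perto do centro</p>"
            + "</li>";
        for (String line : parseReviews(html)) {
            System.out.println(line);
        }
    }
}
```

Using ownText() rather than text() keeps the icon glyph inside the nested i element out of the extracted comment. If the real markup uses a different class for negative comments, only the second selector needs to change.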

Regarding "java - Jsoup: getting text from a website", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34302355/
