gpt4 book ai didi

java - Jsoup 从 div 的子级中抓取文本

转载 作者:行者123 更新时间:2023-11-30 02:47:11 24 4
gpt4 key购买 nike

我正在尝试提取链接上的产品评论 - Moto X使用 JSoup 但它抛出 NullPointerException。另外,我想提取点击评论的“阅读更多”链接后显示的文本。

import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JSoupEx
{
public static void main(String[] args) throws IOException
{
Document doc = Jsoup.connect("https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA").get();
Element ele = doc.select("div[class=qwjRop] > div").first();
System.out.println(ele.text());
}
}

有什么解决办法吗?

最佳答案

正如 Gherkin 所建议的,使用开发人员工具中的网络选项卡,我们会看到一个接收评论(JSON 格式)作为响应的请求:

https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=0

使用像 JSON.simple 这样的 JSON 解析器我们可以提取评论作者、有用性和文本等信息。

示例代码

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
String reviewApiCall = "https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=";
String xUserAgent = userAgent + " FKUA/website/41/website/Desktop";
String referer = "https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA";
String host = "www.flipkart.com";
int numberOfPages = 2; // first two pages of results will be fetched

try {
// loop for multiple review pages
for (int i = 0; i < numberOfPages; i++) {
// query reviews
Response response = Jsoup.connect(reviewApiCall+(i*15)).userAgent(userAgent).referrer(referer).timeout(5000)
.header("x-user-agent", xUserAgent).header("host", host).ignoreContentType(true).execute();

System.out.println("Response in JSON format:\n\t" + response.body() + "\n");

// parse json response
JSONObject jsonObject = (JSONObject) new JSONParser().parse(response.body().toString());
jsonObject = (JSONObject) jsonObject.get("RESPONSE");
JSONArray jsonArray = (JSONArray) jsonObject.get("data");

for (Object object : jsonArray) {
jsonObject = (JSONObject) object;
jsonObject = (JSONObject) jsonObject.get("value");
System.out.println("Author: " + jsonObject.get("author") + "\thelpful: "
+ jsonObject.get("helpfulCount") + "\n\t"
+ jsonObject.get("text").toString().replace("\n", "\n\t") + "\n");
}
}
} catch (Exception e) {
e.printStackTrace();
}

输出

Response in JSON format:
{"CACHE_INVALIDATION_TTL":"132568825671","REQUEST":null,"REQUEST-ID": [...] }

Author: Flipkart Customer helpful: 140
A great phone at an affordable price with
-an outstanding camera
-great battery life
-an excellent display
-premium looks
the flipkart delivery was also fast and perfect.

Author: Vaibhav Yadav helpful: 518
I m writing this review after using 2 months..
First of all ..I must say this is one of the best product ..camera quality is best in natural lights or daytime..but in low light and in the night..camera quality is not so good but it's ok..
It has good battery backup ..last one day on 3g usage ..while using 4g ..it lasts for about 10-12 hour..
Turbo charges is good..although ..my charger is not working..
Only problem in this phone is ..while charging..this phone heats a lot..this may b becoz of turbo charger..if u r using other charger than it does not heat..

Author: KAPIL CHOPRA helpful: 9
[...]

注意:输出被截断 ([...])

关于java - Jsoup 从 div 的子级中抓取文本,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39850547/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com