
python - Get all reviews from Amazon? (Python 3)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:47:15

I am trying to read all the reviews of a product from Python. I have a script, but it does not work.

parser = html.fromstring(page_response)
XPATH_AGGREGATE = '//span[@id="acrCustomerReviewText"]'
XPATH_REVIEW_SECTION_1 = '//div[@data-hook="reviews-content"]'
XPATH_REVIEW_SECTION_2 = '//div[@data-hook="review"]'

XPATH_AGGREGATE_RATING = '//table[@id="histogramTable"]//tr'
XPATH_PRODUCT_NAME = '//h1//span[@id="productTitle"]//text()'
XPATH_PRODUCT_PRICE = '//span[@id="priceblock_ourprice"]/text()'

raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
product_price = ''.join(raw_product_price).replace(',','')

raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
product_name = ''.join(raw_product_name).strip()
total_ratings = parser.xpath(XPATH_AGGREGATE_RATING)
reviews = parser.xpath(XPATH_REVIEW_SECTION_1)
if not reviews:
    reviews = parser.xpath(XPATH_REVIEW_SECTION_2)

The page is 'https://www.amazon.com/productreviews/' + asin + '/', where asin is an ID (for example, B0718Y23CQ). I get nothing in reviews. Thanks for your help!

Best answer

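Well, to be honest, I don't know where some of the paths you are using come from, because I could not find them. I have rewritten your code to try to help: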

import json

import requests
from lxml import html

asin = 'B0718Y23CQ'

# Download the review page for this product and parse it into an element tree.
page_response = requests.get('https://www.amazon.com/product-reviews/' + asin)
parser = html.fromstring(page_response.content)

# Each review sits in its own <div class="a-section review"> container.
reviews_html = parser.xpath('//div[@class="a-section review"]')
reviews_arr = []
for review in reviews_html:
    # Relative XPaths (leading ".") search only inside the current review.
    # Every xpath() call returns a list, so each field is a list of strings.
    review_dic = {}
    review_dic['title'] = review.xpath('.//a[@data-hook="review-title"]/text()')
    review_dic['rating'] = review.xpath('.//a[@class="a-link-normal"]/@title')
    review_dic['author'] = review.xpath('.//a[@data-hook="review-author"]/text()')
    review_dic['date'] = review.xpath('.//span[@data-hook="review-date"]/text()')
    review_dic['purchase'] = review.xpath('.//span[@data-hook="avp-badge"]/text()')
    review_dic['review_text'] = review.xpath('.//span[@data-hook="review-body"]/text()')
    review_dic['helpful_votes'] = review.xpath('.//span[@data-hook="helpful-vote-statement"]/text()')
    reviews_arr.append(review_dic)

print(json.dumps(reviews_arr, indent=4))

Each entry in the output has this shape:

{
    "title": [
        "I find it very useful, I use for anything I need"
    ],
    "rating": [
        "5.0 out of 5 stars"
    ],
    "author": [
        "Nicoletta Delon"
    ],
    "date": [
        "on January 2, 2018"
    ],
    "purchase": [
        "Verified Purchase"
    ],
    "review_text": [
        "I like this a lot. I use it a lot. It's a medium to small size but it holds a lot."
    ],
    "helpful_votes": [
        "\n One person found this helpful.\n "
    ]
}
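
Every field in that output is a list, and some values carry stray whitespace. A minimal sketch of one way to flatten such a record follows. The helper names `flatten_field` and `clean_review` are hypothetical and not part of the answer; the sketch assumes that joining multiple text nodes with a space is acceptable and that an empty list should become None.

```python
def flatten_field(values):
    """Join the non-empty, stripped strings of an XPath result list.

    Returns None when the list is empty or holds only whitespace.
    """
    parts = [value.strip() for value in values if value.strip()]
    return " ".join(parts) if parts else None


def clean_review(raw_review):
    """Turn one scraped review (field -> list of strings) into field -> string or None."""
    return {field: flatten_field(values) for field, values in raw_review.items()}


# Sample record in the same shape as the output above, plus an empty field.
raw = {
    "title": ["I find it very useful, I use for anything I need"],
    "helpful_votes": ["\n One person found this helpful.\n "],
    "purchase": [],
}
cleaned = clean_review(raw)
print(cleaned)
```

Applied to the scraper's result, this would be something like `[clean_review(r) for r in reviews_arr]`.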

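Now you have to clean up the results: take the values out of their lists and guard against empty elements, and I think you will have what you need. To get all the reviews you have to iterate over the pages, adding ?pageNumber=1 to the link and incrementing the number. If you are going to make many requests, you can use proxies to avoid having your IP blocked.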
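
The page iteration described here could be sketched roughly as follows. The names `review_page_url`, `collect_all_reviews`, `fetch_reviews`, `max_pages` and `delay_seconds` are hypothetical. The sketch assumes that a page with no reviews marks the end, and it takes the download-and-parse step as a callable so the loop can be tried without touching the network. Whether the site still accepts `?pageNumber=` in this form is not verified here.

```python
import time

BASE_URL = "https://www.amazon.com/product-reviews/"


def review_page_url(asin, page_number):
    """Build the review URL for one page, using the ?pageNumber= parameter."""
    return f"{BASE_URL}{asin}?pageNumber={page_number}"


def collect_all_reviews(asin, fetch_reviews, max_pages=50, delay_seconds=0.0):
    """Call fetch_reviews(url) for pages 1, 2, ... and gather the results.

    Stops at the first page that yields no reviews, or after max_pages pages.
    """
    all_reviews = []
    for page_number in range(1, max_pages + 1):
        page_reviews = fetch_reviews(review_page_url(asin, page_number))
        if not page_reviews:
            break
        all_reviews.extend(page_reviews)
        time.sleep(delay_seconds)  # optional pause between requests
    return all_reviews


# Stand-in fetcher for demonstration: two pages of fake reviews, then nothing.
FAKE_PAGES = {1: [{"title": "first"}], 2: [{"title": "second"}]}


def fake_fetch(url):
    page_number = int(url.rsplit("=", 1)[1])
    return FAKE_PAGES.get(page_number, [])


print(collect_all_reviews("B0718Y23CQ", fake_fetch))
```

In real use, `fetch_reviews` would wrap the `requests.get` call and the XPath extraction from the answer's script. `requests.get` accepts a `proxies` mapping, which is one place to plug in the proxies mentioned above.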

Regarding "python - Get all reviews from Amazon? (Python 3)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49247602/
