python - Scraping reviews from TripAdvisor

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:42:10

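I am new to web scraping with Python 3. I want to scrape the reviews of all the hotels in Dubai, but the problem is that I can only scrape the reviews of the one hotel whose URL I put in the code. Can anyone tell me how to get the reviews for all hotels without explicitly providing the URL of each hotel?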

import requests
from bs4 import BeautifulSoup


importurl = 'https://www.tripadvisor.com/Hotel_Review-g295424-d302778-Reviews-Roda_Al_Bustan_Dubai_Airport-Dubai_Emirate_of_Dubai.html'
r = requests.get(importurl)
soup = BeautifulSoup(r.content, "lxml")
resultsoup = soup.find_all("p", {"class": "partial_entry"})

# print each review
for review in resultsoup:
    review_list = review.get_text()
    print(review_list)

# save the reviews to a test text file locally
with open('testreview.txt', 'w') as fid:
    for review in resultsoup:
        review_list = review.get_text()
        fid.write(review_list)

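Best answer

You should find the index pages that list all the hotels, collect every hotel link into a list, and then loop over that list of URLs to get the reviews.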

import bs4, requests

# the "oa{}" part of the index URL is the result offset, advancing by 30 per page
index_pages = ('http://www.tripadvisor.cn/Hotels-g295424-oa{}-Dubai_Emirate_of_Dubai-Hotels.html#ACCOM_OVERVIEW'.format(i) for i in range(0, 540, 30))
urls = []
with requests.session() as s:
    for index in index_pages:
        r = s.get(index)
        soup = bs4.BeautifulSoup(r.text, 'lxml')
        url_list = [i.get('href') for i in soup.select('.property_title')]
        # extend (not append) so that urls is one flat list of links
        urls.extend(url_list)

Output:

len(urls): 540
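
The answer stops once the links are collected. The remaining step it describes, looping over the URL list to pull the reviews, might look roughly like the sketch below. It rests on several assumptions: that the collected href values are site-relative paths that need to be joined to a base URL, that the "partial_entry" and "property_title" class names from the question and answer still match the pages, and that the standard-library "html.parser" is an acceptable stand-in for lxml. The names BASE, absolute_links, extract_reviews and scrape_reviews are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base URL for turning relative hrefs into absolute ones.
BASE = "https://www.tripadvisor.com"


def absolute_links(index_html, base=BASE):
    """Return absolute hotel URLs found on one index page."""
    soup = BeautifulSoup(index_html, "html.parser")
    hrefs = (a.get("href") for a in soup.select(".property_title"))
    # Skip anchors without an href; join the rest to the base URL.
    return [urljoin(base, h) for h in hrefs if h]


def extract_reviews(hotel_html):
    """Return the text of every review snippet on one hotel page."""
    soup = BeautifulSoup(hotel_html, "html.parser")
    return [p.get_text(strip=True)
            for p in soup.find_all("p", class_="partial_entry")]


def scrape_reviews(urls, session):
    """Yield (url, reviews) per hotel URL, skipping failed requests."""
    for url in urls:
        r = session.get(url, timeout=10)
        if r.ok:
            yield url, extract_reviews(r.text)


# Possible usage (makes network requests, so left commented out):
#
# import requests
# with requests.Session() as s:
#     page = s.get(BASE + "/Hotels-g295424-oa0-Dubai_Emirate_of_Dubai-Hotels.html")
#     for url, reviews in scrape_reviews(absolute_links(page.text), s):
#         print(url, len(reviews))
```

Keeping the parsing functions separate from the requests makes it possible to check them against saved HTML before hitting the site. Each hotel page only shows its first batch of reviews, so getting every review would also mean following that hotel's own pagination, which this sketch does not attempt.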

Regarding python - Scraping reviews from TripAdvisor, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41463942/
