python - Printing the value held in a div class with Python Beautiful Soup

Reposted. Author: 行者123. Updated: 2023-12-01 08:10:23

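I am trying to scrape a website for a university project. The site is: https://www.influenster.com/reviews/samsung-galaxy-s9

I want to get the rating each user gave the product. The rating is not stored as text, as shown below. I want to extract the value 4 from the content.

I have tried several approaches, but each one either raises an error or fails to retrieve the right data: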

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

stars_comb=[]

req = Request('https://www.influenster.com/reviews/samsung-galaxy-s9', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

ratings = soup.find_all('div', class_='avg-stars')
print(ratings)

Please help; I am new to programming and to Python.

Best answer

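You need to go through all 10 pages of reviews and ignore the 10 other product reviews at the bottom of each page, which also use the avg-stars class. Isolate the Samsung Galaxy S9 reviews first (the review-item-stars containers), and only then search inside them for the avg-stars class. The rating itself is read from that tag's data-stars attribute. Try something like this: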

from bs4 import BeautifulSoup
import requests

def main():
    all_review_stars = []
    base_url = 'https://www.influenster.com/reviews/samsung-galaxy-s9?review_page='
    last_page_num = 10
    for page_num in range(1, last_page_num + 1):
        # Fetch and parse one page of reviews.
        page_link = base_url + str(page_num)
        page_response = requests.get(page_link, headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
        page_content = BeautifulSoup(page_response.content, "html.parser")
        # Only the phone's own reviews sit inside review-item-stars containers.
        reviews_stars_for_page = page_content.find_all("div", class_="review-item-stars")
        for review_stars in reviews_stars_for_page:
            # The rating is the data-stars attribute of the nested avg-stars div.
            all_review_stars.append(review_stars.find("div", class_="avg-stars")['data-stars'])
        print(f"Got stars for page {page_num}")
    print(f"Retrived the stars given from {len(all_review_stars)} reviews")
    # Attribute values are strings; convert them to integers.
    all_review_stars = list(map(int, all_review_stars))
    print(all_review_stars)

if __name__ == '__main__':
    main()

Output:

Got stars for page 1
Got stars for page 2
Got stars for page 3
Got stars for page 4
Got stars for page 5
Got stars for page 6
Got stars for page 7
Got stars for page 8
Got stars for page 9
Got stars for page 10
Retrived the stars given from 94 reviews
[5, 5, 5, 4, 5, 5, 5, 4, 3, 5, 3, 5, 5, 5, 5, 5, 4, 5, 5, 4, 5, 5, 5, 5, 3, 5, 5, 4, 5, 5, 4, 2, 5, 5, 3, 5, 5, 4, 5, 5, 5, 5, 5, 4, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 4, 4, 4, 2, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, 4, 4, 5, 5, 4, 5]
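
To try the same idea without hitting the network, here is a minimal sketch that runs against a small inline HTML string. The markup in SAMPLE_HTML is hypothetical: it is a simplified stand-in built from the class names and the data-stars attribute used in the answer, not a copy of the real page. The extract_review_stars helper is also just an illustrative name. It scopes the search to the review-item-stars containers first, and it guards against a missing tag or attribute, since find() returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the real page, not copied from it.
SAMPLE_HTML = """
<div class="review-item">
  <div class="review-item-stars">
    <div class="avg-stars" data-stars="4"></div>
  </div>
</div>
<div class="review-item">
  <div class="review-item-stars">
    <div class="avg-stars" data-stars="5"></div>
  </div>
</div>
<div class="related-product">
  <div class="avg-stars" data-stars="3"></div>
</div>
"""


def extract_review_stars(html):
    """Return the star ratings found inside review containers only."""
    soup = BeautifulSoup(html, "html.parser")
    stars = []
    # Scope the search: only look inside the per-review containers.
    for container in soup.find_all("div", class_="review-item-stars"):
        tag = container.find("div", class_="avg-stars")
        # find() returns None when nothing matches, so check before indexing.
        if tag is not None and tag.has_attr("data-stars"):
            stars.append(int(tag["data-stars"]))
    return stars


print(extract_review_stars(SAMPLE_HTML))  # prints [4, 5]
```

The third avg-stars div (the one under related-product) is skipped because it is not inside a review-item-stars container, which mirrors how the answer avoids the other products listed at the bottom of each page.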

Regarding "python - Printing the value held in a div class with Python Beautiful Soup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55291181/
