
Python Beautiful Soup find_all

Reposted. Author: 行者123. Updated: 2023-12-05 07:08:16

Hello, I'm trying to scrape some information from a website. Please forgive any formatting mistakes; this is my first post on SO.

soup.find('div', {"class":"stars"}) 

This returns:

<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>

I need the "4.0 star rating" value.

When I use:

soup.find('div', {"class":"stars"})["title"]

it works, but the same indexing does not work with find_all. I want to find every occurrence and collect the titles in a list.

My full code is below.

    def get_info():
        from IPython.display import HTML
        import requests
        from bs4 import BeautifulSoup
        n = 1
        for page in range(53):
            url = f"https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews"
            r = requests.get(url)
            soup = BeautifulSoup(r.text, 'lxml')
            all_reviews = soup.find_all('div', {'class':"truncate_review"})
            all_dates = soup.find_all('div', {'class':'review__date'},'title')
            all_titles = soup.find_all('span', {'class':'review__title__text'})
            reviews_class = soup.find('div', {"class":"review__stars"})
            for review in all_reviews:
                all_reviews_list.append(review.text.replace("\n","").replace("\t",""))
            for date in all_dates:
                all_dates_list.append(date.text.replace("\n","").replace("\t",""))
            for title in all_titles:
                all_titles_list.append(title.text.replace("\n","").replace("\t",""))
            for stars in reviews_class.find_all('div', {'class':'stars'}):
                all_star_ratings.append(stars['title'])
            n += 1

Sorry if the indentation is a little off; this is my full code.

Best answer

A bs4 element exposes its attributes through dictionary-style lookup.
If you are using find():

soup.find('div', {"class":"stars"})['title']

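This works because find() returns a single element.
find_all(), however, returns a list of elements, and indexing a list with a string (list['title']) is not a valid operation; it raises a TypeError.
So index each element in turn and build a list: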

res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])

Or, as a one-liner:

res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]
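
To see the difference between the two calls without hitting the site, here is a minimal sketch that runs against a small inline HTML string. The markup is a hypothetical stand-in modelled on the snippet in the question, and the built-in "html.parser" is used instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippet in the question (not the real page).
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="stars" title="1.0 star rating"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, so attribute lookup works directly.
first_title = soup.find("div", {"class": "stars"})["title"]

# find_all() returns a list-like result; indexing it with a string is rejected.
matches = soup.find_all("div", {"class": "stars"})
try:
    matches["title"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Index each element individually instead.
all_titles = [tag["title"] for tag in matches]

print(first_title)
print(error_name)
print(all_titles)
```

Note that find() returns None when nothing matches, so on a real page it is worth checking the result before indexing it.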

Since you want the star rating of every review, restrict the search to the review containers, i.e. scrape from:

<div class="review__container">

So the code becomes:

review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]

which gives:

['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
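
The container-scoped search can also be written with a CSS descendant selector. The sketch below is one possible variant, not the answer's own code: the helper names review_ratings and ratings_as_numbers are hypothetical, the inline HTML is made up for illustration (including the assumption that a "stars" div can also appear outside any review), and the numeric conversion assumes every title has the form "<number> star rating".

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one "stars" div outside any review,
# plus two reviews that each carry their own rating.
html = """
<div class="stars" title="3.6 star rating"></div>
<div class="review__container">
  <div class="review__stars"><div class="stars" title="1.0 star rating"></div></div>
</div>
<div class="review__container">
  <div class="review__stars"><div class="stars" title="5.0 star rating"></div></div>
</div>
"""

def review_ratings(markup):
    """Return the title of every 'stars' div nested inside a review container."""
    soup = BeautifulSoup(markup, "html.parser")
    # Descendant selector: only 'stars' divs inside a review container match.
    stars = soup.select("div.review__container div.stars")
    # .get() returns None instead of raising if a div has no title attribute.
    return [tag.get("title") for tag in stars if tag.get("title")]

def ratings_as_numbers(titles):
    """Convert titles such as '4.0 star rating' to floats (assumed format)."""
    return [float(title.split()[0]) for title in titles]

titles = review_ratings(html)
print(titles)
print(ratings_as_numbers(titles))
```

The "stars" div outside the containers is skipped, which is the reason for scoping the search in the first place.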

This question, Python Beautiful Soup find_all, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/61904528/
