python - Scraping HTML in Python when you have multiple classes with the same name

Reposted. Author: 行者123. Updated: 2023-11-28 01:24:44

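Maybe my terminology is a bit off here, but hopefully you'll understand. I am trying to scrape data from a food review website that has three ratings: happy, neutral, and unhappy. The count for each rating appears in the page as: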

<div class="col  PL20">
<div class="sprite-sr2-face-smile1"></div>
<div class="sr2_score_l">25</div>
</div>
<div class="col MR20 MT20 ML20">
<div class="sprite-sr2-face-ok2 MT20"></div>
<div class="sr2_score_m">17</div>
</div>
<div class="col ML10 MT20">
<div class="sprite-sr2-face-cry2 MT20"></div>
<div class="sr2_score_m">2</div>
</div>

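So in this case the happy count is 25, the neutral count is 17, and the unhappy count is 2. The problem is that my Python code below cannot distinguish the neutral count from the unhappy count, because both share the same class (sr2_score_m). Is there a way to work around this?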

# using BeautifulSoup4 and lxml
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen(
    'http://www.openrice.com/_'
    'en/hongkong/restaurant/central-open-kitchen/136799').read())

happy = soup.find('div', attrs={'class': 'sr2_score_l'})
print "happy rating, " + happy.string

neutral = soup.find('div', attrs={'class': 'sr2_score_m'})
print "neutral rating, " + neutral.string

unhappy = soup.find('div', attrs={'class': 'sr2_score_m'})
print "unhappy rating, " + unhappy.string

Best answer

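The face-smile, face-ok, and face-cry parts of the class names are your indicators: find the icon div for each rating, then read the count from its next sibling div: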

happy = soup.find("div", class_=re.compile(r"face-smile")).find_next_sibling("div").text
ok = soup.find("div", class_=re.compile(r"face-ok")).find_next_sibling("div").text
unhappy = soup.find("div", class_=re.compile(r"face-cry")).find_next_sibling("div").text
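
As a hedged illustration of why this works, the sketch below runs against the question's HTML fragment. The HTML constant is a hypothetical name, and the built-in html.parser is an assumption made so the example needs no extra parser. It shows that find() always returns the first match, that find_all() returns both sr2_score_m divs in document order, and that starting from the face icon removes the ambiguity. It is written for Python 3.

```python
import re

from bs4 import BeautifulSoup

# The fragment from the question, copied here so the sketch is self-contained.
HTML = """
<div class="col PL20">
<div class="sprite-sr2-face-smile1"></div>
<div class="sr2_score_l">25</div>
</div>
<div class="col MR20 MT20 ML20">
<div class="sprite-sr2-face-ok2 MT20"></div>
<div class="sr2_score_m">17</div>
</div>
<div class="col ML10 MT20">
<div class="sprite-sr2-face-cry2 MT20"></div>
<div class="sr2_score_m">2</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() stops at the first match, so repeating the same call can never
# reach the second sr2_score_m div (the unhappy count).
first_m = soup.find("div", class_="sr2_score_m").text

# find_all() returns every match in document order: neutral, then unhappy.
all_m = [div.text for div in soup.find_all("div", class_="sr2_score_m")]

# Anchoring on the icon's class fragment identifies the rating directly;
# find_next_sibling("div") skips the whitespace between the two divs.
cry_icon = soup.find("div", class_=re.compile(r"face-cry"))
unhappy = cry_icon.find_next_sibling("div").text

print(first_m, all_m, unhappy)
```

Indexing into the find_all() result would also work here, but it depends on the order of the blocks in the page, whereas the icon class names the rating explicitly.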

Sample code (with a nice reusable function):

import re

from bs4 import BeautifulSoup


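def print_reviews_count(soup):
    # Map each rating to the class-name fragment of its face icon.
    indicators = {
        "happy": "face-smile",
        "ok": "face-ok",
        "unhappy": "face-cry",
    }

    # .items() works on both Python 2 and Python 3 (iteritems() is Python 2 only).
    for key, class_name in indicators.items():
        # The count lives in the div right after the matching icon div.
        count = soup.find("div", class_=re.compile(class_name)).find_next_sibling("div").text
        print(key, count)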


source_code = """
<div class="col PL20">
<div class="sprite-sr2-face-smile1"></div>
<div class="sr2_score_l">25</div>
</div>
<div class="col MR20 MT20 ML20">
<div class="sprite-sr2-face-ok2 MT20"></div>
<div class="sr2_score_m">17</div>
</div>
<div class="col ML10 MT20">
<div class="sprite-sr2-face-cry2 MT20"></div>
<div class="sr2_score_m">2</div>
</div>
"""

soup = BeautifulSoup(source_code, "lxml")
print_reviews_count(soup)

Prints (under Python 2, where print(key, count) prints a tuple and dict ordering is arbitrary):

('ok', u'17')
('unhappy', u'2')
('happy', u'25')
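
If the counts are needed as values rather than printed, one possible variant returns them in a dict. This is a sketch under assumptions, not part of the original answer: the get_review_counts name, the conversion to int, and the None fallback for a missing icon or score are all illustrative choices, and html.parser is used only to keep the example self-contained (Python 3).

```python
import re

from bs4 import BeautifulSoup

# Rating name -> class-name fragment of its face icon (taken from the answer).
INDICATORS = {
    "happy": "face-smile",
    "ok": "face-ok",
    "unhappy": "face-cry",
}


def get_review_counts(soup):
    """Hypothetical helper: return {rating: int count, or None if not found}."""
    counts = {}
    for key, fragment in INDICATORS.items():
        icon = soup.find("div", class_=re.compile(re.escape(fragment)))
        # Guard against pages where the icon or its score div is absent.
        score = icon.find_next_sibling("div") if icon is not None else None
        # int() raises ValueError if the score text is not a plain number.
        counts[key] = int(score.get_text(strip=True)) if score is not None else None
    return counts


sample = """
<div class="col PL20">
<div class="sprite-sr2-face-smile1"></div>
<div class="sr2_score_l">25</div>
</div>
<div class="col MR20 MT20 ML20">
<div class="sprite-sr2-face-ok2 MT20"></div>
<div class="sr2_score_m">17</div>
</div>
<div class="col ML10 MT20">
<div class="sprite-sr2-face-cry2 MT20"></div>
<div class="sr2_score_m">2</div>
</div>
"""

counts = get_review_counts(BeautifulSoup(sample, "html.parser"))
print(counts)  # {'happy': 25, 'ok': 17, 'unhappy': 2}
```

Returning None instead of raising keeps one missing rating from aborting the whole scrape; whether that is the right trade-off depends on how the caller wants to handle incomplete pages.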

Regarding "python - Scraping HTML in Python when you have multiple classes with the same name", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32568443/
