
python - Scraping data from an HTML table, selecting rows with certain attributes

Reposted. Author: 行者123. Updated: 2023-12-01 09:34:33

I am scraping information from the following website: "http://www.mobygames.com/game/wheelman/view-moby-score". Here is my code:

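import requests
from bs4 import BeautifulSoup
# headers is not shown in the question; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}
url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')
for row in table[1:]:
    print(row)
# select() returns a list, so read colspan from each matched <td> instead of calling .get() on the list
x = [td.get("colspan") for td in soup.select('td[class="left"]')]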
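One detail in the `soup.find(...)` call in the question: when `class_` is given a string containing several classes, Beautiful Soup only matches a tag whose `class` attribute is exactly that string, in that order. The sketch below runs against a made-up, trimmed-down version of the table (an assumption, not the real MobyGames HTML) and shows that a CSS selector such as `table.reviewList` matches on a single class and is less brittle. It uses the built-in `html.parser` so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modeled on the question; the real page may differ.
HTML = """
<table class="reviewList table table-striped table-condensed table-hover">
  <tr valign="top"><th>Platform</th><th>Votes</th><th>Score</th></tr>
  <tr valign="top"><td>Windows</td><td>6</td><td>4.1</td></tr>
  <tr valign="top"><td class="left">Acting</td><td>4.2</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A multi-class string in class_ matches only the exact attribute value.
exact = soup.find("table", class_="reviewList table table-striped table-condensed table-hover")
# Same classes in a different order (and fewer of them): no match, returns None.
reordered = soup.find("table", class_="table reviewList")

# A CSS selector matches on one class, regardless of order or extra classes.
by_css = soup.select_one("table.reviewList")
rows = by_css.select('tr[valign="top"]')

# select() returns a list of Tags; read values from each element, not from the list.
left_cells = [td.get_text() for td in soup.select("td.left")]

print(exact is not None, reordered, len(rows), left_cells)
```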

The output I want looks like this:

platform       total_votes  rating_category  score  total_score
PlayStation 3  None         None             None   None
Windows        6            Acting           4.2    4.1
Windows        6            AI               3.7    4.1
Windows        6            Gameplay         4.0    4.1

The main problem is getting the platform name into the platform column for each corresponding observation. How can I do that?

Best answer

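Notice that the rows that start a new platform have 3 columns, while the other rows have 2. You can use this to switch the current platform.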
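The 3-vs-2 column rule amounts to remembering the most recent platform row and attaching its values to every rating row that follows it. A minimal, library-free sketch, assuming each table row has already been reduced to a list of cell strings (the `label_rows` helper and the sample data are hypothetical):

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], str, str, Optional[str]]


def label_rows(rows: List[List[str]]) -> List[Row]:
    """Attach the current platform's name, votes and total score to each rating row.

    Assumes a 3-cell row is (platform, votes, total score) and starts a new
    platform, and a 2-cell row is (category, score) for the latest platform.
    """
    platform: Optional[str] = None
    votes: Optional[str] = None
    total: Optional[str] = None
    labeled: List[Row] = []
    for cells in rows:
        if len(cells) == 3:
            # New platform: remember it for the rating rows below.
            platform, votes, total = cells
            continue
        if len(cells) == 2:
            labeled.append((platform, votes, cells[0], cells[1], total))
    return labeled


sample = [
    ["Windows", "6", "4.1"],
    ["Acting", "4.2"],
    ["AI", "3.7"],
    ["Xbox 360", "5", "3.5"],
    ["Acting", "3.8"],
]
for labeled_row in label_rows(sample):
    print(labeled_row)
```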

You can also see that rows like PlayStation have a column (a <td> tag) with the attributes colspan="2" class="center". Use this to handle cases like PlayStation.
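This is why the order of the checks matters. In the hypothetical markup below (modeled on the description in the answer, not copied from the site), the PlayStation 3 row has two `<td>` cells, the same count as a rating row, so the colspan test has to run before the column-count test. `row_kind` is an illustrative helper, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the PlayStation 3 row holds a platform cell plus one
# cell spanning two columns, so it has 2 <td>s, just like a rating row.
HTML = """
<table class="reviewList">
  <tr valign="top"><th>Platform</th><th>Votes</th><th>Score</th></tr>
  <tr valign="top"><td>PlayStation 3</td><td colspan="2" class="center">No votes</td></tr>
  <tr valign="top"><td>Windows</td><td>6</td><td>4.1</td></tr>
  <tr valign="top"><td class="left">Acting</td><td>4.2</td></tr>
</table>
"""


def row_kind(row):
    """Classify a <tr>, checking the colspan cell before counting cells."""
    # Keyword arguments to find() filter on attributes: colspan="2" and class="center".
    if row.find("td", colspan="2", class_="center"):
        return "platform_without_votes"
    if len(row.find_all("td")) == 3:
        return "platform"
    return "rating"


soup = BeautifulSoup(HTML, "html.parser")
rows = soup.select('tr[valign="top"]')[1:]  # skip the header row, as the answer does
kinds = [row_kind(r) for r in rows]
print(kinds)
```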

Code:

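import requests
from bs4 import BeautifulSoup

# headers is not defined in the original post; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}

url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
# all rows of the review table; the first matched row is the header
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')

platform = ''
total_votes, total_score = None, None
for row in table[1:]:
    # handle cases like PlayStation: a platform row with a colspan="2" cell and no votes
    if row.find('td', colspan='2', class_='center'):
        platform = row.find('td').text
        total_score, total_votes = None, None
        print('{} | {} | {} | {} | {}'.format(platform, total_votes, None, None, total_score))
        continue

    cols = row.find_all('td')
    # a 3-column row starts a new platform: remember its name, votes and total score
    if len(cols) == 3:
        platform = cols[0].text
        total_votes = cols[1].text
        total_score = cols[2].text
        continue
    # a 2-column row is a rating category belonging to the current platform
    print('{} | {} | {} | {} | {}'.format(platform, total_votes, cols[0].text, cols[1].text, total_score))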

Output:

PlayStation 3 | None | None | None | None
Windows | 6 |       Acting | 4.2 | 4.1
Windows | 6 |       AI | 3.7 | 4.1
Windows | 6 |       Gameplay | 4.0 | 4.1
Windows | 6 |       Graphics | 4.2 | 4.1
Windows | 6 |       Personal Slant | 4.3 | 4.1
Windows | 6 |       Sound / Music | 4.3 | 4.1
Windows | 6 |       Story / Presentation | 3.8 | 4.1
Xbox 360 | 5 |       Acting | 3.8 | 3.5
Xbox 360 | 5 |       AI | 3.2 | 3.5
Xbox 360 | 5 |       Gameplay | 3.4 | 3.5
Xbox 360 | 5 |       Graphics | 3.6 | 3.5
Xbox 360 | 5 |       Personal Slant | 3.6 | 3.5
Xbox 360 | 5 |       Sound / Music | 3.4 | 3.5
Xbox 360 | 5 |       Story / Presentation | 3.8 | 3.5

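Note: wherever I print, you should instead save those values into whatever list/DataFrame you are using. I only used print() to show how to change the platform variable when needed.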
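Following that note, here is a sketch that keeps the answer's logic but appends one dict per output row to a list instead of printing. It runs on an inline, made-up HTML snippet rather than the live page (the page's current markup may differ), uses `get_text(strip=True)` to drop the leading whitespace visible in the output, and the names `parse_scores` and `COLUMNS` are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the MobyGames table; the live page may differ.
HTML = """
<table class="reviewList">
  <tr valign="top"><th>Platform</th><th>Votes</th><th>Score</th></tr>
  <tr valign="top"><td>PlayStation 3</td><td colspan="2" class="center">No votes</td></tr>
  <tr valign="top"><td>Windows</td><td>6</td><td>4.1</td></tr>
  <tr valign="top"><td class="left">  Acting</td><td>4.2</td></tr>
  <tr valign="top"><td class="left">  AI</td><td>3.7</td></tr>
</table>
"""

COLUMNS = ("platform", "total_votes", "rating_category", "score", "total_score")


def parse_scores(html):
    """Return one dict per output row, using the same row rules as the answer."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    platform, total_votes, total_score = "", None, None
    for row in soup.select('tr[valign="top"]')[1:]:  # skip the header row
        cols = row.find_all("td")
        if row.find("td", colspan="2", class_="center"):
            # Platform without votes: record a row of Nones for it.
            platform = cols[0].get_text(strip=True)
            total_votes, total_score = None, None
            records.append(dict(zip(COLUMNS, (platform, None, None, None, None))))
            continue
        if len(cols) == 3:
            # New platform: remember its name, vote count and total score.
            platform, total_votes, total_score = (c.get_text(strip=True) for c in cols)
            continue
        # Rating row: category and score for the current platform.
        category, score = (c.get_text(strip=True) for c in cols[:2])
        records.append(dict(zip(COLUMNS, (platform, total_votes, category, score, total_score))))
    return records


records = parse_scores(HTML)
for record in records:
    print(record)
```

If pandas is available, `pandas.DataFrame(records)` turns this list of dicts into a DataFrame with one column per key.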

Regarding python - Scraping data from an HTML table, selecting rows with certain attributes, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49651142/
