
Python Beautifulsoup4 website parsing

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:15:22

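I'm trying to scrape some sports data from a website using Beautifulsoup4, but I'm having trouble figuring out how to proceed. I'm not very good with HTML and can't seem to work out the last bit of syntax I need. Once the data is parsed, I'll insert it into a Pandas DataFrame. I'm trying to extract the home team, the away team, and the score. Here is my code so far: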

from bs4 import BeautifulSoup
import urllib2
import csv

url = 'http://www.bbc.com/sport/football/premier-league/results'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

def has_class_but_no_id(tag):
    return tag.has_attr('score')

writer = csv.writer(open("webScraper.csv", "w"))

for tag in soup.find_all('span', {'class':['team-away', 'team-home', 'score']}):
    print(tag)

Here is a sample of the output:

<span class="team-home teams">
<a href="/sport/football/teams/newcastle-united">Newcastle</a> </span>
<span class="score"> <abbr title="Score"> 0-3 </abbr> </span>
<span class="team-away teams">
<a href="/sport/football/teams/sunderland">Sunderland</a> </span>

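I need to store the home team (Newcastle), the score (0-3), and the away team (Sunderland) in three separate fields. Essentially, I'm stuck trying to extract the "value" from each tag and can't figure out the syntax in bs4. I need something like a tag.value attribute, but all I can find in the documentation are tag.name and tag.attrs. Any help or pointers would be much appreciated!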

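Best answer

Each score unit sits inside a <td class='match-details'> element; loop over those to extract the match details.

From there, you can extract the text from the child elements with the .stripped_strings generator; just pass it to ''.join() to get all the strings contained in a tag. Select team-home, score and team-away separately to make parsing easier: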

for match in soup.find_all('td', class_='match-details'):
    home_tag = match.find('span', class_='team-home')
    home = home_tag and ''.join(home_tag.stripped_strings)
    score_tag = match.find('span', class_='score')
    score = score_tag and ''.join(score_tag.stripped_strings)
    away_tag = match.find('span', class_='team-away')
    away = away_tag and ''.join(away_tag.stripped_strings)
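
For readers on Python 3, the same loop can be tried offline against the sample markup from the question. This is a minimal sketch and not part of the original answer: the HTML constant, the helper names text_of and extract_matches, the wrapping table, and the second cell without spans are all assumptions made for illustration. The empty cell shows why each lookup is guarded with `and`: find() returns None when a span is missing, and the guard passes that None through instead of raising AttributeError.

```python
from bs4 import BeautifulSoup

# Markup modelled on the sample output in the question. The wrapping table
# and the second cell without spans are assumptions added for illustration.
HTML = """
<table><tr>
<td class="match-details">
<span class="team-home teams">
<a href="/sport/football/teams/newcastle-united">Newcastle</a> </span>
<span class="score"> <abbr title="Score"> 0-3 </abbr> </span>
<span class="team-away teams">
<a href="/sport/football/teams/sunderland">Sunderland</a> </span>
</td>
<td class="match-details">No spans in this cell</td>
</tr></table>
"""


def text_of(tag):
    """Return the tag's whitespace-stripped text, or None if the tag is missing."""
    return tag and ''.join(tag.stripped_strings)


def extract_matches(html):
    """Collect (home, score, away) tuples from every complete match cell."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for match in soup.find_all('td', class_='match-details'):
        home = text_of(match.find('span', class_='team-home'))
        score = text_of(match.find('span', class_='score'))
        away = text_of(match.find('span', class_='team-away'))
        # Skip cells where any of the three spans was not found.
        if home and score and away:
            rows.append((home, score, away))
    return rows


matches = extract_matches(HTML)
print(matches)
```

Only the first cell yields a row; the second is skipped because all three lookups come back as None.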

With a print added to the loop, this gives:

>>> for match in soup.find_all('td', class_='match-details'):
...     home_tag = match.find('span', class_='team-home')
...     home = home_tag and ''.join(home_tag.stripped_strings)
...     score_tag = match.find('span', class_='score')
...     score = score_tag and ''.join(score_tag.stripped_strings)
...     away_tag = match.find('span', class_='team-away')
...     away = away_tag and ''.join(away_tag.stripped_strings)
...     if home and score and away:
...         print home, score, away
...
Newcastle 0-3 Sunderland
West Ham 2-0 Swansea
Cardiff 2-1 Norwich
Everton 2-1 Aston Villa
Fulham 0-3 Southampton
Hull 1-1 Tottenham
Stoke 2-1 Man Utd
Aston Villa 4-3 West Brom
Chelsea 0-0 West Ham
Sunderland 1-0 Stoke
Tottenham 1-5 Man City
Man Utd 2-0 Cardiff
# etc. etc. etc.
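
The question also opens a csv.writer but never writes to it. As a rough sketch of that last step in Python 3, the three fields can be written out as rows once they have been collected as tuples. The function name write_matches and the header names are hypothetical, the rows are copied from the first lines of the output above, and an in-memory buffer stands in for the webScraper.csv file so the example runs without touching the disk.

```python
import csv
import io

# Rows copied from the first lines of the printed output above.
matches = [
    ('Newcastle', '0-3', 'Sunderland'),
    ('West Ham', '2-0', 'Swansea'),
    ('Cardiff', '2-1', 'Norwich'),
]


def write_matches(rows, fileobj):
    """Write a header line followed by one CSV row per (home, score, away) tuple."""
    writer = csv.writer(fileobj)
    writer.writerow(['home', 'score', 'away'])  # hypothetical column names
    writer.writerows(rows)


# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_matches(matches, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file in Python 3, pass open("webScraper.csv", "w", newline="") instead of the buffer. Since the question mentions Pandas, the same list of tuples can also be handed to pandas.DataFrame(matches, columns=['home', 'score', 'away']).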

Regarding Python Beautifulsoup4 website parsing, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21501949/
