
python - Get link text from HTML using BeautifulSoup

Reposted. Author: 可可西里. Updated: 2023-11-01 13:32:24

From HTML like the sample below, I used BeautifulSoup to extract a batch of football scores from the page, which was straightforward:

<tr class='report' id='match-row-EFBO695086'>
  <td class='statistics show' title='Show latest match stats'> <button>Show</button> </td>
  <td class='match-competition'> Premier League </td>
  <td class='match-details teams'>
    <p>
      <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
      <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
      <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
    </p>
  </td>
  <td class="match-date"> Sat 28 Dec </td>
  <td class='time'> Full time </td>
  <td class='status'> <a class='report' href='/sport/football/25474625'>Report</a>

from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

# Print the text of every <abbr> tag (the scores) on the page
for score in soup.find_all('abbr'):
    print(score.string)

*** Remote Interpreter Reinitialized ***
>>>
None
1-2
1-0
0-2
2-1
2-2
4-1
0-2
1-1
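The None at the top of the output is consistent with how .string behaves in BeautifulSoup: it returns None whenever a tag has more than one child. The live page is not shown, so it is an assumption that its first <abbr> contained nested markup. The sketch below uses hypothetical markup to compare .string with get_text(), which joins all descendant strings instead of giving up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <abbr> has a single text child,
# the second mixes text with a child tag (so it has two children).
html = """
<abbr title='Score'> 1-0 </abbr>
<abbr title='Score'>2-1 <span>(aet)</span></abbr>
"""

soup = BeautifulSoup(html, "html.parser")

# .string is None when a tag has more than one child;
# get_text(" ", strip=True) strips each descendant string and joins them with a space
results = [(abbr.string, abbr.get_text(" ", strip=True)) for abbr in soup.find_all("abbr")]

for string_value, text_value in results:
    print(repr(string_value), repr(text_value))
```

With this markup the first tag gives ' 1-0 ' for .string (whitespace included), while the second gives None for .string but '2-1 (aet)' for get_text().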

How can I extract the team names from this part of the HTML?

<span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a>    </span> 
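For just this fragment, a CSS selector can go straight to the link inside the span. This is a minimal sketch run on the fragment above as a string; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The fragment from the question
html = "<span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>"

soup = BeautifulSoup(html, "html.parser")

# The <a> inside a <span> whose class list includes "team-away"
link = soup.select_one("span.team-away a")

team_name = link.get_text(strip=True)  # the link text
team_href = link["href"]               # the link target, if it is also needed

print(team_name)
print(team_href)
```

This prints "Crystal Palace" followed by "/sport/football/teams/crystal-palace". The href lookup is optional; it only shows that the same tag object exposes both the text and the attributes.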

Best answer

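The idea is to first get the elements that hold each match's information: these are the tr tags with class="report". For each row, get the team names by the classes team-home and team-away, and the score by the tag name abbr: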

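from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')

# Each match is a <tr class="report"> inside <table class="table-stats">
for match in soup.select('table.table-stats tr.report'):
    team1 = match.find('span', class_='team-home')
    team2 = match.find('span', class_='team-away')
    score = match.abbr  # first <abbr> inside the row
    # Skip rows that are missing any of the three parts
    if not all((team1, team2, score)):
        continue

    # get_text(strip=True) drops the whitespace around each name and score
    print(team1.get_text(strip=True), score.get_text(strip=True), team2.get_text(strip=True))

Prints:

Man City 1-2 CSKA
Man City 1-0 Man Utd
Man City 0-2 Newcastle
West Ham 2-1 Man City
...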
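Because the BBC results page has changed since this answer was written, here is an offline sketch of the same row-by-row approach, run against the sample row from the question. The wrapping <table class='table-stats'> and the second "Postponed" row are assumptions added for the demo; the second row exercises the all(...) check that skips incomplete rows. The helper name extract_results is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample row from the question, wrapped in an assumed table.table-stats,
# plus a hypothetical incomplete tr.report row with no teams or score.
HTML = """
<table class='table-stats'>
  <tr class='report' id='match-row-EFBO695086'>
    <td class='match-competition'> Premier League </td>
    <td class='match-details teams'>
      <p>
        <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
        <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
        <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
      </p>
    </td>
    <td class='status'> <a class='report' href='/sport/football/25474625'>Report</a> </td>
  </tr>
  <tr class='report'><td> Postponed </td></tr>
</table>
"""


def extract_results(html):
    """Return (home, score, away) tuples, one per complete tr.report row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for match in soup.select("table.table-stats tr.report"):
        home = match.find("span", class_="team-home")
        away = match.find("span", class_="team-away")
        score = match.find("abbr")  # same as match.abbr
        if not all((home, away, score)):
            continue  # skip rows missing any of the three parts
        results.append(tuple(el.get_text(strip=True) for el in (home, score, away)))
    return results


print(extract_results(HTML))
```

The second row matches the selector but has no team spans, so it is skipped, and the result is a single tuple for the Man City vs Crystal Palace row. Note that the <a class='report'> link is not matched, because the selector requires a tr.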

FYI, the CSS selector table.table-stats tr.report matches every tr tag with class="report" that sits inside a table with class="table-stats".
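The space in that selector is the descendant combinator, so the table part scopes the match. The sketch below uses hypothetical markup with one tr.report inside table.table-stats and one inside a different table to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: tr.report rows in two different tables
html = """
<table class='table-stats'>
  <tr class='report'><td>inside</td></tr>
  <tr><td>no class</td></tr>
</table>
<table class='other'>
  <tr class='report'><td>outside</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Scoped: only tr.report rows somewhere inside table.table-stats
scoped = [tr.get_text(strip=True) for tr in soup.select("table.table-stats tr.report")]

# Unscoped: every tr.report on the page
unscoped = [tr.get_text(strip=True) for tr in soup.select("tr.report")]

print(scoped)
print(unscoped)
```

The scoped list contains only 'inside', while the unscoped one contains both 'inside' and 'outside'. The row without a class matches neither selector.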

A similar question, "python - Get link text from HTML using BeautifulSoup", can be found on Stack Overflow: https://stackoverflow.com/questions/26784976/
