
python - How to get a specific tag's attribute text with BeautifulSoup in Python?

Reposted. Author: 行者123. Updated: 2023-11-30 22:56:08

I'm writing a small scraper in Python with BS4 to get MLB schedule data from ESPN.com.

I'm almost done, but I've run into a small problem:

Snippet:

<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>

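I can read the contents of the <span> </span>, but I want to get the full team name from <abbr title>.

I'm not sure what I'm missing, and I haven't figured out how to do it yet.

Thanks!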

Best answer

For your snippet, you want the title attribute of the abbr tag inside the anchor that has the team-name class:

from bs4 import BeautifulSoup

h = """<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""

# Name the parser explicitly so bs4 does not warn about guessing one.
soup = BeautifulSoup(h, "html.parser")

# CSS selector: the <abbr> nested inside the <a class="team-name"> link.
print(soup.select_one("a.team-name abbr")["title"])

This gives you:

 Kansas City Royals

Or using find:

from bs4 import BeautifulSoup

h = """<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""

soup = BeautifulSoup(h, "html.parser")

# Find the <a class="team-name"> link, then read its first <abbr> child.
print(soup.find("a", attrs={"class":"team-name"}).abbr["title"])
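Both lookups above assume the link and its abbr are always present. On a live page that may not hold: select_one returns None when nothing matches, and indexing a tag with ["title"] raises KeyError when the attribute is missing. A minimal sketch of a more defensive variant follows; the helper name full_team_name and the shortened sample markup are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup


def full_team_name(html):
    """Return the title of the <abbr> inside a.team-name, or None if absent.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    abbr = soup.select_one("a.team-name abbr")
    if abbr is None:
        # No matching link or no <abbr> inside it.
        return None
    # .get() returns None instead of raising KeyError when title is missing.
    return abbr.get("title")


# Shortened version of the snippet from the question.
sample = (
    '<a class="team-name" href="/mlb/team/_/name/kc">'
    '<span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>'
)

print(full_team_name(sample))
print(full_team_name('<a class="team-name"><span>Kansas City</span></a>'))
```

The first call prints the full team name, and the second prints None because the link has no abbr tag.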

This will get all the names from the site:

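from bs4 import BeautifulSoup
import requests

url = "http://espn.go.com/mlb/schedule"

# Download the schedule page and parse it with the built-in parser.
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Narrow the search to the schedule table.
table = soup.select_one("table.schedule.has-team-logos")

# Collect the title attribute of every <abbr> inside an a.team-name link.
print([a["title"] for a in table.select("a.team-name abbr")])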

Output:

['Detroit Tigers', 'Washington Nationals', 'Kansas City Royals', 'New York Yankees', 'Oakland Athletics', 'Boston Red Sox', 'Pittsburgh Pirates', 'Cincinnati Reds', 'Milwaukee Brewers', 'Miami Marlins', 'Chicago White Sox', 'Texas Rangers', 'San Diego Padres', 'Chicago Cubs', 'Baltimore Orioles', 'Minnesota Twins', 'Cleveland Indians', 'Houston Astros', 'Arizona Diamondbacks', 'Colorado Rockies', 'Tampa Bay Rays', 'Seattle Mariners', 'New York Mets', 'Los Angeles Dodgers', 'Toronto Blue Jays', 'San Francisco Giants']
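The same selector can be tried without a network request. The sketch below runs it against a small hand-written table and pairs each abbreviation with its full name. The table markup is an assumption modeled on the snippet in the question (the second row's href in particular is invented), not a copy of the real ESPN page.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the schedule table; structure is assumed.
html = """
<table class="schedule has-team-logos">
  <tr>
    <td><a class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a></td>
    <td><a class="team-name" href="/mlb/team/_/name/nyy"><span>New York</span> <abbr title="New York Yankees">NYY</abbr></a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.schedule.has-team-logos")

# Map the visible abbreviation text to the full name stored in title.
teams = {
    abbr.get_text(strip=True): abbr["title"]
    for abbr in table.select("a.team-name abbr")
}
print(teams)
```

A dictionary is used here instead of a list only so the short code and the full name stay together; the list comprehension from the answer works the same way on this markup.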

Regarding python - How to get a specific tag's attribute text with BeautifulSoup in Python?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37133309/
