
python-3.x - Problem extracting a table from Basketball Reference with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-11-30 09:40:43

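I want to extract a specific table with id="all_team-stats-per_game", and I am trying to pull out its column headers. I can find the element with that id, but I'm not sure why the output is empty when I search it for 'tr' tags. The code is attached below. Thanks in advance.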

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

# NBA season we will be analyzing
year = 2019

url = "https://www.basketball-reference.com/leagues/NBA_2019.html"

# this is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")

# use findALL() to get the column headers
# soup.findAll('tr', limit=2)

soup = soup.find(id="all_team-stats-per_game")

print(soup.find_all('th'))
#
# headers = [th.getText() for th in soup[0].findAll('th')]
#
# print(headers)

Best answer

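I tried editing your code. I was able to find the required div tag, but the table inside it is wrapped in an HTML comment, which I also verified with the browser's inspect tool. BeautifulSoup keeps a comment as a single text node instead of parsing the markup inside it, so that is probably why the table contents are not being returned.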

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

# NBA season we will be analyzing
year = 2019

url = "https://www.basketball-reference.com/leagues/NBA_2019.html"

# this is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")

# use findALL() to get the column headers
# soup.findAll('tr', limit=2)

target_div = soup.find("div", {"id": "all_team-stats-per_game"})

print(target_div.prettify())
#
# headers = [th.getText() for th in soup[0].findAll('th')]
#
# print(headers)
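
If the table really is inside an HTML comment, one possible way to reach it is to pull the Comment node out of the div and parse its text a second time. The snippet below is only a minimal sketch of that idea: SAMPLE_HTML is a made-up stand-in for the real page (whose markup may differ), and headers_from_commented_table is a hypothetical helper name, not something from the original answer.

```python
from bs4 import BeautifulSoup, Comment

# Made-up stand-in for the real page: the table sits inside an HTML comment.
# The actual markup on basketball-reference.com may differ.
SAMPLE_HTML = """
<div id="all_team-stats-per_game">
  <div class="placeholder"></div>
  <!--
  <table id="team-stats-per_game">
    <thead><tr><th>Rk</th><th>Team</th><th>PTS</th></tr></thead>
    <tbody><tr><th>1</th><td>Team A</td><td>100.0</td></tr></tbody>
  </table>
  -->
</div>
"""


def headers_from_commented_table(html, wrapper_id):
    """Return the header cells of the first table found in a comment
    inside the div with the given id (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", id=wrapper_id)
    if wrapper is None:
        return []
    # Comment is a NavigableString subclass, so filter string nodes by type.
    for comment in wrapper.find_all(string=lambda s: isinstance(s, Comment)):
        # Parse the comment's text as HTML in its own right.
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table")
        if table is not None:
            header_row = table.find("tr")  # first row, here inside <thead>
            return [th.get_text(strip=True) for th in header_row.find_all("th")]
    return []


headers = headers_from_commented_table(SAMPLE_HTML, "all_team-stats-per_game")
print(headers)
```

Against the live page, the same function could be given the HTML read from urlopen(url) instead of SAMPLE_HTML, assuming the table is still delivered inside a comment.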

Regarding "python-3.x - Problem extracting a table from Basketball Reference with BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58871620/
