Python - Unable to retrieve data from a web page table using Beautiful Soup or lxml XPath

I am trying to retrieve data from the "Advanced Box Score Stats" tables on the following web page: http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html

First I tried using BeautifulSoup in a very broad way, retrieving all of the tables on the page:

import requests
from bs4 import BeautifulSoup

base_url = "http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html"
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html.parser")

# grab every <table> element and dump its text
tables = soup.find_all("table")
for table in tables:
    print(table.get_text())

Doing this only retrieved the "Basic Box Score Stats" tables; it did not retrieve the "Advanced Box Score Stats" tables as I had hoped.
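
A quick way to confirm which tables the broad search actually returns is to print each table's id attribute. This is only a minimal diagnostic sketch; the exact id values are whatever the page currently uses, so treat them as an assumption:

import requests
from bs4 import BeautifulSoup

base_url = "http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html"
soup = BeautifulSoup(requests.get(base_url).text, "html.parser")

# list the id of every <table> element the parser actually sees
for table in soup.find_all("table"):
    print(table.get("id"))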

Next, I tried to be more specific by using an lxml XPath:

import requests
from lxml import html

page = requests.get('http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html')
tree = html.fromstring(page.content)

# first cell of the first row of the advanced box score table
boxscore_Advanced = tree.xpath('//*[@id="box-score-advanced-lafayette"]/tbody/tr[1]/td[1]/text()')
print(boxscore_Advanced)

Doing this returns an empty list.

I have been struggling with this for a long time and have tried to work through it using the following posts:

Thanks in advance for all your help!

Best Answer

There is no need for selenium and/or PhantomJS. The "Advanced Box Score Stats" tables are actually in the HTML; they are just inside HTML comments. To parse them:

import requests
from bs4 import BeautifulSoup, Comment


url = "http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# find the comments containing the desired tables
tables = soup.find_all(text=lambda text: text and isinstance(text, Comment) and 'Advanced Box Score Stats' in text)

# we have 2 tables - one for an opponent team
for table in tables:
    table_soup = BeautifulSoup(table, "html.parser")
    advanced_table = table_soup.select_one("table[id^=box-score-advanced]")
    for row in advanced_table("tr")[2:]:  # skip headers
        print(row.th.get_text())
    print("-------")

This prints the player names from the first column of each advanced table:

Nick Lindner
Monty Boykins
Matt Klinewski
Paulius Zalys
Auston Evans
Reserves
Myles Cherry
Kyle Stout
Eric Stafford
Lukas Jarrett
Hunter Janacek
Jimmy Panzini
School Totals
-------
Kris Jenkins
Phil Booth
Josh Hart
Jalen Brunson
Darryl Reynolds
Reserves
Donte DiVincenzo
Mikal Bridges
Eric Paschall
Tim Delaney
Dylan Painter
Denny Grace
Tom Leibig
Matt Kennedy
School Totals
-------
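
Since the question originally tried an lxml XPath, here is a minimal sketch of the same comment-extraction idea using lxml instead of BeautifulSoup. It assumes, as in the answer above, that the advanced tables sit inside HTML comments and keep ids starting with box-score-advanced:

import requests
from lxml import html

page = requests.get('http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html')
tree = html.fromstring(page.content)

# comment() selects the HTML comment nodes; keep only those hiding an advanced table
for comment in tree.xpath('//comment()'):
    if 'box-score-advanced' not in (comment.text or ''):
        continue
    # re-parse the comment text so the hidden markup becomes real elements
    inner = html.fromstring(comment.text)
    matches = inner.xpath('//table[starts-with(@id, "box-score-advanced")]')
    if not matches:
        continue
    for row in matches[0].xpath('.//tr')[2:]:  # skip the two header rows, as above
        th = row.find('.//th')
        if th is not None:
            print(th.text_content().strip())
    print('-------')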

This answer is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/41555426/
