gpt4 book ai didi

python - BeautifulSoup 返回 None 即使该元素存在

转载 作者:行者123 更新时间:2023-11-28 16:21:09 25 4
gpt4 key购买 nike

我已经了解了针对类似问题的大部分解决方案,但还没有找到一个有效的解决方案,更重要的是,还没有找到解释为什么会在 Javascript 或正在抓取的网站上调用其他内容时发生这种情况.

我正在尝试从网站上抓取游戏“Officials”的表格: http://www.pro-football-reference.com/boxscores/201309050den.htm

我的代码是:

url = "http://www.pro-football-reference.com/boxscores/201309050den.htm"
html = urlopen(url)
bsObj = BeautifulSoup(html, "lxml")
officials = bsObj.findAll("table",{"id":"officials"})

for entry in officials:
print(str(entry))

我现在只是打印到控制台,但是使用 findAll 或 None 得到一个空列表。我也用基本的 html.parser 尝试过这个,但没有成功。

谁能更好地理解 html,具体告诉我这个网页有什么不同?提前致谢!

最佳答案

试试这段代码:

from selenium import webdriver
import time
from bs4 import BeautifulSoup


driver = webdriver.Chrome()
url= "http://www.pro-football-reference.com/boxscores/201309050den.htm"
driver.maximize_window()
driver.get(url)

time.sleep(5)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
officials = soup.findAll("table",{"id":"officials"})

for entry in officials:
print(str(entry))


driver.quit()

它将打印:

<table class="suppress_all sortable stats_table now_sortable" data-cols-to-freeze="0" id="officials"><thead><tr class="thead onecell"><td class=" center" colspan="2" data-stat="onecell">Officials</td></tr></thead><caption>Officials Table</caption><tbody>
<tr data-row="0"><th class=" " data-stat="ref_pos" scope="row">Referee</th><td class=" " data-stat="name"><a href="/officials/ColeWa0r.htm">Walt Coleman</a></td></tr>
<tr data-row="1"><th class=" " data-stat="ref_pos" scope="row">Umpire</th><td class=" " data-stat="name"><a href="/officials/ElliRo0r.htm">Roy Ellison</a></td></tr>
<tr data-row="2"><th class=" " data-stat="ref_pos" scope="row">Head Linesman</th><td class=" " data-stat="name"><a href="/officials/BergJe1r.htm">Jerry Bergman</a></td></tr>
<tr data-row="3"><th class=" " data-stat="ref_pos" scope="row">Field Judge</th><td class=" " data-stat="name"><a href="/officials/GautGr0r.htm">Greg Gautreaux</a></td></tr>
<tr data-row="4"><th class=" " data-stat="ref_pos" scope="row">Back Judge</th><td class=" " data-stat="name"><a href="/officials/YettGr0r.htm">Greg Yette</a></td></tr>
<tr data-row="5"><th class=" " data-stat="ref_pos" scope="row">Side Judge</th><td class=" " data-stat="name"><a href="/officials/PattRi0r.htm">Rick Patterson</a></td></tr>
<tr data-row="6"><th class=" " data-stat="ref_pos" scope="row">Line Judge</th><td class=" " data-stat="name"><a href="/officials/BaynRu0r.htm">Rusty Baynes</a></td></tr>
</tbody></table>

关于python - BeautifulSoup 返回 None 即使该元素存在,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/40146128/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com