
Python Beautiful Soup cannot find the table

Reposted. Author: 搜寻专家. Updated: 2023-10-31 23:19:38

I have web-scraping code that fails to find a table. My code looks like this:

site = 'http://etfdb.com/compare/market-cap/'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page)
table = soup.find('table', {"class":"table mm-mobile-table table-striped table-bordered"})

The table's HTML looks like this:

<table class="table mm-mobile-table table-striped table-bordered" data-icons-prefix="fa" data-icons="{&quot;columns&quot;:&quot;fa-th&quot;}" data-striped="true" data-toggle="table">

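But for some reason my code never finds the table (the result is always None). I don't know why; any help would be greatly appreciated. Thanks.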

Best answer

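The problem is that the page contains malformed markup that causes most of the document to be commented out, namely

<!-->

The fix is to strip these markers from the raw HTML and then parse it.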

# Python 2: urlopen and Request live in urllib2
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

site = 'http://etfdb.com/compare/market-cap/'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
# Remove the malformed "<!-->" markers before parsing
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class": "table mm-mobile-table table-striped table-bordered"})
print(table)

Tested on Python 2.7.12. The Python 3 equivalent:

# Python 3: urlopen and Request live in urllib.request
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

site = 'http://etfdb.com/compare/market-cap/'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
res = urlopen(req)
# read() returns bytes; decode to str so replace() can take str arguments
rawpage = res.read().decode("utf-8")
# Remove the malformed "<!-->" markers before parsing
page = rawpage.replace('<!-->', '')
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class": "table mm-mobile-table table-striped table-bordered"})
print(table)

Tested on Python 3.5.2.

This gives:

 <table class="table mm-mobile-table table-striped table-bordered" data-icons='{"columns":"fa-th"}' data-icons-prefix="fa" data-striped="true" data-toggle="table"><thead><tr><th class="show-td" data-field="symbol">Symbol</th> <th class="show-td" data-field="name">Name</th> <th class="show-td" data-field="aum">AUM</th> <th class="show-td" data-field="avg-volume">Avg Volume</th></tr></thead><tbody><tr><td class="show-td" data-th="Symbol"><a href="/etf/SPY/">SPY</a></td> <td class="show-td" data-th="Name"><a href="/etf/SPY/">SPDR S&amp;P 500 ETF</a></td> <td class="show-td" data-th="AUM">$236,737,519.17</td> <td class="show-td" data-th="Avg Volume">73,039,883</td></tr> <tr><td class="show-td" data-th="Symbol"><a href="/etf/IVV/">IVV</a></td> <td class="show-td" data-th="Name"><a href="/etf/IVV/">iShares Core S&amp;P 500 ETF</a></td> <td class="show-td" data-th="AUM">$115,791,603.10</td> <td class="show-td" data-th="Avg Volume">3,502,931</td></tr> ...
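
The cleanup step in the answer can also be factored into a small helper that accepts either bytes or text, which makes it possible to check the replacement on a literal string before touching the network. The following is only a minimal sketch for Python 3, not the answerer's code: the names clean_markup and fetch_clean and the sample fragment are hypothetical, and it assumes the page is UTF-8 encoded, as the Python 3 version above does.

```python
from urllib.request import Request, urlopen


def clean_markup(raw, marker="<!-->", encoding="utf-8"):
    """Return the page as text with every occurrence of `marker` removed.

    Hypothetical helper; assumes bytes input is encoded as `encoding`.
    """
    if isinstance(raw, bytes):
        # Undecodable bytes are replaced rather than raising an error.
        raw = raw.decode(encoding, errors="replace")
    return raw.replace(marker, "")


def fetch_clean(url, user_agent="Mozilla/5.0"):
    """Download `url` and return its text with the stray markers removed."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as res:
        return clean_markup(res.read())


# Offline check on a made-up fragment (not the real page).
sample = b'<div><!--><table class="demo"><tr><td>SPY</td></tr></table></div>'
cleaned = clean_markup(sample)
print(sample.count(b"<!-->"))  # how many markers the raw fragment contains
print(cleaned)
```

The string returned by fetch_clean(site) could then be passed to BeautifulSoup(..., "html.parser") exactly as in the answer. Counting the markers in the raw page first is a quick way to tell whether a page is affected by this problem at all.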

Regarding "Python Beautiful Soup cannot find the table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44893165/
