python - Unable to access table tags in BeautifulSoup: shown as a Declaration instead of a Tag

Reposted by: 太空宇宙 | Updated: 2023-11-04 04:25:40


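I am running Python 3 in a Jupyter Notebook. I am trying to select the table row tags with the class attribute "Company" on this page, but past a certain point in the soup no tags can be selected. When I run findAll, the result is an empty list. The table sits at index 21 of the soup, but that entry shows up as a bs4.element.Declaration rather than a Tag, which may be why findAll returns nothing.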

from bs4 import BeautifulSoup as bs
import requests

url = 'http://theacsi.org/index.php?option=com_content&view=article&id=149&catid=&Itemid=214&i=Airlines'
# Send a browser-like User-Agent header with the request.
r = requests.get(url, headers={
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
})
airlinesSatPage = r.content
soup = bs(airlinesSatPage, "html.parser")
# Select every <tr> whose class attribute is "Company".
allRows = soup.findAll('tr', {'class': 'Company'})
print(allRows)  # prints an empty list with html.parser

Any idea what is going on, or what I can do to access these tags?

Best answer

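The problem appears to be that html.parser cannot handle the HTML markup returned from that URL. Switching to the lxml parser fixes it, although lxml has to be installed separately with pip install lxml.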
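One way to check this kind of diagnosis is to list the node type of each top-level entry in the soup. The sketch below is only an illustration: the helper name top_level_node_types and the small sample_html document are assumptions made here, not part of the original question, and the real input would be r.content from the request. An entry reported as Declaration where a Tag is expected suggests the parser has treated real markup as a declaration.

```python
from bs4 import BeautifulSoup


def top_level_node_types(html, parser="html.parser"):
    """Return (index, class name) for each top-level node of the parsed soup."""
    soup = BeautifulSoup(html, parser)
    return [(i, type(node).__name__) for i, node in enumerate(soup.contents)]


# Hypothetical stand-in document; the real page would come from r.content.
sample_html = (
    "<!DOCTYPE html><html><body><table>"
    '<tr class="Company"><td>Delta</td></tr>'
    "</table></body></html>"
)

for index, kind in top_level_node_types(sample_html):
    print(index, kind)
```

Running the same helper with parser="lxml" on the real page gives a quick side-by-side view of how the two parsers split the document.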

In summary, first:

pip install lxml

Then change the parser in your code:

soup = bs(airlinesSatPage, "lxml")

When run, this prints:

[<tr class="Company"><td class="Company"> <a href="https://www.theacsi.org..., ]
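If it is unclear in advance which parser copes with a given page, the parser switch can be written defensively. This is a minimal sketch, not the answer's own code: the function find_company_rows, the parser order, and the sample_html fragment are assumptions for illustration. It tries each parser in turn, skips any that is not installed (Beautiful Soup raises FeatureNotFound in that case), and returns the first result that contains matching rows.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_company_rows(html, parsers=("lxml", "html5lib", "html.parser")):
    """Return (parser_name, rows) from the first parser that finds matching rows."""
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser is not installed; try the next candidate.
            continue
        rows = soup.find_all("tr", class_="Company")
        if rows:
            return name, rows
    # No installed parser produced any matching rows.
    return None, []


# Hypothetical fragment standing in for the downloaded page (r.content).
sample_html = '<table><tr class="Company"><td class="Company">Delta</td></tr></table>'

parser_used, rows = find_company_rows(sample_html)
print(parser_used, len(rows))
```

Returning the parser name alongside the rows makes it easy to see which parser actually succeeded on a given page.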

Regarding "python - Unable to access table tags in BeautifulSoup: shown as a Declaration instead of a Tag", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53561330/
