
python-3.x - How to parse nested tags with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-02 19:51:36

HTML code

<a href="1.co">1<a href="2.co">2</a></a>

I tried calling BeautifulSoup recursively on the "contents" of the first tag, but BeautifulSoup fails with:

        if hasattr(markup, 'read'):        # It's a file-type object.
>           markup = markup.read()
E       TypeError: 'NoneType' object is not callable

Python code

from bs4 import BeautifulSoup
from bs4 import SoupStrainer


def parse(text):
    soup = BeautifulSoup(text, parse_only=SoupStrainer(['a']), features="html.parser")
    for tag in soup:
        if tag.name == "a" and tag.has_attr("href"):
            print(tag["href"])
            if hasattr(tag, "contents"):
                for text in tag.contents:
                    parse(text)


if __name__ == '__main__':
    parse("""<a href="2.co">2<a href="3.co">3</a></a>""")

Best answer

Just use find_all('a'):

from bs4 import BeautifulSoup

data = '''<a href="1.co">1<a href="2.co">2</a></a>'''
soup = BeautifulSoup(data, 'html.parser')
for item in soup.find_all('a', href=True):
    print(item['href'])
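
The traceback in the question most likely comes from passing an already-parsed Tag back into BeautifulSoup(...): as far as I can tell, a Tag treats an unknown attribute lookup as a search for a child tag of that name, so the hasattr(markup, 'read') check can succeed even though markup.read is None. If an explicit recursive walk is still wanted, a minimal sketch is to parse once and recurse over the existing tree instead. The helper name collect_hrefs below is hypothetical and not part of the original question or answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def collect_hrefs(node, found=None):
    """Recursively gather href values from <a> tags beneath an already-parsed node."""
    if found is None:
        found = []
    for child in node.children:
        # Text nodes (NavigableString) are skipped; only Tag objects can hold links.
        if isinstance(child, Tag):
            if child.name == "a" and child.has_attr("href"):
                found.append(child["href"])
            # Descend into the existing Tag instead of re-parsing it.
            collect_hrefs(child, found)
    return found


# Parse the markup exactly once, then walk the resulting tree.
soup = BeautifulSoup('<a href="2.co">2<a href="3.co">3</a></a>', "html.parser")
hrefs = collect_hrefs(soup)
print(hrefs)  # ['2.co', '3.co']
```

For simply listing links, find_all('a', href=True) stays the shorter option; a manual walk like this is only worth it when each nesting level needs its own handling.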

Regarding "python-3.x - How to parse nested tags with BeautifulSoup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58062531/
