
python - How to ignore empty tags using Beautiful Soup?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:53:10

I have this web page:

text = BeautifulSoup(requests.get('https://www.washingtonpost.com/blogs/on-small-business/post/how-to-breed-big-innovation-inside-a-small-business/2013/03/26/b1a8953e-962a-11e2-9e23-09dce87f75a1_blog.html', timeout=7.00).text)

I have a Beautiful Soup filter function that pulls every <ul> tag that has no attributes, whose <li> tag has no attributes, and that contains no <a> tag:

def pull_ul(tag):
    return tag.name == 'ul' and not tag.attrs and not tag.li.attrs and not tag.a

ul_tags = text.find_all(pull_ul)
print(ul_tags)

When I run this, I get an error:

AttributeError: 'NoneType' object has no attribute 'attrs'

So I changed the function to:

def pull_ul(tag):
    return tag.name == 'ul' and not tag.attrs and not tag.a

Output:

[<ul></ul>, <ul> <li class="report-button" id="flag-spam">Spam</li> <li class="report-button" id="flag-offensive">Offensive</li> <li class="report-button" id="flag-disagree">Disagree</li> <li class="report-button" id="flag-offtopic">Off-Topic</li> </ul>]

This tells me that the part causing the error is the empty tag <ul></ul>: it has no <li>, so tag.li is None.

Is there a way to rewrite the function so that it ignores every empty tag that would otherwise raise this error?

Best answer

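What if you simply add an extra check that tag.li is truthy? For a <ul> with no <li>, tag.li is None, so the and chain short-circuits before tag.li.attrs is evaluated: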

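def pull_ul(tag):
    # Parentheses replace the backslash continuations: a comment after
    # a trailing backslash is a syntax error in Python.
    return (tag.name == 'ul' and
            not tag.attrs and
            tag.li and               # <-- HERE: skip <ul> tags with no <li>
            not tag.li.attrs and
            not tag.a)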
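
To try the idea without fetching the live page, here is a minimal sketch that runs the same kind of filter against a small inline document. The SAMPLE_HTML markup and the explicit `is None` comparisons are assumptions made for illustration, not part of the original answer; the explicit 'html.parser' argument is also an addition, used only to keep the parser choice unambiguous.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div>
  <ul></ul>
  <ul><li>plain item</li><li>another item</li></ul>
  <ul><li class="report-button" id="flag-spam">Spam</li></ul>
  <ul><li><a href="#">link</a></li></ul>
  <ul class="nav"><li>nav item</li></ul>
</div>
"""


def pull_ul(tag):
    # tag.li is the first <li> descendant, or None when there is none.
    first_li = tag.li
    return (tag.name == 'ul'
            and not tag.attrs             # the <ul> itself has no attributes
            and first_li is not None      # skip <ul> tags with no <li>
            and not first_li.attrs        # the first <li> has no attributes
            and tag.a is None)            # no <a> anywhere inside the <ul>


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
matches = soup.find_all(pull_ul)
print(matches)
```

Note that tag.li only looks at the first <li>, so a <ul> whose later items carry attributes would still match; checking every item would need something like all(not li.attrs for li in tag.find_all('li')).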

Regarding "python - How to ignore empty tags using Beautiful Soup?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35678699/
