python - BeautifulSoup: cannot extract all elements in a single loop

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:07:13

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>', 'html.parser')
for m in soup.div:
    print("extract(first loop): ", m.extract())
print("current soup.div(first loop): ", soup.div)  # it still contains another div block
print('___________________________________________________________')

# I have to do another for loop to purge the remaining div block, why?
for m in soup.div:
    print("extract(second loop): ", m.extract())

print("current soup.div(second loop): ", soup.div)  # removed

Result:

extract(first loop):  <p>p_string</p>
current soup.div(first loop):  <div><div>div_string</div></div>
___________________________________________________________
extract(second loop):  <div>div_string</div>
current soup.div(second loop):  <div></div>

Why doesn't the first for loop extract all of the elements (both the p and the div)?

Best answer

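This happens because you call extract() inside the loop, and extract() removes the tag from the tree. In other words, you are deleting the tag's children while iterating over them. It is essentially the same as iterating over a list and removing items from it inside the loop.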
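
The list analogy can be reproduced without BeautifulSoup. The following is a minimal sketch using a plain Python list as a stand-in for the tag's children; the names children, removed, leftover and removed_all are made up for illustration and are not part of the original question.

```python
# Plain-list stand-in for a tag's children (hypothetical names).
children = ["p", "div"]
removed = []

for child in children:
    # Removing the current item shifts the rest one slot to the left,
    # so the iterator's next index is already past the end of the list.
    children.remove(child)
    removed.append(child)

leftover = list(children)
print(removed)   # ['p']
print(leftover)  # ['div']

# Iterating over a copy (a snapshot) visits every item.
children = ["p", "div"]
removed_all = []
for child in list(children):
    children.remove(child)
    removed_all.append(child)

print(removed_all)  # ['p', 'div']
print(children)     # []
```

The first loop stops after one item, which mirrors the behaviour seen in the question; the second loop is unaffected because the list being iterated over is never modified.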

Instead, use .find_all():

for m in soup.div.find_all():
    print(m.extract())
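
.find_all() avoids the problem because it returns a list-like result that is built before the loop starts, so extracting tags no longer disturbs the iteration. Keep in mind that it returns only tags (not text nodes) and searches all descendants by default. As a hedged alternative, assuming the goal is to remove every direct child, one can copy .contents before looping. The html.parser choice and the variable name extracted below are assumptions for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><p>p_string</p><div>div_string</div></div>", "html.parser"
)

# Copy the child list first, so extract() changes the tree
# but not the list that is being iterated over.
extracted = [child.extract() for child in list(soup.div.contents)]

print([str(tag) for tag in extracted])
print(soup.div)
```

Because the copy is taken once, both children are extracted in a single pass and the outer div is left empty.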

For more on python - BeautifulSoup: cannot extract all elements in a single loop, see the similar question on Stack Overflow: https://stackoverflow.com/questions/26646653/
