python - Traversing the DOM with BeautifulSoup/Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 16:30:10

I have this DOM:

<h2>Main Section</h2>
<p>Bla bla bla<p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>


<h2>Main Section 2</h2>
<p>bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>

I want to create an iterator that returns "Main Section", "Bla bla bla", "Subsection", and so on. Is there a way to do this with BeautifulSoup?
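A flat iterator like the one described can be sketched as a generator built on `find_all()` with a list of tag names, which returns matches in document order. This is a minimal sketch that assumes the only tags of interest are h2, h3 and p, and that the `<p>` after "Bla bla bla" is closed properly. The function name `iter_texts` is hypothetical, and the stdlib-backed "html.parser" is used so no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup


def iter_texts(html, tags=("h2", "h3", "p")):
    """Yield the stripped text of each matching tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() with a list of names matches any of them and returns
    # the tags in the order they appear in the document.
    for tag in soup.find_all(list(tags)):
        text = tag.get_text(strip=True)
        if text:  # skip elements that contain only whitespace
            yield text


# A well-formed version of the sample above (first <p> closed).
html = """<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h3>Subsection 2</h3>
<p>Even more info!</p>
<h2>Main Section 2</h2>
<p>bla</p>"""

for text in iter_texts(html):
    print(text)
```

Because `find_all()` searches recursively, a matching tag nested inside another matching tag would be yielded on its own and also appear within its parent's text. That does not happen with flat markup like this.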

Best answer

Here is one way to do it. The idea is to iterate over the main sections (the h2 tags) and, for each h2 tag, iterate over its siblings until the next h2 tag:

from bs4 import BeautifulSoup, Tag


data = """<h2>Main Section</h2>
<p>Bla bla bla<p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>


<h2>Main Section 2</h2>
<p>bla</p>
<h3>Subsection</h3>
<p>Some more info</p>

<h3>Subsection 2</h3>
<p>Even more info!</p>"""


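# Name the parser explicitly (bs4 warns when it has to guess one).
# lxml is assumed to be installed; it turns the unclosed <p> after
# "Bla bla bla" into an empty paragraph, which accounts for the
# blank lines in the output below.
soup = BeautifulSoup(data, "lxml")
for main_section in soup.find_all('h2'):
    # Walk the siblings that follow this h2...
    for sibling in main_section.next_siblings:
        # ...skipping the whitespace strings between tags...
        if not isinstance(sibling, Tag):
            continue
        # ...and stopping where the next main section begins.
        if sibling.name == 'h2':
            break
        print(sibling.text)
    print("-------")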

This prints:

Bla bla bla


Subsection
Some more info
Subsection 2
Even more info!
-------
bla
Subsection
Some more info
Subsection 2
Even more info!
-------
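Since the question asks for an iterator rather than printed output, the same sibling-walking loop can be wrapped in a generator that yields one section at a time. This is a minimal sketch, assuming a well-formed copy of the sample (the `<p>` after "Bla bla bla" closed) and the stdlib-backed "html.parser". The name `iter_sections` and its `heading` parameter are hypothetical:

```python
from bs4 import BeautifulSoup, Tag


def iter_sections(soup, heading="h2"):
    """Yield (heading_text, [sibling texts]) for each heading tag.

    Walks each heading's next_siblings and stops at the next heading
    with the same tag name, as in the loop above.
    """
    for head in soup.find_all(heading):
        body = []
        for sibling in head.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip the whitespace strings between tags
            if sibling.name == heading:
                break  # the next section starts here
            text = sibling.get_text(strip=True)
            if text:
                body.append(text)
        yield head.get_text(strip=True), body


html = """<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h2>Main Section 2</h2>
<p>bla</p>"""

soup = BeautifulSoup(html, "html.parser")
for title, items in iter_sections(soup):
    print(title, items)
```

Yielding a tuple per section keeps the grouping by h2 that the printed "-------" separators expressed, while letting the caller decide how to consume it.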

Hope that helps.

Regarding python - Traversing the DOM with BeautifulSoup/Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22496401/
