
Python: skipping lines and stripping whitespace when parsing HTML

Reposted. Author: 行者123. Updated: 2023-11-28 04:46:30

I have the following HTML:

html_doc = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
East Counties</h2>
<h2>
East Midlands</h2>
<h2>
London</h2>
<h2>
North East</h2>
<h2>
North West</h2>
<h2>
South East</h2>
<h2>
South West</h2>
<h2>
West Midlands</h2>
<h2>
Yorkshire and Humberside</h2>
<h2>
Northern Ireland</h2>
<h2>
Scotland</h2>
<h2>
Wales</h2>
"""

How can I skip the first four lines and access the text strings such as East Counties?

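My attempt does not skip the first four lines, and the strings it returns include the whitespace and newlines embedded in the markup (which I want to remove):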

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
for h2 in soup.find_all('h2'):
    next
    next
    next
    next
    print (str(h2.children.next()))

Desired result:

East Counties
East Midlands
London
North East
...

What am I doing wrong?

Best answer

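You can use slicing here: find_all returns a list-like result (a ResultSet, which subclasses list), so you can slice it with [4:] to skip the first four headings, and call strip() on each heading's text to remove the surrounding whitespace and newlines. A bare next inside the loop does nothing: it only names the built-in function without calling it, so no iteration is skipped.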

for h2 in soup.find_all('h2')[4:]:
    print(h2.text.strip())

East Counties
East Midlands
London
North East
North West
...
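
For reference, the same idea as a self-contained script. This is only a sketch under a few assumptions: html_doc is shortened here to the four unwanted headings plus three regions, the variable name regions is made up for illustration, and get_text(strip=True) is used as an alternative to .text.strip().

```python
from bs4 import BeautifulSoup

# Shortened version of the html_doc from the question (assumption:
# only three of the region headings are kept to keep the example small).
html_doc = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
East Counties</h2>
<h2>
East Midlands</h2>
<h2>
London</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a list-like ResultSet, so [4:] drops the first four <h2> tags.
# get_text(strip=True) removes the leading newline and surrounding spaces.
regions = [h2.get_text(strip=True) for h2 in soup.find_all("h2")[4:]]

for name in regions:
    print(name)
```

Collecting the names in a list first keeps the skipping and stripping in one place, so the cleaned values can be reused instead of only printed.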

A similar question about skipping lines and stripping whitespace when parsing HTML in Python can be found on Stack Overflow: https://stackoverflow.com/questions/43911400/
