python - How do I get the content between an HTML tag and its closing tag using Python's Beautiful Soup?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:23:10

I have a line of HTML like this:

<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>

I want to extract the headline, i.e. "Is this model too thin for Yves Saint Laurent?", from this line of HTML. How do I get whatever lies between

<tag> and </tag>.

I'm not very familiar with regular expressions.

Best answer

If your element contains only text, use the .string attribute:

headline = soup.find(class_='cd__headline-text')
print(headline.string)

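If the element contains other tags, you can either get all the text contained in the current element and everything nested inside it, or get only specific strings from the current element.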

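The element.get_text() function recurses through the element and its children, gathers all the strings, and joins them with a separator of your choice (the empty string by default), with or without whitespace stripping.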
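As a small illustration of how the separator and stripping interact, here is a sketch using made-up markup (it is not from the question, and the variable names are only for illustration). Stripping without a separator glues neighbouring strings together, so passing a separator is usually the safer choice when the element has child tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only to show text spread over nested tags.
markup = '<div><h1>Title</h1><p>First <b>bold</b> para.</p></div>'
soup = BeautifulSoup(markup, 'html.parser')
div = soup.find('div')

plain = div.get_text()                  # strings joined with '' as they are
stripped = div.get_text(strip=True)     # each string stripped, then joined with ''
spaced = div.get_text(' ', strip=True)  # each string stripped, then joined with ' '

print(repr(plain))     # 'TitleFirst bold para.'
print(repr(stripped))  # 'TitleFirstboldpara.'
print(repr(spaced))    # 'Title First bold para.'
```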

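To get only specific strings, you can either iterate over the .strings or .stripped_strings generators, or use the element's .contents to access everything it directly contains and then pick out the instances of the NavigableString type.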
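One caveat when filtering by type: in bs4, Comment is a subclass of NavigableString, so an isinstance check also matches HTML comments. The sketch below uses hypothetical markup and a hypothetical helper name, direct_text_nodes, and shows one way to exclude comments if that matters for your page.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

def direct_text_nodes(element):
    """Return the direct text children of element, skipping HTML comments."""
    return [
        child for child in element.children
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]

# Hypothetical markup with a comment between two text nodes.
markup = '<p>Hello <!-- note --> world</p>'
paragraph = BeautifulSoup(markup, 'html.parser').find('p')

# Without the Comment check, the comment text is picked up as well.
with_comments = [c for c in paragraph.children if isinstance(c, NavigableString)]
without_comments = direct_text_nodes(paragraph)

print(with_comments)     # ['Hello ', ' note ', ' world']
print(without_comments)  # ['Hello ', ' world']
```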

A demonstration using your sample:

>>> from bs4 import BeautifulSoup
>>> markup = '<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')  # name the parser explicitly
>>> headline = soup.find(class_='cd__headline-text')
>>> print(headline.string)
Is this model too thin for Yves Saint Laurent? 
>>> print(list(headline.strings))
['Is this model too thin for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model too thin for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent? 
>>> print(headline.get_text(strip=True))
Is this model too thin for Yves Saint Laurent?

And with an extra element added:

>>> markup = '<span class="cd__headline-text">Is this model <em>too thin</em> for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> headline.string is None  # more than one child, so .string is None
True
>>> print(list(headline.strings))
['Is this model ', 'too thin', ' for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model', 'too thin', 'for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent? 
>>> print(headline.get_text(' - ', strip=True))
Is this model - too thin - for Yves Saint Laurent?
>>> headline.contents
['Is this model ', <em>too thin</em>, ' for Yves Saint Laurent? ']
>>> from bs4 import NavigableString
>>> [el for el in headline.children if isinstance(el, NavigableString)]
['Is this model ', ' for Yves Saint Laurent? ']
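Putting the pieces together, a minimal sketch of a helper that collects every headline on a page might look like the following. The function name extract_headlines and the two-headline sample page are assumptions made for illustration; only the cd__headline-text class comes from the question. Whitespace is normalized with split() and join() rather than strip=True, so that words on either side of a nested tag are not glued together.

```python
from bs4 import BeautifulSoup

def extract_headlines(html, css_class='cd__headline-text'):
    """Return the text of every element with the given class.

    Hypothetical helper: collapses runs of whitespace and trims the ends,
    whether or not the element contains nested tags.
    """
    soup = BeautifulSoup(html, 'html.parser')
    return [
        ' '.join(element.get_text().split())
        for element in soup.find_all(class_=css_class)
    ]

# Hypothetical page containing both variants used in the demonstration.
page = (
    '<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>'
    '<span class="cd__headline-text">Is this model <em>too thin</em> for Yves Saint Laurent? </span>'
)

headlines = extract_headlines(page)
for text in headlines:
    print(text)
```

Both elements yield the same string here, because the nested em tag contributes only its text.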

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/31027305/
