python - How to extract the text between tags in Python using BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:57:03

I'm trying to extract text from the following HTML structure:

<div class= "story-body story-content">
<p>
<br>
"the text I want to get"
<a href="http://...">
<br>
"the text I want to get"
<a href="http:// ...">
.
.

I have already extracted the hyperlinks, but I can't figure out how to extract the text. So far I have tried:

names = []
for div in soup3.find_all("div", attrs={"class": "story-body story-content"}):
    for t in div.find_all('br'):
        t = t.get_text()
        names.append(t)

But all I get is:

[<br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'']

Best answer

for div in soup3.find_all("div", attrs={"class": "story-body story-content"}):
    # stripped_strings yields each non-blank string inside the div, whitespace-trimmed
    text_list = list(div.stripped_strings)

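Use stripped_strings to get all the non-empty strings under a tag. Each string has its leading and trailing whitespace removed, and strings consisting only of whitespace are skipped.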
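A minimal, self-contained sketch of the answer applied to markup shaped like the question's. The example.com URLs, the link text and the helper name extract_texts are hypothetical placeholders. It assumes a reasonably recent bs4 with the built-in "html.parser", which treats <br> as an empty element. Note that stripped_strings also returns the link text, because it walks every descendant of the div.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question; URLs and link text are placeholders.
HTML = """
<div class="story-body story-content">
  <p>
    <br>
    the first text I want
    <a href="http://example.com/1">link one</a>
    <br>
    the second text I want
    <a href="http://example.com/2">link two</a>
  </p>
</div>
"""


def extract_texts(html):
    """Hypothetical helper: collect every non-blank string inside the story divs."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # A class string containing a space matches the exact attribute value.
    for div in soup.find_all("div", attrs={"class": "story-body story-content"}):
        # stripped_strings skips whitespace-only strings and trims the rest.
        texts.extend(div.stripped_strings)
    return texts


print(extract_texts(HTML))
```

If the link text is unwanted, filter it out afterwards or use a sibling-based approach instead.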

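The <br> tag just inserts a line break. It contains no text, so calling get_text() on it returns an empty string (the u'' entries above). The text you want sits in the string nodes next to the <br> tags, not inside them.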
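To make the point about <br> concrete, here is a small sketch. It uses the same kind of hypothetical placeholder markup and, like the previous example, assumes a recent bs4 with "html.parser". It shows that get_text() on each <br> is empty. Reading each <br>'s next_sibling is an alternative technique, not the accepted answer's method, that returns only the text following the line breaks and leaves out the link text.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup modeled on the question; URLs and link text are placeholders.
HTML = """
<div class="story-body story-content">
  <p>
    <br>
    the first text I want
    <a href="http://example.com/1">link one</a>
    <br>
    the second text I want
    <a href="http://example.com/2">link two</a>
  </p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find("div", attrs={"class": "story-body story-content"})

# <br> is an empty element, so get_text() gives "" for every one of them.
br_texts = [br.get_text() for br in div.find_all("br")]

# The wanted text is the string node immediately after each <br>.
after_br = []
for br in div.find_all("br"):
    sibling = br.next_sibling
    if isinstance(sibling, NavigableString) and sibling.strip():
        after_br.append(sibling.strip())

print(br_texts)
print(after_br)
```

This relies on the text directly following each <br>. If other tags can appear in between, the stripped_strings approach is more robust.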

Regarding "python - How to extract the text between tags in Python using BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42051539/
