
python - Remove specific content from a result using BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-11-30 23:06:54

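import urllib2                  # Python 2 standard library
from bs4 import BeautifulSoup   # assumption: BeautifulSoup 4; the original import is not shown

def get_description(link):
    redditFile = urllib2.urlopen(link)
    redditHtml = redditFile.read()
    redditFile.close()
    soup = BeautifulSoup(redditHtml)
    desc = soup.find('div', attrs={'class': 'op_gd14 FL'}).text
    return desc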

This is the code that extracts the text from the following HTML:

    <div class="op_gd14 FL">
<p><span class="bigT">P</span>restige Estates Projects Ltd has informed BSE that the 18th Annual General Meeting (AGM) of the Company will be held on September 30, 2015.Source : BSE<br><br>
<a href="../../company-notices/nestleindia/notices/PEP02">Read all announcements in Prestige Estate</a> </p><p> </p>

</div>

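This result is fine for me. I just want to exclude the content of

<a href="../../company-notices/nestleindia/notices/PEP02">Read all announcements in Prestige Estate</a>

from the result (that is, from desc in my script): remove it if it is present, and do nothing if it is not. How can I do this?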

Best answer

You can use extract() to remove unwanted tags from the find() result:

descItem = soup.find('div', attrs={'class': 'op_gd14 FL'}) # get the DIV
[s.extract() for s in descItem('a')] # remove <a> tags
return descItem.get_text() # return the text
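As a self-contained illustration of the same idea, here is a minimal Python 3 sketch. It makes several assumptions that are not in the original: the function name get_description_from_html is hypothetical, it takes an HTML string instead of fetching a URL, it uses the built-in "html.parser" backend, and it calls decompose() rather than extract() because the removed tags are not needed afterwards. It also returns None when the div is missing, which the original snippets do not handle.

```python
from bs4 import BeautifulSoup


def get_description_from_html(html):
    """Return the text of the 'op_gd14 FL' div with any <a> tags removed.

    Hypothetical helper for illustration; returns None if the div is absent.
    """
    soup = BeautifulSoup(html, "html.parser")
    desc_item = soup.find("div", attrs={"class": "op_gd14 FL"})
    if desc_item is None:
        return None
    # Drop every <a> tag inside the div; a no-op when there are none.
    for link in desc_item.find_all("a"):
        link.decompose()
    return desc_item.get_text().strip()


# Shortened sample modeled on the HTML in the question.
sample = """
<div class="op_gd14 FL">
<p><span class="bigT">P</span>restige Estates Projects Ltd will hold its AGM.<br><br>
<a href="../../company-notices/nestleindia/notices/PEP02">Read all announcements in Prestige Estate</a> </p><p> </p>
</div>
"""

print(get_description_from_html(sample))
```

Because the loop simply does nothing when no link is found, the same function covers both cases from the question: the link is stripped when present and the text is returned unchanged otherwise.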

Regarding python - Remove specific content from a result using BeautifulSoup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32477042/
