
python - Getting text results from an <em> tag inside a <div> tag

Reposted. Author: 行者123. Updated: 2023-11-28 22:25:50

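I'm trying to build a web scraper that grabs the following data: title, image source, description and location. Everything works except the location, which sits inside an <em> tag.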

This link shows the code I'm using: https://pastebin.com/BFZyyhxB

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

title = soup.title
image = soup.image
strong = soup.strong
description = soup.description
location = soup.location


title = soup.find('h1', class_='publication-font', )
image = soup.find('img')
strong = soup.find('strong')
location = soup.find('a', 'href', 'em') #This is either done incorrectly or needs more added
description = soup.find('div', class_='description')

print(title.text)
print(image)
print(strong.text)
print(description.string)
print(location)

This shows the HTML structure I'm trying to scrape, including the em tag: https://pastebin.com/zHy7H220

<div class="teaser"><figure data-mod="image" data-init="true"><div class="spacer" style="padding-top:66.50%;"></div>


<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">
<img srcset="http://i1.manchestereveningnews.co.uk/incoming/article13366643.ece/ALTERNATES/s180/Mike-Grimshaw.jpg 180w, http://i1.manchestereveningnews.co.uk/incoming/article13366643.ece/ALTERNATES/s390/Mike-Grimshaw.jpg 390w, http://i1.manchestereveningnews.co.uk/incoming/article13366643.ece/ALTERNATES/s458/Mike-Grimshaw.jpg 458w" src="http://i1.manchestereveningnews.co.uk/incoming/article13366643.ece/ALTERNATES/s615/Mike-Grimshaw.jpg">
</a>
</figure>
<div class="inner">
<em><a href="http://www.manchestereveningnews.co.uk/all-about/sale">Sale</a></em> <------------------ text within the <em> tag is what I am trying to get.
<strong>
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">Mum who witnessed fiancé Michael Grimshaw being fatally stabbed 'cannot face returning home'</a></strong><div class="description">
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">A fundraising campaign has been set up to help Mr Grimshaw's family in the wake of his tragic death</a>
</div>
</div>
</div>

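As you can see, it doesn't return anything, which means my code is wrong. But despite countless attempts at finding a tutorial, I couldn't find a way to solve this.

Any help would be greatly appreciated.

Best answer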

OK, so the <em> tag wraps an anchor tag. If you want the href link inside that anchor, I believe you'll need:

location = soup.find('em').find('a')['href']
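A minimal sketch of why the chained lookup matters, assuming bs4 is installed. It parses a cut-down copy of the teaser markup from the question instead of fetching the live page (whose layout has likely changed), and uses the built-in "html.parser" rather than lxml so it runs without that extra dependency. A bare soup.find('a') returns the first anchor in the document, which here is the figure link, not the location:

```python
from bs4 import BeautifulSoup

# Cut-down teaser: the figure link comes first, the location link sits inside <em>.
html = """
<div class="teaser">
<figure><a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115"></a></figure>
<div class="inner">
<em><a href="http://www.manchestereveningnews.co.uk/all-about/sale">Sale</a></em>
</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")["href"]                # first <a> anywhere: the figure link
location_link = soup.find("em").find("a")["href"]  # the <a> nested inside <em>

print(first_link)
print(location_link)
```

Scoping the search to the <em> element first is what keeps the lookup pinned to the location link.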

If it's the text you want, then use

location = soup.find('em').find('a').string # or .text
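A short sketch of how .string and .text differ, again parsing a trimmed copy of the question's markup with "html.parser" (an assumption made so the example runs offline). .string only works when a tag has a single child; the <strong> element in the question also contains a newline before its <a>, so its .string is None, while .text and get_text(strip=True) still return the headline:

```python
from bs4 import BeautifulSoup

html = """
<div class="inner">
<em><a href="http://www.manchestereveningnews.co.uk/all-about/sale">Sale</a></em>
<strong>
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">Mum who witnessed fiancé Michael Grimshaw being fatally stabbed 'cannot face returning home'</a></strong>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

anchor = soup.find("em").find("a")
print(anchor.string)  # Sale: the <a> has exactly one text child
print(anchor.text)    # Sale

strong = soup.find("strong")
# <strong> has two children (a newline string and the <a>), so .string is None ...
print(strong.string)
# ... while .text joins every string, leading newline included,
print(repr(strong.text))
# and get_text(strip=True) strips each piece and drops the empty ones.
print(strong.get_text(strip=True))
```

For the location link either attribute works; get_text(strip=True) is the safer default when the markup may contain stray whitespace.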

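soup.find takes a tag name plus an optional dict of attributes to match (for example {'class': 'description'}, or the class_ keyword). The syntax you used is incorrect: in find('a', 'href', 'em') the second positional argument is attrs and the third is recursive, so the call never looks for an <a> inside an <em>. CSS selectors go through soup.select() / soup.select_one() instead of find().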
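As a sketch of the correct call forms, the snippet below pulls the image, location, headline and description out of a trimmed copy of the teaser from the question (srcset omitted). It assumes bs4 with its soupsieve dependency for select_one(), and swaps lxml for the built-in "html.parser"; the dict layout of the result is illustrative only:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the teaser markup from the question (srcset omitted).
TEASER_HTML = """
<div class="teaser"><figure data-mod="image" data-init="true">
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">
<img src="http://i1.manchestereveningnews.co.uk/incoming/article13366643.ece/ALTERNATES/s615/Mike-Grimshaw.jpg">
</a>
</figure>
<div class="inner">
<em><a href="http://www.manchestereveningnews.co.uk/all-about/sale">Sale</a></em>
<strong>
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">Mum who witnessed fiancé Michael Grimshaw being fatally stabbed 'cannot face returning home'</a></strong><div class="description">
<a href="http://www.manchestereveningnews.co.uk/news/greater-manchester-news/mum-who-witnessed-fianc-michael-13374115">A fundraising campaign has been set up to help Mr Grimshaw's family in the wake of his tragic death</a>
</div>
</div>
</div>
"""

soup = BeautifulSoup(TEASER_HTML, "html.parser")

# find(name, attrs): a tag name plus an optional dict of attribute filters.
by_dict = soup.find("div", attrs={"class": "description"})
# class_= is keyword shorthand for the same filter ("class" is reserved in Python).
by_kwarg = soup.find("div", class_="description")
print(by_dict is by_kwarg)  # both calls return the same element

# CSS selectors are handled by select_one()/select(), not by find().
location = soup.select_one("div.inner > em > a")

fields = {
    "image": soup.find("img")["src"],
    "location": location.get_text(strip=True),
    "headline": soup.find("strong").get_text(strip=True),
    "description": by_kwarg.get_text(strip=True),
}
for name, value in fields.items():
    print(f"{name}: {value}")
```

Against the live page you would loop over soup.find_all("div", class_="teaser") and apply the same lookups to each teaser, checking for None before indexing.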

Regarding "python - Getting text results from an <em> tag inside a <div> tag", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45269786/
