
python - How to extract all href and src attributes inside a specific div with BeautifulSoup in Python

Reposted · Author: 行者123 · Updated: 2023-11-28 17:33:15

I want to extract every href and src from all the divs on the page that have class = 'news_item'.

The HTML looks like this:

<div class="col">
  <div class="group">
    <h4>News</h4>
    <div class="news_item">
      <a href="www.link.com">
        <h2 class="link">
          here is a link-heading
        </h2>
        <div class="Img">
          <img border="0" src="/image/link" />
        </div>
        <p></p>
      </a>
    </div>

From this, what I want to extract is:

www.link.com, here is a link-heading, and /image/link

My code is:

def scrape_a(url):
    news_links = soup.select("div.news_item [href]")
    for links in news_links:
        if news_links:
            return 'http://www.web.com' + news_links['href']

def scrape_headings(url):
    for news_headings in soup.select("h2.link"):
        return str(news_headings.string.strip())


def scrape_images(url):
    images = soup.select("div.Img[src]")
    for image in images:
        if images:
            return 'http://www.web.com' + news_links['src']


def top_stories():
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    link = scrape_a(soup)
    heading = scrape_headings(soup)
    image = scrape_images(soup)
    message = {'heading': heading, 'link': link, 'image': image}
    print message

The problem is that it gives me this error:

    **TypeError: 'NoneType' object is not callable**

Here is the traceback:

Traceback (most recent call last):
File "web_parser.py", line 40, in <module>
top_stories()
File "web_parser.py", line 32, in top_stories
link = scrape_a('www.link.com')
File "web_parser.py", line 10, in scrape_a
news_links = soup.select_all("div.news_item [href]")
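A note on the traceback: the pasted code calls `soup.select`, but the version that was actually run used `soup.select_all`, which is not a BeautifulSoup method (the CSS-selector method is `select`). BeautifulSoup treats access to an unknown attribute on a `Tag`/`BeautifulSoup` object as a tag-name search, i.e. `soup.find('select_all')`, which returns `None` when no such tag exists; calling that `None` then raises exactly this `TypeError`. A minimal reproduction (a sketch, not part of the original post):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div class="news_item"><a href="www.link.com">x</a></div>',
    'html.parser')

# Unknown attribute access falls back to soup.find('select_all'),
# which returns None because there is no <select_all> tag.
print(soup.select_all)

try:
    soup.select_all("div.news_item [href]")  # calling None
except TypeError as e:
    print(e)  # 'NoneType' object is not callable

# The correct method is select():
print(soup.select("div.news_item [href]")[0]['href'])
```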

Best Answer

You should grab all of the news items at once and then iterate over them. That makes it easy to organize the data you get into manageable chunks (dicts, in this case). Try something like this:

import requests
from bs4 import BeautifulSoup

url = "http://www.web.com"
r = requests.get(url)
soup = BeautifulSoup(r.text)

messages = []

news_links = soup.select("div.news_item")  # selects all .news_item's
for l in news_links:
    message = {}
    message['heading'] = l.find("h2").text.strip()

    link = l.find("a")
    if not link:
        continue
    message['link'] = link['href']

    image = l.find('img')
    if not image:
        continue
    message['image'] = "http://www.web.com{}".format(image['src'])

    messages.append(message)

print messages
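As a sanity check (not part of the original answer), the same approach can be run directly against the sample HTML from the question; the outer divs are closed here so the snippet parses cleanly on its own:

```python
from bs4 import BeautifulSoup

# Sample HTML from the question, with the unclosed outer divs closed
html = """
<div class="col">
  <div class="group">
    <h4>News</h4>
    <div class="news_item">
      <a href="www.link.com">
        <h2 class="link">here is a link-heading</h2>
        <div class="Img"><img border="0" src="/image/link" /></div>
        <p></p>
      </a>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

messages = []
for item in soup.select("div.news_item"):
    link = item.find("a")
    heading = item.find("h2")
    image = item.find("img")
    if not (link and heading and image):
        continue  # skip items missing any of the three pieces
    messages.append({
        "heading": heading.text.strip(),
        "link": link["href"],
        "image": "http://www.web.com{}".format(image["src"]),
    })

print(messages)
```

This yields one dict per news item, e.g. `{'heading': 'here is a link-heading', 'link': 'www.link.com', 'image': 'http://www.web.com/image/link'}` for the sample above.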

Regarding "python - How to extract all href and src attributes inside a specific div with BeautifulSoup in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32821200/
