python - Generating Yahoo News and Bing News URLs with Python and BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-01 06:53:27

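I want to scrape data from the Yahoo News and Bing News pages. The data I want is the headline and/or the text below the headline (whatever can be scraped), plus the date (time) it was published.

I wrote some code, but it returns nothing. The problem is with my URL, because I get a 404 response.

Can you help me fix it?

Here is the code for Bing: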

from bs4 import BeautifulSoup
import requests

term = 'usa'
url = 'http://www.bing.com/news/q?s={}'.format(term)

response = requests.get(url)
print(response)

soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

And this is for Yahoo:

term = 'usa'

url = 'http://news.search.yahoo.com/q?s={}'.format(term)

response = requests.get(url)
print(response)

soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

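Please help me generate these URLs and explain the logic behind them. I'm still a noob :)

Best answer

Basically, your URLs are wrong. The URL you have to use is the same one that appears in the address bar when you run the search in a regular browser. Most search engines and aggregators use the q parameter for the search term. Most other parameters are usually optional, though some are occasionally needed, for example to specify the results page number.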
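To make the "path plus q parameter" idea concrete, here is a minimal sketch that builds the search URL from a base address and a search term. The helper name build_search_url and the two constants are hypothetical and only for illustration. The two base addresses are the ones used in this answer, and urlencode comes from the standard library.

```python
from urllib.parse import urlencode

# Base addresses taken from this answer; the constant names are illustrative.
BING_NEWS = "https://www.bing.com/news/search"
YAHOO_NEWS = "https://news.search.yahoo.com/search"


def build_search_url(base, term, **extra):
    """Return base + '?' + an encoded query string whose first key is q."""
    params = {"q": term, **extra}
    # urlencode escapes spaces and reserved characters such as '&'
    return "{}?{}".format(base, urlencode(params))


print(build_search_url(BING_NEWS, "usa"))
print(build_search_url(YAHOO_NEWS, "usa election"))
```

Encoding matters as soon as the term contains a space or an ampersand, which plain str.format would pass through unescaped. requests.get also accepts a params dictionary and does this encoding itself, so requests.get(base, params={'q': term}) is an equivalent route.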

Bing

from bs4 import BeautifulSoup
import requests
import re

term = 'usa'
url = 'https://www.bing.com/news/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Each result sits in a div with class "news-card-body"
for news_card in soup.find_all('div', class_="news-card-body"):
    title = news_card.find('a', class_="title").text
    # The age is in a span whose aria-label ends with "ago"
    time = news_card.find(
        'span',
        attrs={'aria-label': re.compile(".*ago$")}
    ).text
    print("{} ({})".format(title, time))

Output

Jason Mohammed blitzkrieg sinks USA (17h)
USA Swimming held not liable by California jury in sexual abuse case (1d)
United States 4-1 Canada: USA secure payback in Nations League (1d)
USA always plays the Dalai Lama card in dealing with China, says Chinese Professor (1d)
...
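One caveat with the loop above: find() returns None when nothing matches, so calling .text on the result raises AttributeError for any card that lacks a title or a time span. The sketch below shows one way to guard against that. The sample markup is hypothetical and only reuses the class names from the snippet above (the live page may differ), and text_or_none and extract_cards are illustrative helper names.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used above.
SAMPLE_HTML = """
<div class="news-card-body">
  <a class="title" href="#">First headline</a>
  <span aria-label="17 hours ago">17h</span>
</div>
<div class="news-card-body">
  <a class="title" href="#">Second headline</a>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when find() matched nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_cards(html):
    """Return (title, age) pairs; age is None when the span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="news-card-body"):
        title = text_or_none(card.find("a", class_="title"))
        age = text_or_none(
            card.find("span", attrs={"aria-label": re.compile(r"ago$")})
        )
        if title is not None:  # skip cards without a headline
            rows.append((title, age))
    return rows


for title, age in extract_cards(SAMPLE_HTML):
    print("{} ({})".format(title, age))
```

Running the extraction on an inline string like this also makes it easy to check selectors without hitting the site on every attempt.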

Yahoo

from bs4 import BeautifulSoup
import requests

term = 'usa'
url = 'https://news.search.yahoo.com/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Each result sits in a div with class "NewsArticle"
for news_item in soup.find_all('div', class_='NewsArticle'):
    title = news_item.find('h4').text
    time = news_item.find('span', class_='fc-2nd').text
    # Clean time text
    time = time.replace('·', '').strip()
    print("{} ({})".format(title, time))

Output

USA Baseball will return to Arizona for second Olympic qualifying chance (52 minutes ago)
Prized White Sox prospect Andrew Vaughn wraps up stint with USA Baseball (28 minutes ago)
Mexico defeats USA in extras for Olympic berth (13 hours ago)
...
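Both outputs give the publication time as a relative age ("17h", "1d", "52 minutes ago") rather than a timestamp. If an approximate timestamp is needed, a small parser can turn those strings into a timedelta. This is only a sketch that assumes the formats seen in the two outputs above. parse_age is a hypothetical helper, and anything it does not recognise comes back as None.

```python
import re
from datetime import timedelta

# Seconds per unit, keyed by the first letter of the unit.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}

# Accepts short forms ("17h", "1d") and long forms ("52 minutes ago").
# The trailing \b stops "m" from matching the start of "months".
_AGE_RE = re.compile(
    r"^(\d+)\s*(minutes?|mins?|m|hours?|hrs?|h|days?|d)\b", re.IGNORECASE
)


def parse_age(text):
    """Return the age as a timedelta, or None for unrecognised text."""
    cleaned = text.replace("·", "").strip()  # same clean-up as the Yahoo loop
    match = _AGE_RE.match(cleaned)
    if match is None:
        return None
    count = int(match.group(1))
    unit = match.group(2)[0].lower()
    return timedelta(seconds=count * _UNIT_SECONDS[unit])


for sample in ("17h", "1d", "· 52 minutes ago", "13 hours ago"):
    print(sample, "->", parse_age(sample))
```

Subtracting the result from datetime.now() gives an estimate of when the item was published, accurate only to the granularity the site reports.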

Regarding python - generating Yahoo News and Bing News URLs with Python and BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58903413/
