python - Looping through hrefs in Python with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-04 09:15:01

I am trying to extract all article URLs from this link. The site has 52 pages, but I only get these 5 URLs instead of all of them:
['https://mn.usembassy.gov/mn/2020-naadam-mn/', 'https://mn.usembassy.gov/mn/06272020-presidential-proclamation-mn/', 'https://mn.usembassy.gov/mn/pr-060320-mn/', 'https://mn.usembassy.gov/mn/dv-2021-status-check-mn/', 'https://mn.usembassy.gov/mn/pr-050120-mn/']
Why does it return only a few URLs? My current code is below.

import requests
from bs4 import BeautifulSoup
url = 'https://mn.usembassy.gov/mn/news-events-mn/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'lxml')

urls = []
for h in soup.find_all('h2'):
    a = h.find('a')
    urls.append(a.attrs['href'])
print(urls)

Best answer

Each listing page contains only 5 article links; you have to request the next page to get the next 5. This script collects the links from all 52 pages:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

# Listing pages are numbered 1..52; {page} is filled in per request.
url = 'https://mn.usembassy.gov/mn/news-events-mn/page/{page}/'

urls = []
for page in range(1, 53):
    soup = BeautifulSoup(requests.get(url.format(page=page)).content, 'html.parser')
    for h in soup.find_all('h2'):
        a = h.find('a')
        if a is None:  # skip headings that do not wrap a link
            continue
        print(a['href'])
        urls.append(a['href'])

pprint(urls)
Prints:
https://mn.usembassy.gov/mn/2020-naadam-mn/
https://mn.usembassy.gov/mn/06272020-presidential-proclamation-mn/
https://mn.usembassy.gov/mn/pr-060320-mn/
https://mn.usembassy.gov/mn/dv-2021-status-check-mn/
https://mn.usembassy.gov/mn/pr-050120-mn/
https://mn.usembassy.gov/mn/pr-042320-mca-website-mn/
https://mn.usembassy.gov/mn/2020-pr-us-mongolia-cpc-mn/
https://mn.usembassy.gov/mn/lead-2020-in-country-mn/
https://mn.usembassy.gov/mn/press-release-usaid-mar-24-2020-mn/
https://mn.usembassy.gov/mn/event-suspension-of-nonimmigrant-and-immigrant-visa-services-due-to-local-covid-19-related-preventative-measures-and-limited-staffing-mn/
https://mn.usembassy.gov/mn/2020-best-program-pr-mn/
https://mn.usembassy.gov/mn/2020-ncov-info-for-visa-mn/

...and so on.
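
The loop above hard-codes 52 pages, so it would miss articles once the site grows. Below is a minimal sketch of one alternative: keep requesting pages until one is missing or has no article links. It assumes the site answers a page number past the last one with a non-200 status or a page without `h2` links, which has not been verified here. The names `extract_links`, `fetch`, `collect_article_urls`, and the `max_pages` safety cap are hypothetical, introduced only for illustration.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://mn.usembassy.gov/mn/news-events-mn/page/{page}/'


def extract_links(html):
    """Return the href of the first <a> inside every <h2> that has one."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for h in soup.find_all('h2'):
        a = h.find('a', href=True)
        if a is not None:  # skip headings that are not links
            links.append(a['href'])
    return links


def fetch(url):
    """Return the page HTML, or None if the server does not answer 200."""
    resp = requests.get(url, timeout=10)
    return resp.text if resp.status_code == 200 else None


def collect_article_urls(fetch_page=fetch, max_pages=200):
    """Walk pages 1, 2, 3, ... until one is missing or has no links.

    max_pages is an arbitrary safety cap (an assumption), not a site limit.
    """
    urls = []
    for page in range(1, max_pages + 1):
        html = fetch_page(BASE_URL.format(page=page))
        if html is None:
            break
        links = extract_links(html)
        if not links:
            break
        urls.extend(links)
    return urls
```

Calling `collect_article_urls()` with no arguments performs the real requests. Passing a different `fetch_page` callable makes it possible to try the loop against saved HTML without touching the network.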

Regarding "python - Looping through hrefs in Python with BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63269263/
