
python - How do I print and display all web scraping results?

Reposted. Author: 行者123. Updated: 2023-12-01 06:23:53

import requests
from bs4 import BeautifulSoup

URL = ""
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='simple-view')

events_elems = results.find_all('ul', class_='searchResults')

for event_elem in events_elems:

    date_elem = event_elem.find('li', class_='date-indicator')
    location_elem = event_elem.find('div', class_='text--labelSecondary')
    e_elem = event_elem.find('a', class_='event')
    if None in (date_elem, location_elem, e_elem):
        continue
    print(date_elem.text)
    print(location_elem.text)
    print(e_elem.text)

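I have just started experimenting with web scraping in Python and tried to scrape meetup.com with the code above, but it only shows one set of results. Am I doing something wrong in the iteration part?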

Best answer

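The .find_all call you are using

events_elems = results.find_all('ul', class_='searchResults')

does not capture every row on the site; in other words, you need to tighten the search criteria.

The event_elem.find('li', class_='date-indicator') call is not sufficient either: find returns only the first matching element, so it does not capture the date of each individual event.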
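The effect can be reproduced offline. The sketch below is only an illustration: the HTML snippet is hypothetical markup written to mimic the situation described (one outer list wrapping several event rows), not the real meetup.com page. It shows that looping over the outer list runs once and find picks up only the first event, while selecting the individual rows yields every event.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): one outer
# <ul class="searchResults"> that wraps several event rows.
HTML = """
<div id="simple-view">
  <ul class="searchResults">
    <li class="date-indicator">Monday</li>
    <li class="event-listing"><a class="event">Event A</a></li>
    <li class="event-listing"><a class="event">Event B</a></li>
    <li class="event-listing"><a class="event">Event C</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
results = soup.find(id="simple-view")

# Too broad: matches the single outer list, so the loop body runs once.
outer = results.find_all("ul", class_="searchResults")
# Tighter: matches each individual event row.
rows = results.find_all(class_="event-listing")

# find() returns only the first match inside each element it is called on.
first_only = [ul.find("a", class_="event").text for ul in outer]
every_row = [row.find("a", class_="event").text for row in rows]

print(len(outer), first_only)  # 1 ['Event A']
print(len(rows), every_row)    # 3 ['Event A', 'Event B', 'Event C']
```

The class names on the rows are placeholders here; on a real page they have to be read from the page source.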


See the working code below, which captures the result set through the container of the event list:

import requests
from bs4 import BeautifulSoup

URL = "https://www.meetup.com/find/events/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='simple-view')

event_container = results.find_all('ul', class_='event-listing-container')[0]
events_elems = event_container.find_all(class_= 'event-listing')

for event_elem in events_elems:

    location_elem = event_elem.find('div', class_='text--labelSecondary')
    e_elem = event_elem.find('a', class_='event')
    date = "{}-{}-{} {}".format(
        event_elem.attrs['data-year'],
        event_elem.attrs['data-month'],
        event_elem.attrs['data-day'],
        event_elem.find('time').text.replace('\n', ''),
    )

    print(date)
    print(location_elem.text)
    print(e_elem.text)
    print('-----')

Sample output:

2020-2-17 9:00AM


Architecting for Innovation



Australasian Enterprise Architecture Summer School 2020

-----
2020-2-17 5:00PM


Sydney Indoor Rock Climbers



Monday and Thursday Night Climbing @ St Peters (Beginners Welcome)

-----
2020-2-17 5:30PM

......
......
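The blank lines in the output come from whitespace inside the scraped elements, since .text returns it unchanged. One possible way to tidy this up, sketched here on a hypothetical event row rather than the live page, is BeautifulSoup's get_text(strip=True). The helper name clean_text is made up for this illustration; it also returns an empty string when an element is missing, which restores the None guard from the question's code.

```python
from bs4 import BeautifulSoup

# Hypothetical event row (an assumption): the whitespace mimics what
# produces blank lines when .text is printed directly.
HTML = """
<li class="event-listing" data-year="2020" data-month="2" data-day="17">
  <time>
9:00AM
</time>
  <div class="text--labelSecondary">

    Architecting for Innovation

  </div>
  <a class="event">

    Australasian Enterprise Architecture Summer School 2020

  </a>
</li>
"""


def clean_text(elem):
    """Return the element's text without surrounding whitespace, or '' if elem is None."""
    return elem.get_text(strip=True) if elem is not None else ""


row = BeautifulSoup(HTML, "html.parser").find(class_="event-listing")

date = "{}-{}-{} {}".format(
    row["data-year"],
    row["data-month"],
    row["data-day"],
    clean_text(row.find("time")),
)
location = clean_text(row.find("div", class_="text--labelSecondary"))
title = clean_text(row.find("a", class_="event"))
missing = clean_text(row.find("span", class_="does-not-exist"))

print(date)
print(location)
print(title)
print("-----")
```

With this approach each event prints as three compact lines followed by the separator.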

Regarding "python - How do I print and display all web scraping results?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60255300/
