
python - Web scraping in Python with BS4 - retrieving a dynamically generated list

Reposted. Author: 行者123. Updated: 2023-12-01 00:17:23

I need to scrape the "Best Coding Bootcamps" list from this page: https://www.switchup.org/rankings/best-coding-bootcamps

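My assignment says this should be possible with Beautiful Soup (rather than Selenium), but when I try it, the resulting HTML does not contain the bootcamp list. It only contains what look like empty class elements.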

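My question: do you think this content can be retrieved with Beautiful Soup alone, without resorting to Selenium? And if Selenium is necessary, what would the simplest code look like?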

My code so far is very simple:

import time

import requests
from bs4 import BeautifulSoup

url = "https://www.switchup.org/rankings/best-coding-bootcamps"

r = requests.get(url)

soup = BeautifulSoup(r.content, 'lxml')
time.sleep(5)

print(soup)

Thanks very much in advance.

Best answer

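You are right: the HTML served at the URL you posted does not contain the list. The data is loaded via AJAX from a different URL after the page loads, so requests never sees it in the initial response.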
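A quick way to confirm this before reaching for Selenium is to check whether names you can see in the browser occur anywhere in the raw HTML that requests receives. The sketch below is only an illustration: the helper name `missing_from_static_html` and the `SHELL` string (a made-up stand-in for a page that ships an empty container plus a script) are assumptions, not part of the original answer.

```python
def missing_from_static_html(html, expected_names):
    """Return the names that do not occur anywhere in the raw HTML.

    If every name you can see in the browser is missing here, the content
    is most likely injected by JavaScript after the initial page load.
    """
    lowered = html.lower()
    return [name for name in expected_names if name.lower() not in lowered]


# Hypothetical page shell: an empty container that a script fills in later.
SHELL = """
<html><body>
  <div class="bootcamp-list"></div>
  <script src="/assets/app.js"></script>
</body></html>
"""

print(missing_from_static_html(SHELL, ["Le Wagon", "App Academy"]))
# prints ['Le Wagon', 'App Academy']
```

Against the real page you would pass `requests.get(url).text` as the `html` argument. A non-empty result suggests looking for the data source in the browser's Network tab rather than in the page markup.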

If you inspect the Network tab in the Firefox/Chrome developer tools, you can find the following URL, which returns the data as JSON:

import requests
from bs4 import BeautifulSoup

# JSON endpoint found in the browser's Network tab
url = 'https://www.switchup.org/chimera/v1/bootcamp-list?mainTemplate=bootcamp-list%2Frankings&path=%2Frankings%2Fbest-coding-bootcamps&isDataTarget=false&featuredSchools=0&logoTag=logo&logoSize=original&numSchools=0&perPage=0&rankType=BootcampRankings&rankYear=2020&recentReview=true&reviewLength=50&numLocations=5&numSubjects=5&numCourses=5&sortOn=name&withReviews=false'

data = requests.get(url).json()

for i, bootcamp in enumerate(data['content']['bootcamps'], 1):
    # Each description is an HTML fragment, so use BeautifulSoup to strip the tags
    soup = BeautifulSoup(bootcamp['description'], 'html.parser')
    print('{}. {}'.format(i, bootcamp['name']))
    print(soup.get_text(strip=True))
    print('-' * 80)

Prints:

1. Le Wagon
Le Wagon is an intensive international coding bootcamp geared toward career changers and entrepreneurs who want to gain coding skills. Participants complete 450 hours of coding in 9 weeks full-time or 24 weeks part-time, which includes building their own web app. After completing the program, students join an international alumni network of 6,000+ for career support and community.
--------------------------------------------------------------------------------
2. App Academy
App Academy teaches participants everything they need to know about software engineering in just 12 weeks. Their full-time bootcamps have helped over 2,000 graduates find jobs at more than 850 companies. Their deferred tuition plan means participants pay for the program only after they’ve landed their first web development job.
--------------------------------------------------------------------------------
3. Ironhack
Ironhack offers two full-time bootcamps focused on web design, a 26-week program in web development and a nine-week program in user experience and user interface design. Students can access extensive career development services post-graduation including portfolio building and interview practice; scholarships are available for underrepresented populations and veterans.
--------------------------------------------------------------------------------

...and so on.
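The endpoint URL above is long, and its query string is easier to read and adjust once it is split into a dictionary. The following is a minimal sketch using only the standard library. The helper names `split_endpoint` and `build_url` are illustrative, and what each parameter actually controls on the server is not documented in the answer, so treat any change to the values as an experiment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The JSON endpoint from the answer, copied verbatim.
URL = (
    "https://www.switchup.org/chimera/v1/bootcamp-list"
    "?mainTemplate=bootcamp-list%2Frankings"
    "&path=%2Frankings%2Fbest-coding-bootcamps"
    "&isDataTarget=false&featuredSchools=0&logoTag=logo&logoSize=original"
    "&numSchools=0&perPage=0&rankType=BootcampRankings&rankYear=2020"
    "&recentReview=true&reviewLength=50&numLocations=5&numSubjects=5"
    "&numCourses=5&sortOn=name&withReviews=false"
)


def split_endpoint(url):
    """Split a URL into its base (no query string) and a dict of decoded parameters."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


def build_url(base, params):
    """Re-encode the parameters and append them to the base URL."""
    return base + "?" + urlencode(params)


base, params = split_endpoint(URL)
for key, value in params.items():
    print("{} = {}".format(key, value))
```

With the parameters in a dictionary, `requests.get(base, params=params).json()` sends the same request as the hard-coded URL, since requests encodes the `params` dictionary into the query string for you.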

For more on "python - Web scraping in Python with BS4 - retrieving a dynamically generated list", see the similar question on Stack Overflow: https://stackoverflow.com/questions/59229182/
