
python - Scraping multiple pages in Python with BeautifulSoup


I have managed to write code that scrapes data from the first page, but now I need to add a loop to it that scrapes the next 'n' pages. The code is below.

I would appreciate it if anyone could guide/help me write the code to scrape data from the remaining pages.

Thanks!

from bs4 import BeautifulSoup
import requests
import csv


# Fetch the first page of search results
url = requests.get('https://wsc.nmbe.ch/search?sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s').text

soup = BeautifulSoup(url, 'lxml')

# Each search result is rendered in a div with this inline style
elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
#print(elements)

csv_file = open('wsc_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

for element in elements:
    # Species name is in the <i> tag
    sp_name = element.i.text.strip()
    print(sp_name)

    # The status badge is a span with either a success or error label class
    status = element.find('span', class_=['success label', 'error label']).text.strip()
    print(status)

    # The text following the <i> tag holds "author | family"
    author_family = element.i.next_sibling.strip().split('|')
    species_author = author_family[0].strip()
    family = author_family[1].strip()
    print(species_author)
    print(family)
    print()

    csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()

Best Answer

You have to pass a page= parameter in the URL and iterate through all the pages:

from bs4 import BeautifulSoup
import requests
import csv

# newline='' prevents blank rows in the CSV on Windows
csv_file = open('wsc_scrape.csv', 'w', encoding='utf-8', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

# 151 result pages in total; page numbers start at 1
for i in range(151):
    url = requests.get('https://wsc.nmbe.ch/search?page={}&sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s'.format(i + 1)).text
    soup = BeautifulSoup(url, 'lxml')
    elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
    for element in elements:
        sp_name = element.i.text.strip()
        print(sp_name)
        status = element.find('span', class_=['success label', 'error label']).text.strip()
        print(status)
        author_family = element.i.next_sibling.strip().split('|')
        species_author = author_family[0].strip()
        family = author_family[1].strip()
        print(species_author)
        print(family)
        print()
        csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()
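
If you'd rather not hardcode the page count (151 at the time the answer was written), a variant can keep requesting pages until one comes back with no result rows. The following is a minimal sketch, not part of the original answer: it assumes a page past the last one renders no matching result divs, reuses a single requests.Session for connection pooling, and opens the CSV with a with block so the file is closed even if an error occurs.

from bs4 import BeautifulSoup
import requests
import csv

# Hypothetical variant of the answer above: stop when a page is empty.
BASE_URL = ('https://wsc.nmbe.ch/search?page={}&sFamily=Salticidae&fMt=begin'
            '&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid'
            '&sMulti=&mMt=contain&searchSpec=s')

with open('wsc_scrape.csv', 'w', encoding='utf-8', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

    session = requests.Session()  # reuse one connection across all page requests
    page = 1
    while True:
        html = session.get(BASE_URL.format(page)).text
        soup = BeautifulSoup(html, 'lxml')
        elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
        if not elements:
            # Assumption: a page past the last one contains no result divs.
            break
        for element in elements:
            sp_name = element.i.text.strip()
            status = element.find('span', class_=['success label', 'error label']).text.strip()
            author_family = element.i.next_sibling.strip().split('|')
            csv_writer.writerow([sp_name, author_family[0].strip(), status, author_family[1].strip()])
        page += 1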

Regarding python - scraping multiple pages in Python with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54861405/
