
python - Web scraping - Selenium BeautifulSoup - looping through pagination

Reposted. Author: 行者123. Updated: 2023-12-01 07:12:29

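I'd like to experiment a little with Selenium (just to learn; I asked some questions about BeautifulSoup earlier and got some good advice).

Anyway, I am simply trying to loop through the pages, grab every div.details, and print how many it finds (as an initial test). The problem is that it seems to sit on the first page, reload it, and get stuck in the loop.

How can I change it so that it loops through page 1, then page 2, and then stops?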

    from bs4 import BeautifulSoup
    import requests
    import csv
    import pandas
    from pandas import DataFrame
    import re
    import os
    import locale
    os.environ["PYTHONIOENCODING"] = "utf-8"

    from selenium import webdriver
    from webdriver_manager.chrome import ChromeDriverManager

    page = 1

    driver = webdriver.Chrome(ChromeDriverManager().install())
    url="https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page={page}"

    #grab all links which contain the href specified

    with requests.Session() as session:
        while True:
            res=session.get(url.format(page=page))
            soup=BeautifulSoup(res.content,'html.parser')
            gun_details = soup.select('div.details')
            if soup.select("nav_next") is None:
                break
            page += 1
            driver.get(url) #navigate to the page
            print(len(gun_details))

Best answer

You don't need Selenium to navigate between pages; you can do it with requests alone.

    from bs4 import BeautifulSoup
    import requests
    import csv
    import pandas
    from pandas import DataFrame
    import re
    import os
    import locale
    os.environ["PYTHONIOENCODING"] = "utf-8"

    page = 1
    url="https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page={}"

    with requests.Session() as session:
        while True:
            print(url.format(page))
            res=session.get(url.format(page))
            soup=BeautifulSoup(res.content,'html.parser')
            gun_details = soup.select('div.details')
            print(len(gun_details))
            if len(soup.select(".nav_next"))==0:
                break
            page += 1
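
One likely reason the loop in the question never ended is the exit test itself: `soup.select()` returns a list-like result that is empty when nothing matches, never `None`, and the selector `"nav_next"` (no leading dot) looks for a tag named `nav_next` rather than an element with that class. The sketch below illustrates this on a made-up HTML fragment; the markup is an assumption for demonstration and is not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page; the real HTML may differ.
html = '<div class="details">A</div><a class="nav_next" href="?page=2">Next</a>'
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("nav_next")      # matches <nav_next> elements: none here
by_class = soup.select(".nav_next")   # matches elements with class="nav_next"

# select() gives back an empty list-like object on no match, so this is False
no_match_is_none = by_tag is None

# select_one() is the variant that returns None when nothing matches
missing_one = soup.select_one(".no_such_class")

print(len(by_tag), no_match_is_none, len(by_class), missing_one is None)
```

So a check of the form `soup.select(...) is None` can never trigger the `break`; testing the length (as in the answer) or using `select_one(...) is None` are two ways to express the intended condition.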

I have added print statements; the console output is shown below.

https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page=1
10
https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page=2
4
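
A `while True` loop depends entirely on the last page lacking a `.nav_next` element. As a hedged variation, the same logic can be wrapped in a function with an upper bound on the number of pages, which also makes it easy to try out without network access. The names `count_details`, `fetch`, and `max_pages` are hypothetical helpers introduced for this sketch, and the fake pages are invented test data.

```python
from bs4 import BeautifulSoup

def count_details(fetch, max_pages=50):
    """Return the number of div.details found on each page.

    fetch(page) must return the HTML for that page number. The loop stops
    at the first page without a .nav_next element, or after max_pages.
    """
    counts = []
    for page in range(1, max_pages + 1):
        soup = BeautifulSoup(fetch(page), "html.parser")
        counts.append(len(soup.select("div.details")))
        if not soup.select(".nav_next"):  # empty result means no next page
            break
    return counts

# Invented stand-in pages: page 1 has a next link, page 2 does not.
fake_pages = {
    1: '<div class="details"></div><div class="details"></div>'
       '<a class="nav_next" href="?page=2">Next</a>',
    2: '<div class="details"></div>',
}

result = count_details(lambda page: fake_pages[page])
print(result)
```

With requests, `fetch` could be something like `lambda page: session.get(url.format(page)).content`, reusing the `session` and `url` from the answer.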

For this topic (python - Web scraping - Selenium BeautifulSoup - looping through pagination), the corresponding question on Stack Overflow is: https://stackoverflow.com/questions/58136244/
