
python - How to scrape multiple pages when the URL does not change - Python & BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:01:09

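I am trying to scrape this website: https://www.99acres.com

So far I have written code with BeautifulSoup that extracts data from the site; however, it only gets me the first page of results. Is there a way to reach the other pages? When I click through to the next page, the URL does not change, so I can't simply loop over a different URL for each page.

Here is my current code: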
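When the address bar stays the same after clicking "next", the page is usually loading the next batch of results in the background (for example via JavaScript). One way to check is to open the browser's developer tools, watch the Network tab while clicking "next", and look for a request whose query string carries a page number. The sketch below assumes such a parameter exists and is called `page`; that name is hypothetical and must be replaced with whatever the real request uses. `with_page` is also a hypothetical helper. It only builds URLs with the standard library's `urllib.parse`, so it can be checked without contacting the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return `url` with the query parameter `param` set to `page`.

    `param` defaults to "page", a hypothetical name: use the one seen in
    the browser's Network tab. Repeated query keys are collapsed to one.
    """
    parts = urlsplit(url)
    # Keep existing parameters (including empty ones) and set/replace the page number
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened version of the search URL from the question, for illustration only
search_url = "https://www.99acres.com/search/property/buy/residential-all/hyderabad?city=269&bedroom_num=3"
for page in range(1, 4):
    print(with_page(search_url, page))
```

Each generated URL can then be passed to `requests.get` and parsed exactly like the first page, provided the server really accepts the parameter.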

import csv
import requests
from bs4 import BeautifulSoup

# Search results for 3-bedroom residential property in Hyderabad (first page only)
url = 'https://www.99acres.com/search/property/buy/residential-all/hyderabad?search_type=QS&search_location=CP1&lstAcn=CP_R&lstAcnId=1&src=CLUSTER&preference=S&selected_tab=1&city=269&res_com=R&property_type=R&isvoicesearch=N&keyword_suggest=hyderabad%3B&bedroom_num=3&fullSelectedSuggestions=hyderabad&strEntityMap=W3sidHlwZSI6ImNpdHkifSx7IjEiOlsiaHlkZXJhYmFkIiwiQ0lUWV8yNjksIFBSRUZFUkVOQ0VfUywgUkVTQ09NX1IiXX1d&texttypedtillsuggestion=hy&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&suggestion=CITY_269%2C%20PREFERENCE_S%2C%20RESCOM_R&searchform=1&price_min=null&price_max=null'
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

rows = []  # named `rows` so the built-in `list` is not shadowed

# Each listing is wrapped in <div class="srpWrap">
for item in soup.find_all('div', {'class': 'srpWrap'}):
    # The listing details are expected in the element's second child node
    try:
        p = item.contents[1].find_all("div", {"class": "_srpttl srpttl fwn wdthFix480 lf"})[0].text
    except (IndexError, AttributeError):
        p = ''
    try:
        d = item.contents[1].find_all("div", {"class": "lf f13 hm10 mb5"})[0].text
    except (IndexError, AttributeError):
        d = ''
    rows.append([p, d])

# newline='' lets the csv module control line endings;
# the with block closes the file automatically
with open('project.txt', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    for row in rows:
        writer.writerow(row)  # one CSV record per listing
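In the code above each record is a two-item list `[p, d]`. `csv.writer.writerow` writes one such record as one line, while `writerows` expects an iterable of records, so calling `writerows` on a single record treats each string as its own row and splits it into one character per field. A minimal, self-contained sketch of the difference, with `io.StringIO` standing in for the output file; `to_csv` is a hypothetical helper and the records are made-up placeholders:

```python
import csv
import io

# Placeholder records in the same [title, description] shape as the scraper's
records = [["title1", "desc1"], ["title2", "desc2"]]


def to_csv(rows):
    """Write `rows` with csv.writer.writerows and return the resulting text."""
    buf = io.StringIO()  # stands in for open(..., newline='')
    csv.writer(buf).writerows(rows)  # expects an iterable of records
    return buf.getvalue()


# Whole list of records: one line per record
print(repr(to_csv(records)))  # -> 'title1,desc1\r\ntitle2,desc2\r\n'

# A single record: each string becomes a row of single characters
print(repr(to_csv(records[0])))
```

So either loop with `writerow(row)` or make a single `writerows(rows)` call on the whole list.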

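Best answer

Try this. Instead of the search URL, it uses a listing URL that contains the page number, and prints the property names from pages 1 to 3.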

import requests
from bs4 import BeautifulSoup

# Listing URL with the page number as a placeholder
base_url = "https://www.99acres.com/3-bhk-property-in-hyderabad-ffid-page-{0}"
for url in [base_url.format(i) for i in range(1, 4)]:  # pages 1 to 3
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Property titles are links whose id starts with "desc_"
    for title in soup.select("a[id^=desc_]"):
        print(title.text.strip())
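The answer hardcodes `range(1,4)`. A variant that keeps requesting pages until one comes back with no listings could look like this minimal sketch. `iter_pages` and `fetch_page` are hypothetical names: `fetch_page(page)` is assumed to return the list of titles on that page, for example by calling `requests.get(base_url.format(page))` and collecting `title.text.strip()` for every match of `soup.select("a[id^=desc_]")`. Treating an empty page as the end of the results is an assumption about how the site behaves, so `max_pages` caps the loop in case it never returns an empty page.

```python
import time
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[str]],
               max_pages: int = 50,
               delay: float = 1.0) -> Iterator[str]:
    """Yield items page by page, stopping at the first empty page or max_pages."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:  # assumed: no listings means we are past the last page
            break
        yield from items
        time.sleep(delay)  # pause between requests to go easy on the server


if __name__ == "__main__":
    # Offline stand-in for the real site: page number -> titles on that page
    fake_site = {1: ["title1", "title2"], 2: ["title3"]}
    for title in iter_pages(lambda page: fake_site.get(page, []), delay=0):
        print(title)
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP and parsing code, so it can be tried against fake data before pointing it at the real site.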

For "python - How to scrape multiple pages when the URL does not change - Python & BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45686351/
