
Python BeautifulSoup not scraping multiple pages

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:14:04

I'm trying to scrape data from a web page that lists 15 ads per page, then move to the next page and grab the next 15 ads.

For some reason, the script only scrapes one page and never moves on to the next.

Here is my script:

import requests
import numpy as np
from bs4 import BeautifulSoup

page_num = 10
curr_page = 1
i = 1
car_title, price_hrk, year_made, km_made, date_pub, temp = [], [], [], [], [], []

# initial fetch (implied by the original script) so the find_all calls below have a soup
page = requests.get("https://www.oglasnik.hr/prodaja-automobila")
soup = BeautifulSoup(page.content, "html.parser")

title = soup.find_all(class_="classified-title")
price_kn = soup.find_all(class_="price-kn")
info = soup.find_all(class_="info-wrapper")
date = soup.find_all("span", class_="date")


# while the current page is less than or equal to the page_num variable
while curr_page <= page_num:
    # make a request with the current page
    page = requests.get("https://www.oglasnik.hr/prodaja-automobila?page={}".format(curr_page))
    # pass it to Beautiful Soup
    soup = BeautifulSoup(page.content, "html.parser")

    # while i is less than or equal to the 15 elements on a single page
    while i <= 15:
        # check for existence
        if title[i]:
            # append the value
            car_title.append(title[i].get_text())
        else:
            # append NaN
            car_title.append(np.nan)

        if price_kn[i]:
            price_hrk.append(price_kn[i].get_text())
        else:
            price_hrk.append(np.nan)

        if date[i]:
            date_pub.append(date[i].get_text())
        else:
            date_pub.append(np.nan)

        # dual values, so append both to a temporary list
        for tag in info[i].find_all("span", class_="classified-param-value"):
            for val in tag:
                temp.append(val)

        try:
            # if the length of the element is less than 5
            if len(temp[0]) < 5:
                # it's a year, so append to the year_made list
                year_made.append(temp[0])
                km_made.append(temp[2])
        except IndexError:
            # if index out of bounds, append NaN
            year_made.append(np.nan)
            km_made.append(np.nan)

        # reset temp
        temp = []
        # add 1 to i
        i += 1

    # add 1 to the current page
    curr_page += 1

Now, if I print the length of any of the lists, I get 15.

Can someone tell me what I'm doing wrong, or point me in the right direction?

Thanks.

Best answer

You also need to reset your i. Right before (or after)

   curr_page += 1

add:

   i = 1

and it should work.
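To see why the reset matters: after the first page, i is left at 16, so the inner `while i <= 15` loop never runs again even though the outer loop keeps fetching pages. A minimal sketch of the corrected control flow, with the network fetch and parsing stubbed out with dummy data (and indices switched to 0-based for simplicity) so it runs standalone:

```python
# Sketch of the fixed loop structure: `i` is reset for every page.
page_num = 3
ads_per_page = 5  # 15 in the original script
car_title = []

curr_page = 1
while curr_page <= page_num:
    # in the real script: requests.get(...) and BeautifulSoup parsing go here
    title = ["page{}-ad{}".format(curr_page, n) for n in range(ads_per_page)]

    i = 0  # <-- the fix: without this, the inner loop only runs for page 1
    while i < ads_per_page:
        car_title.append(title[i])
        i += 1

    curr_page += 1

print(len(car_title))  # 3 pages x 5 ads = 15
```

With the reset in place, the list grows by one page's worth of ads on every iteration of the outer loop instead of stalling at the first page's count.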

Regarding "Python BeautifulSoup not scraping multiple pages", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55011123/
