
python - Scraping information from a website with a 'Show More' button

Reposted. Author: 行者123. Updated: 2023-12-01 00:33:56

I am trying to scrape dress information from this website: https://www.libertylondon.com/uk/department/women/clothing/dresses/

Naturally, I want all of the results, not just the first 60. After clicking the "Show More" button a few times, I end up at this URL: https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300

I expected the following code to download that page in full, but for some reason it still returns only the first 60 results.

import requests
import bs4

url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"

res = requests.get(url)
res.encoding = 'utf-8'
res.raise_for_status()
html = res.text

soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs = {"class": "product product-tile"})

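I can see that the problem lies in the request itself, because the soup variable does not contain the full HTML I see when I inspect the page, but I cannot figure out why.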

Best answer

Try the URL below; it returns 331 elements.

url : https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax
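
A likely reason the original URL returned only the first page: everything after `#` is a URL fragment, which the client keeps for itself and does not send in the HTTP request, so the server never sees `sz=60&start=300`. The working URL carries the same parameters in the query string (after `?`) instead. The small check below uses only the standard library and makes no network call; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

question_url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"
answer_url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax"

q = urlsplit(question_url)
a = urlsplit(answer_url)

# In the question's URL the paging values sit in the fragment,
# so the query string that reaches the server is empty.
print(repr(q.query))     # ''
print(repr(q.fragment))  # 'sz=60&start=300'

# In the answer's URL the same kind of values are real query parameters.
print(parse_qs(a.query))  # {'sz': ['331'], 'start': ['0'], 'format': ['ajax']}
```
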

import requests
import bs4

url="https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax"
res = requests.get(url)
res.encoding = 'utf-8'
res.raise_for_status()
html = res.text

soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs = {"class": "product product-tile"})
print(len(elements))
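
The value 331 is the number of dresses listed when the answer was written, so it will go stale. A possible alternative, sketched below, is to request fixed-size pages and stop at the first empty one. This assumes the endpoint honours `sz` and `start` at any offset and returns no product tiles past the end, which the original answer does not confirm. The helper names `page_url`, `fetch_tiles` and `scrape_all` are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.libertylondon.com/uk/department/women/clothing/dresses/"


def page_url(start, size=60):
    """Build the listing URL for one page, with paging values in the query string."""
    return BASE_URL + "?" + urlencode({"sz": size, "start": start, "format": "ajax"})


def fetch_tiles(url):
    """Download one page and return its product tiles (same selector as the answer)."""
    import requests
    import bs4

    res = requests.get(url, timeout=30)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, "lxml")
    return soup.find_all("div", attrs={"class": "product product-tile"})


def scrape_all(fetch=fetch_tiles, size=60, max_pages=100):
    """Collect tiles page by page until a page comes back empty.

    Assumption: an offset past the last product yields zero tiles.
    max_pages is a safety cap in case that assumption does not hold.
    """
    tiles = []
    for page in range(max_pages):
        batch = fetch(page_url(page * size, size))
        if not batch:
            break
        tiles.extend(batch)
    return tiles


# Usage (performs real HTTP requests):
# elements = scrape_all()
# print(len(elements))
```

Passing the fetch function as a parameter keeps the paging loop separate from the network code, so the loop can be tried with a stub before hitting the site.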

Regarding python - Scraping information from a website with a 'Show More' button, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57953996/
