python - Unable to parse different product links from a webpage

Reposted. Author: 行者123. Updated: 2023-12-01 19:40:55

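I've written a Python script to fetch the different product links from a webpage. I know the site's content is generated dynamically, but I tried the conventional approach anyway to show what I attempted. I also looked for an API in the browser's dev tools and could not find one. Is there any way to get those links using requests?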

Site Link

This is what I've written so far:

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.com/stores/node/10699640011"

def fetch_product_links(url):
    res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    soup = BeautifulSoup(res.text, "lxml")
    for item_link in soup.select("[id^='ProductGrid-'] li[class^='style__itemOuter__'] > a"):
        print(item_link.get("href"))

if __name__ == '__main__':
    fetch_product_links(link)

How can I get the different product links from that site using requests?

Best answer

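I think you only need the ASINs, which you can collect from another URL construct visible in the browser's network tab; this also means the final product URLs can be shortened significantly. You do still need to make a request to the original URL first, to pick up the identifier used in the second URL. This returns 146 links.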

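import requests, re, json

node = '10699640011'

with requests.Session() as s:
    # First request: the store page, which embeds the slot identifiers in an inline script.
    r = s.get(f'https://www.amazon.com/stores/node/{node}')
    p = re.compile(r'var slotsStr = "\[(.*?,){3} share\]";')
    # The repeated group keeps only its last repetition: the third slot, with a trailing comma.
    identifier = p.findall(r.text)[0]
    identifier = identifier.strip()[:-1]
    # Second request: the slot URL, whose inline config object holds the ASIN list.
    r = s.get(f'https://www.amazon.com/stores/slot/{identifier}?node={node}')
    p = re.compile(r'var config = (.*?);')
    data = json.loads(p.findall(r.text)[0])
    asins = data['content']['ASINList']
    # Build the shortened product URLs from the ASINs.
    links = [f'https://www.amazon.com/dp/{asin}' for asin in asins]
    print(links)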
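
To see why the first snippet strips and slices the regex match, here is a small offline sketch that runs the same pattern against a made-up `slotsStr` line. The sample string and its slot names are invented for illustration; the values on the real page will differ.

```python
import re

# Hypothetical sample of the inline script line; the slot names are invented.
sample_html = 'var slotsStr = "[slot-1, slot-2, slot-3, share]";'

pattern = re.compile(r'var slotsStr = "\[(.*?,){3} share\]";')

# With a repeated capture group, findall returns only the last repetition,
# i.e. the third comma-terminated item (leading space and trailing comma included).
raw = pattern.findall(sample_html)[0]

# Strip the surrounding whitespace, then drop the trailing comma.
identifier = raw.strip()[:-1]
print(repr(raw), '->', identifier)
```

The pattern assumes exactly three slots before `share`, so it is tied to the layout of this particular page.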

Edit:

With the two given nodes:

import requests, re, json
from bs4 import BeautifulSoup as bs

nodes = ['3039806011','10699640011']

with requests.Session() as s:
    for node in nodes:
        r = s.get(f'https://www.amazon.com/stores/node/{node}')
        soup = bs(r.content, 'lxml')
        # Take the id of the last widget that is neither the share slot nor a recommendation slot.
        identifier = soup.select('.stores-widget-btf:not([id=share],[id*=RECOMMENDATION])')[-1]['id']
        r = s.get(f'https://www.amazon.com/stores/slot/{identifier}?node={node}')
        p = re.compile(r'var config = (.*?);')
        data = json.loads(p.findall(r.text)[0])
        asins = data['content']['ASINList']
        links = [f'https://www.amazon.com/dp/{asin}' for asin in asins]
        print(links)
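
Both snippets cut the `var config = ...;` assignment out with a non-greedy regex, which stops at the first semicolon and so would truncate the JSON if any string value contained one. A possible alternative, sketched below, is to locate the assignment and let `json.JSONDecoder.raw_decode` read exactly one JSON value. The function name `extract_asin_links` and the sample HTML are made up for illustration; only the `content` / `ASINList` keys and the `/dp/` link shape come from the answer.

```python
import json

MARKER = 'var config = '

def extract_asin_links(html):
    """Return product links built from the ASINList of an inline config object."""
    start = html.find(MARKER)
    if start == -1:
        return []
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text follows it.
    data, _ = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    asins = data.get('content', {}).get('ASINList', [])
    return [f'https://www.amazon.com/dp/{asin}' for asin in asins]

# Invented sample; the "note" value contains a semicolon on purpose.
sample = ('<script>var config = {"content": {"ASINList": '
          '["B000000001", "B000000002"], "note": "a;b"}};</script>')
print(extract_asin_links(sample))
```

In the loop above, this would be called as `extract_asin_links(r.text)` in place of the regex and `json.loads` lines, assuming the page writes the assignment with exactly this spacing.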

Regarding "python - Unable to parse different product links from a webpage", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58716393/
