
python - Intermittent BeautifulSoup failure when scraping the ISBN of Amazon books

Reposted. Author: 行者123. Updated: 2023-12-04 00:54:35

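I am trying to collect some information about certain books on Amazon, and I am hitting an intermittent error that I cannot explain. At first I thought Amazon was blocking my connection, but then I noticed that the request returns "200 OK" and the response contains the real HTML of the corresponding page.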

Take this book as an example: https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110/ref=sr_1_1?crid=2PPCQEJD706VY&dchild=1&keywords=books+bestsellers+2020+paperback&qid=1598132071&sprefix=book%2Caps%2C234&sr=8-1'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, features="lxml")

price = {}
# A struck-through price in the buy box means the book is currently discounted.
if soup.select("#buyBoxInner > ul > li > span > .a-text-strike") != []:
    price["regular_price"] = float(
        soup.select("#buyBoxInner > ul > li > span > .a-text-strike")[0].string[1:].replace(",", "."))
    price["promo_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
else:
    price["regular_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
# The first character of the price string is the currency symbol.
price["currency"] = soup.select(".offer-price")[0].string[0]

This part works fine: I get the regular price, the promotional price (if there is one), and even the currency. But when I do this:

isbn = soup.select("td.bucket > .content > ul > li")[4].contents[1].string.strip().replace("-", "")

I get "IndexError: list index out of range". But when I debug the code, the content is actually there!

Is this a bug in BeautifulSoup? Is the response too long?
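
The symptom described above (a 200 response, yet an IndexError at a fixed position) can be narrowed down by checking how many elements the selector matched before indexing into the result. The following is a minimal offline sketch, not the asker's code: the two HTML fragments are made up for illustration and are not Amazon's real markup, and `pick` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Made-up fragments standing in for two possible page layouts (assumption).
LAYOUT_TABLE = """
<table><tr><td class="bucket"><div class="content"><ul>
  <li><b>Paperback:</b> 464 pages</li>
  <li><b>Publisher:</b> Penguin</li>
  <li><b>Language:</b> English</li>
  <li><b>ISBN-10:</b> 0241985110</li>
  <li><b>ISBN-13:</b> 978-0241985113</li>
</ul></div></td></tr></table>
"""
LAYOUT_SPANS = """
<div id="detailBullets"><ul>
  <li><span class="a-text-bold">ISBN-13:</span> <span>978-0241985113</span></li>
</ul></div>
"""

SELECTOR = "td.bucket > .content > ul > li"


def pick(html, selector, index):
    """Return (match_count, text_or_None) instead of raising IndexError."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.select(selector)
    try:
        text = matches[index].get_text(strip=True)
    except IndexError:
        # The selector matched too few elements for this layout.
        text = None
    return len(matches), text


for name, html in (("table layout", LAYOUT_TABLE), ("span layout", LAYOUT_SPANS)):
    count, text = pick(html, SELECTOR, 4)
    print(f"{name}: {count} matches, item 4 = {text!r}")
```

If the match count differs between runs against the live page, that would point to the server returning different markup rather than to a parser bug or a truncated response.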

Best answer

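Amazon appears to return two versions of the page: one that contains <td class="bucket">, and another that uses several <span> tags instead. This script tries to extract the ISBN from either version: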

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, features="lxml")

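# Match the ISBN label in either layout (<span class="a-text-bold"> or <b>), then read the parent's text.
# Note: newer soupsieve releases spell this pseudo-class :-soup-contains and deprecate :contains.
isbn_10 = soup.select_one('span.a-text-bold:contains("ISBN-10"), b:contains("ISBN-10")').find_parent().text
isbn_13 = soup.select_one('span.a-text-bold:contains("ISBN-13"), b:contains("ISBN-13")').find_parent().text

# Keep only the value after the last colon.
print(isbn_10.split(':')[-1].strip())
print(isbn_13.split(':')[-1].strip())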

Prints:

0241985110
978-0241985113
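
Since the scraped values come from markup that can change between responses, it may be worth sanity-checking them before storing them. The sketch below is an addition, not part of the original answer: it strips the hyphens and verifies the standard ISBN-10 and ISBN-13 check digits using only the standard library. All function names are hypothetical.

```python
DIGITS = "0123456789"


def normalize_isbn(raw):
    """Drop hyphens and spaces; upper-case a trailing 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """ISBN-10: weights 10..1, weighted sum must be divisible by 11."""
    if len(isbn) != 10:
        return False
    if any(c not in DIGITS for c in isbn[:9]) or isbn[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def _isbn13_weighted_sum(digits):
    """Alternate weights 1 and 3, starting with 1 on the first digit."""
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))


def is_valid_isbn13(isbn):
    """ISBN-13: weighted sum must be divisible by 10."""
    if len(isbn) != 13 or any(c not in DIGITS for c in isbn):
        return False
    return _isbn13_weighted_sum(isbn) % 10 == 0


def isbn10_to_isbn13(isbn10):
    """Prefix 978 to the first nine digits and recompute the check digit."""
    core = "978" + isbn10[:9]
    check = (10 - _isbn13_weighted_sum(core) % 10) % 10
    return core + str(check)


isbn_10 = normalize_isbn("0241985110")
isbn_13 = normalize_isbn("978-0241985113")
print(is_valid_isbn10(isbn_10), is_valid_isbn13(isbn_13))
print(isbn10_to_isbn13(isbn_10) == isbn_13)
```

A failed check would suggest that the selector picked up the wrong element or a partial string, which is a cheaper signal than discovering bad data later.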

Regarding python - Intermittent BeautifulSoup failure when scraping the ISBN of Amazon books, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63541601/
