
Python: converting BeautifulSoup output to dict/JSON

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:34:20

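I am trying to scrape data from an Amazon product page. I have the entire markup via BeautifulSoup, and I want to get the necessary product details in the following JSON format: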

{
asin: string,
title: string,
price: number,
listPrice: number,
prime: boolean,

dimensions: {
height: number,
length: number,
width: number,
weight: number,
},
images: Array<string>,
attributes: Array<{ name: string, value: string }>,
categories: <{ node: string, title: string }>,

}

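As I understand it, I first need to get the details into a dictionary. But I am not sure how to pull those specific tags out of the huge HTML in order to convert them into a dict.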

Edit: my code looks like this:

import requests
from bs4 import BeautifulSoup

url = "http://www.amazon.com/dp/B00ILZH9BO"
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
response = requests.get(url, headers=headers)
# Parse the returned HTML with the lxml parser and dump the whole tree
soup = BeautifulSoup(response.text, "lxml")
print(soup)

Edit 2: here is some of the HTML for the product details I need:

#######title#########
<span class="a-size-large" id="productTitle">
MagicBrite Complete Teeth Whitening Kit At Home Whitening
</span>


#########price#####
<span class="a-color-price">
<span class="p13n-sc-price">$11.85</span>
</span>



############images#########

<li class="a-spacing-small item"><span class="a-list-item">
<span class="a-declarative" data-action="thumb-action" data-thumb-action='{"thumbnailIndex":4,"variant":"PT04","index":4,"type":"image"}'>
<span class="a-button a-button-thumbnail a-button-toggle"><span class="a-button-inner"><input class="a-button-input" type="submit"/><span aria-hidden="true" class="a-button-text">
<img alt="" src="https://images-na.ssl-images-amazon.com/images/I/51f8kCdwmqL._SS40_.jpg"/>
</span></span></span>
</span>
</span></li>
<li class="a-spacing-small item"><span class="a-list-item">
<span class="a-declarative" data-action="thumb-action" data-thumb-action='{"thumbnailIndex":5,"variant":"PT05","index":5,"type":"image"}'>
<span class="a-button a-button-thumbnail a-button-toggle"><span class="a-button-inner"><input class="a-button-input" type="submit"/><span aria-hidden="true" class="a-button-text">
<img alt="" src="https://images-na.ssl-images-amazon.com/images/I/517mTOTBEiL._SS40_.jpg"/>
</span></span></span>
</span>
</span></li>

Best answer

Do it manually.

# Build the dict by hand, one lookup per field
data = {
    'asin': soup.find(id="ASIN").attrs['value'],
    'title': soup.find(id="title").text.strip(),
    'price': soup.find(id="price").find(id="priceblock_ourprice").text.strip(),
    # ....
}
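
As a rough illustration of the manual approach, here is a minimal sketch that runs against a trimmed copy of the HTML posted in the question rather than a live page. The function name extract_basic_fields and the CSS selectors are assumptions made for this example; the built-in "html.parser" is used only so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; a live page contains far more.
SAMPLE_HTML = """
<span class="a-size-large" id="productTitle">
MagicBrite Complete Teeth Whitening Kit At Home Whitening
</span>
<span class="a-color-price">
<span class="p13n-sc-price">$11.85</span>
</span>
<li class="a-spacing-small item"><span class="a-list-item">
<img alt="" src="https://images-na.ssl-images-amazon.com/images/I/51f8kCdwmqL._SS40_.jpg"/>
</span></li>
<li class="a-spacing-small item"><span class="a-list-item">
<img alt="" src="https://images-na.ssl-images-amazon.com/images/I/517mTOTBEiL._SS40_.jpg"/>
</span></li>
"""


def extract_basic_fields(html):
    """Hypothetical helper: pull title, price text and image URLs into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find(id="productTitle")
    price_tag = soup.select_one("span.a-color-price span.p13n-sc-price")
    return {
        # Missing tags become None instead of raising AttributeError
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "price": price_tag.get_text(strip=True) if price_tag else None,
        # Every <img> with a src inside an <li class="... item">
        "images": [img["src"] for img in soup.select("li.item img[src]")],
    }


fields = extract_basic_fields(SAMPLE_HTML)
print(fields)
```

The image URLs here are the 40-pixel thumbnails from the question's snippet; whether full-size images can be reached the same way depends on the page and is not covered by this sketch.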

The price seems to be somewhat hidden, and where the actual "final price" can be found may differ from page to page.
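
Because the price can sit in different places, one possible approach is to try several selectors in order and convert the first hit to a number, since the requested format asks for price as a number. This is only a sketch: the selector list combines the id used in the answer with the class from the question's HTML, and the names PRICE_SELECTORS, parse_price and find_price are made up for this example.

```python
import re

from bs4 import BeautifulSoup

# Candidate selectors, tried in order. The first comes from the answer, the
# second from the question's HTML; both are assumptions about the page.
PRICE_SELECTORS = ["#priceblock_ourprice", "span.p13n-sc-price"]


def parse_price(text):
    """Turn a string such as '$1,234.50' into a float, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def find_price(soup):
    """Return the price from the first selector that matches, else None."""
    for selector in PRICE_SELECTORS:
        tag = soup.select_one(selector)
        if tag is not None:
            return parse_price(tag.get_text())
    return None


soup = BeautifulSoup('<span class="p13n-sc-price">$11.85</span>', "html.parser")
price = find_price(soup)
print(price)
```

A float matches the "number" type in the requested format; if exact currency arithmetic matters, decimal.Decimal would be a safer choice, at the cost of needing a custom conversion before JSON serialization.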

Finally, once the dict is ready, just pass it to json.dumps():

import json
result = json.dumps(data)
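
For a slightly fuller picture of the serialization step, the sketch below feeds a hand-filled dict to json.dumps(). The values are illustrative, taken from the question's URL and HTML snippets; fields that could not be scraped are left as None, which JSON renders as null.

```python
import json

# Illustrative values only; on a real run these come from the scraping step.
data = {
    "asin": "B00ILZH9BO",  # taken from the URL in the question
    "title": "MagicBrite Complete Teeth Whitening Kit At Home Whitening",
    "price": 11.85,
    "prime": None,  # unknown here; serialized as null
    "images": [
        "https://images-na.ssl-images-amazon.com/images/I/51f8kCdwmqL._SS40_.jpg",
        "https://images-na.ssl-images-amazon.com/images/I/517mTOTBEiL._SS40_.jpg",
    ],
}

# indent=2 makes the output readable; ensure_ascii=False keeps any
# non-ASCII characters in titles as-is instead of \u escapes.
result = json.dumps(data, indent=2, ensure_ascii=False)
print(result)
```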

If Amazon decides to change its markup, things may break.
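
One way to notice such breakage early, sketched here under assumptions, is to make each lookup return None when the element is absent and then report which fields came back empty. The helpers text_or_none and missing_fields are hypothetical names, and the one-line HTML is a stand-in for a real page.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, element_id):
    """Return the stripped text of the element with this id, or None if it is absent."""
    tag = soup.find(id=element_id)
    return tag.get_text(strip=True) if tag is not None else None


def missing_fields(record):
    """List the keys whose value is None, i.e. lookups that found nothing."""
    return [key for key, value in record.items() if value is None]


# Stand-in markup: it has a title element but no price element.
soup = BeautifulSoup('<span id="title"> Example product </span>', "html.parser")

record = {
    "title": text_or_none(soup, "title"),
    "price": text_or_none(soup, "priceblock_ourprice"),
}
missing = missing_fields(record)
print(record, missing)
```

A non-empty list of missing fields is a hint that the ids or classes being looked up no longer match the page.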

For more on converting BeautifulSoup output to dict/JSON in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52077888/
