
python - Getting a JSON value from HTML: regex or not?

Reposted. Author: 行者123. Updated: 2023-12-01 00:39:33

I have gotten mixed answers on this: some say to use a regex, others say not to.

What I am trying to do is extract one specific value from the HTML (the JSON under spConfig), which looks like this:

<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color","options":[{"id":"8243","label":"Helloworld","products":["97460","97459"]}],"position":"0"},"148":{"id":"148","code":"codish","label":"Codish","options":[{"id":"4707","label":"12.5","products":[]},{"id":"2724","label":"13","products":[]},{"id":"4708","label":"13.5","products":[]}],"position":"1"}},"template":"EUR <%- data.price %>","optionPrices":{"97459":{"oldPrice":{"amount":121},"basePrice":{"amount":121},"finalPrice":{"amount":121},"tierPrices":[]}},"prices":{"oldPrice":{"amount":"121"},"basePrice":{"amount":"121"},"finalPrice":{"amount":"121"}},"productId":"97468","chooseText":"Choose an Option...","images":[],"index":[]},
"gallerySwitchStrategy": "replace"
}
}
}
</script>

Here is the problem. The scraped HTML contains several <script type="text/x-magento-init"> blocks, but only one of them contains spConfig. I have two questions:

  1. Should I use a regex to grab the spConfig value so that I can later call json.loads(spConfigValue)? If not, what method should I use to extract the JSON value?

  2. If I should use a regex: I have been trying \"spConfig\"\: (.*?) but it does not capture the JSON value. What am I doing wrong?

Best answer

In this case, bs4 4.7.1+ and the :contains pseudo-class are your friend. You said only one script matches, so you can do the following:

from bs4 import BeautifulSoup as bs
import json

html = '''<html>
<head>
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color","options":[{"id":"8243","label":"Helloworld","products":["97460","97459"]}],"position":"0"},"148":{"id":"148","code":"codish","label":"Codish","options":[{"id":"4707","label":"12.5","products":[]},{"id":"2724","label":"13","products":[]},{"id":"4708","label":"13.5","products":[]}],"position":"1"}},"template":"EUR <%- data.price %>","optionPrices":{"97459":{"oldPrice":{"amount":121},"basePrice":{"amount":121},"finalPrice":{"amount":121},"tierPrices":[]}},"prices":{"oldPrice":{"amount":"121"},"basePrice":{"amount":"121"},"finalPrice":{"amount":"121"}},"productId":"97468","chooseText":"Choose an Option...","images":[],"index":[]},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
</head>
<body></body>
</html>'''
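soup = bs(html, 'html.parser')
# :contains() selects the one <script> whose text mentions spConfig;
# newer soupsieve releases spell this selector :-soup-contains().
data = json.loads(soup.select_one('script:contains(spConfig)').text)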

The config is then:

data['#product_addtocart_form']['configurable']['spConfig']
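Because the question mentions several text/x-magento-init blocks on the page, another option is to parse every such block and keep the one that actually holds spConfig, rather than matching on the script text. The following is only a minimal sketch under assumptions: it swaps bs4 for the standard library's html.parser so that it runs without extra packages, the names MagentoInitCollector and find_sp_config are hypothetical, and it assumes the nesting shown in the sample (some selector key, then "configurable", then "spConfig").

```python
import json
from html.parser import HTMLParser


class MagentoInitCollector(HTMLParser):
    """Collect the raw text of every <script type="text/x-magento-init"> block."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/x-magento-init":
            self._in_target = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_target = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the current block.
        if self._in_target:
            self.blocks[-1] += data


def find_sp_config(html):
    """Return the first spConfig dict found in any magento-init block, else None."""
    collector = MagentoInitCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(data, dict):
            continue
        # Assumed layout: {selector: {"configurable": {"spConfig": {...}}}}
        for widgets in data.values():
            configurable = widgets.get("configurable") if isinstance(widgets, dict) else None
            if isinstance(configurable, dict) and "spConfig" in configurable:
                return configurable["spConfig"]
    return None
```

Called as find_sp_config(html) on the html string above, this should return the same dictionary as the data[...]['spConfig'] lookup.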

With keys:

attributes, template, optionPrices, prices, productId, chooseText, images, index
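On the second question: the pattern \"spConfig\"\: (.*?) does match, but (.*?) is lazy and nothing follows it in the pattern, so the group captures an empty string. If a regex is still preferred, one possible approach is to use it only to locate the key and let json.JSONDecoder.raw_decode parse the value that follows. This is a sketch, not the answer's method: extract_sp_config is a hypothetical name, and it assumes the first occurrence of "spConfig" in the page is the key of interest.

```python
import json
import re


def extract_sp_config(html):
    """Locate the "spConfig" key with a regex, then parse its JSON value."""
    # \s* consumes any whitespace around the colon, so the match ends
    # exactly where the JSON value begins.
    match = re.search(r'"spConfig"\s*:\s*', html)
    if match is None:
        return None
    # raw_decode parses one complete JSON value starting at the given index
    # and ignores whatever text comes after it.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value
```

The decoder handles nested braces and quoted strings, which a regex alone cannot do reliably.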

Regarding "python - Getting a JSON value from HTML: regex or not?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57466140/
