javascript - How to split JavaScript code with Python (bs4)

Reposted · Author: 行者123 · Updated: 2023-12-02 23:11:45

So I'm running into some trouble scraping a JavaScript value with bs4.

Basically, the JavaScript looks like this:

<script type="text/javascript">
var FancyboxI18nClose = 'Close';
var FancyboxI18nNext = 'Next';
var FancyboxI18nPrev = 'Previous';
var PS_CATALOG_MODE = false;
var ajaxsearch = true;
var attribute_anchor_separator = '-';
var blocksearch_type = 'top';
var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};
var comparator_max_item = 0;
</script>

What I'm trying to do here is scrape the value of var combinationsFromController =, but this is what I tried:

import json
import re

import requests
from bs4 import BeautifulSoup as soup

response = requests.get(url)  # url: the product page being scraped
bs4 = soup(response.text, 'html.parser')

for nosto_sku_tag in bs4.find_all('script', {'type': 'text/javascript'}):
    if 'combinationsFromController' in nosto_sku_tag.text.strip():
        print(nosto_sku_tag)
        for att, values in json.loads(
                re.findall(r'var combinationsFromController = (\{.*}?);', nosto_sku_tag.text.strip())[0][:-1]).values():
            print(values)

This gives me the error: Expecting ',' delimiter: line 1 column 4112 (char 4111)
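On the sample shown above, the question's regex most likely explains the error. The greedy .* runs to the end of the line and }? then matches nothing, so the captured group already ends with both closing braces. The extra [:-1] therefore cuts off the outer object's last }, and json.loads reports that it expected a ',' delimiter (the real page is longer, hence a different column number). A second problem follows: .values() yields the inner dicts, which cannot be unpacked into att, values, whereas .items() yields (id, details) pairs. Below is a small offline check, where SAMPLE_JS is just the assignment line copied from the question:

```python
import json
import re

# The assignment line copied from the question's sample <script> block.
SAMPLE_JS = """var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};"""

# The question's pattern: greedy .* reaches the end of the line, }? matches nothing,
# so the captured group already ends with both closing braces.
captured = re.findall(r'var combinationsFromController = (\{.*}?);', SAMPLE_JS)[0]
print(captured[-2:])

# The question's [:-1] removes the outer object's closing brace, so parsing fails.
error_message = None
try:
    json.loads(captured[:-1])
except json.JSONDecodeError as exc:
    error_message = exc.msg
print("error:", error_message)

# Without the slice the object parses; .items() yields (id, details) pairs.
combinations = json.loads(captured)
for combination_id, details in combinations.items():
    print(combination_id, details["reference"])
```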

I did notice that whenever I try this:

for nosto_sku_tag in bs4.find_all('script', {'type': 'text/javascript'}):
    if 'combinationsFromController' in nosto_sku_tag.text.strip():
        print(nosto_sku_tag)
        print("---------")

the output is:

var FancyboxI18nClose = 'Close';
var FancyboxI18nNext = 'Next';
var FancyboxI18nPrev = 'Previous';
var PS_CATALOG_MODE = false;
var ajaxsearch = true;
var attribute_anchor_separator = '-';
var blocksearch_type = 'top';
var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};
var comparator_max_item = 0;
----------------------------

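This seems to mean that the whole JavaScript block comes back as one single piece of text, which I think might need to be split. I tried using a regex, but it didn't help.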
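The script text doesn't strictly need to be split before parsing. If you do want to split it, one option is to go line by line: find the line that starts with var combinationsFromController, take everything after the " = ", drop the trailing ";", and hand the rest to json.loads. This is a sketch that assumes the assignment sits on a single line, as in the sample. SCRIPT_TEXT and find_js_object are hypothetical names, and the sample lines are copied from the question:

```python
import json
from typing import Any

# A few lines copied from the question's <script> block.
SCRIPT_TEXT = """
var blocksearch_type = 'top';
var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};
var comparator_max_item = 0;
"""


def find_js_object(script_text: str, var_name: str) -> Any:
    """Return the JSON value assigned on a single line as `var <var_name> = ...;`."""
    prefix = f"var {var_name} = "
    for line in script_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Drop the "var name = " prefix and the trailing semicolon.
            literal = line[len(prefix):].rstrip(";")
            return json.loads(literal)
    return None  # variable not found


combinations = find_js_object(SCRIPT_TEXT, "combinationsFromController")
print(combinations["163972"]["reference"])
```

The one-line assumption is what makes the split safe here. If the object could span several lines, a regex with re.DOTALL avoids that assumption.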

So my question is: how can I scrape the value of var combinationsFromController?

Best answer

Use the following regex pattern to isolate the whole JavaScript object assigned to that variable:

combinationsFromController = (.*?);
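On the question's sample, the non-greedy (.*?) stops at the first ";" after the assignment. That is the semicolon ending the statement, so group 1 is the complete object and json.loads accepts it. Below is a quick offline check. SAMPLE_SCRIPT is a few lines copied from the question, and the last part uses a made-up string to show the pattern's weak spot: a ";" inside a string value.

```python
import json
import re

# A few lines copied from the question's <script> block.
SAMPLE_SCRIPT = """
var blocksearch_type = 'top';
var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};
var comparator_max_item = 0;
"""

# Non-greedy capture up to the first ';'; DOTALL lets '.' also match newlines.
match = re.search(r"combinationsFromController = (.*?);", SAMPLE_SCRIPT, re.DOTALL)
data = json.loads(match.group(1))
combination = data["163972"]
print(combination["price"], combination["weight"], combination["attributes"])

# Caveat (made-up input): a ';' inside a JSON string ends the match too early.
tricky = 'var combinationsFromController = {"note": "a;b"};'
truncated = re.search(r"combinationsFromController = (.*?);", tricky).group(1)
print(truncated)  # {"note": "a
```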


For example:

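import json
import re

import requests

r = requests.get(url)  # url: the page that contains the <script> block
# Non-greedy (.*?) stops at the first ';' after the assignment; DOTALL lets '.' span newlines
p = re.compile(r'combinationsFromController = (.*?);', re.DOTALL)
data = json.loads(p.findall(r.text)[0])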
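If a ";" could appear inside a string value, a sturdier alternative is to find where the object starts and let json.JSONDecoder.raw_decode parse exactly one JSON value from that index. This is a different technique from the answer's regex and is offered only as a sketch. raw_decode returns the value plus the index where parsing stopped, so the trailing ";" and the rest of the script are ignored. SCRIPT_TEXT and extract_js_object are hypothetical names. In a real scraper the text would come from the <script> tag found with find_all. The function also assumes the value follows "name = " with exactly one space on each side of the "=".

```python
import json
from typing import Any

# A few lines copied from the question's <script> block.
SCRIPT_TEXT = """
var blocksearch_type = 'top';
var combinationsFromController = {"163972":{"attributes_values":{"15":"40"},"attributes":[75],"price":0,"specific_price":false,"ecotax":0,"weight":0.6,"quantity":1,"reference":"IDP20059--IDPA163972","unit_impact":0,"minimal_quantity":"1","date_formatted":"","available_date":"","id_image":-1,"list":"'75'"}};
var comparator_max_item = 0;
"""


def extract_js_object(script_text: str, var_name: str) -> Any:
    """Parse the JSON value assigned to var_name, ignoring whatever follows it."""
    marker = f"{var_name} = "
    start = script_text.find(marker)
    if start == -1:
        return None  # variable not present in this script
    # raw_decode does not skip leading whitespace, so start exactly at the value.
    value, _end = json.JSONDecoder().raw_decode(script_text, start + len(marker))
    return value


data = extract_js_object(SCRIPT_TEXT, "combinationsFromController")
for combination_id, details in data.items():
    print(combination_id, details["reference"], details["quantity"])

# Unlike the (.*?); regex, a ';' inside a string value does not cut the parse short.
print(extract_js_object('var note = {"text": "a;b"};', "note"))
```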

For more on javascript - How to split JavaScript code with Python (bs4), see the similar question on Stack Overflow: https://stackoverflow.com/questions/57324630/
