
javascript - Scraping a Javascript variable into Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:41:38

I want to scrape the following data from http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/:

var hoodFeatures = {
    type: "FeatureCollection",
    features: [{
        type: "Feature",
        properties: {
            name: "Koreatown",
            slug: "koreatown",
            url: "/neighborhoods/neighborhood/koreatown/",
            has_statistics: true,
            label: 'Rank: 1<br>Population per Sqmi: 42,611',
            population: "115,070",
            stratum: "high"
        },
        geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.315909, 34.052611 ], [ -118.323009, 34.054810 ], [ -118.319309, 34.061910 ], [ -118.314093, 34.062362 ], [ -118.313709, 34.076310 ], [ -118.286908, 34.076510 ] ] ] ] }
    },

From the HTML above, I want to extract the following for each feature:

name
population per sqmi
population
geometry

and turn the result into a dataframe indexed by name.

Here is what I have tried so far:

import requests
import json
from bs4 import BeautifulSoup

response_obj = requests.get('http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/').text
soup = BeautifulSoup(response_obj,'lxml')

This object contains the script information, but I don't understand how to use the json module as suggested in this thread: Parsing variable data out of a javascript tag using python

json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
value = json.loads(json_text)
value

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-37c4c0188ed0> in <module>
1 #Splits the text on the first bracket and last bracket of the javascript into JSON format
----> 2 json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
3 value = json.loads(json_text)
4 value
5 #import pprint

TypeError: 'NoneType' object is not callable

Any suggestions? Thanks.
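The TypeError in the traceback comes from calling partition on the BeautifulSoup object instead of on a string. BeautifulSoup treats an attribute it does not recognize as a tag lookup, so soup.partition searches for a <partition> tag, finds none, and evaluates to None, which the code then tries to call. A minimal sketch of the string-based version follows. It assumes a shortened, hypothetical stand-in (page_source) for the real response text, and it also shows that the extracted literal is JavaScript rather than JSON, so json.loads is expected to reject it:

```python
import json

# Hypothetical, shortened stand-in for requests.get(...).text; not the real page.
page_source = """<script>
var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: { name: "Koreatown", population: "115,070" },
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ] ] ] ] }
}]
};
</script>"""

# str.partition / str.rpartition exist on plain strings, not on the soup object.
json_text = '{%s}' % (page_source.partition('{')[2].rpartition('}')[0],)

try:
    value = json.loads(json_text)
except json.JSONDecodeError as exc:
    # Keys such as type: and features: are unquoted, which JSON does not allow.
    value = None
    print("Not valid JSON:", exc)
```

So fixing the TypeError alone is probably not enough here: the object literal would still need either a JavaScript-aware parser or targeted extraction of the fields of interest.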

Best answer

I'm not sure how to do this with BeautifulSoup. Another option might be to design a regular expression and extract the values we want:

(?:name|population per sqmi|population)\s*:\s*"?(.*?)\s*["']|(?:geometry)\s*:\s*({.*})

Demo

Test

import re

regex = r"(?:name|population per sqmi|population)\s*:\s*\"?(.*?)\s*[\"']|(?:geometry)\s*:\s*({.*})"

test_str = ("var hoodFeatures = {\n"
" type: \"FeatureCollection\",\n"
" features: [{\n"
" type: \"Feature\",\n"
" properties: {\n"
" name: \"Koreatown\",\n"
" slug: \"koreatown\",\n"
" url: \"/neighborhoods/neighborhood/koreatown/\",\n"
" has_statistics: true,\n"
" label: 'Rank: 1<br>Population per Sqmi: 42,611',\n"
" population: \"115,070\",\n"
" stratum: \"high\"\n"
" },\n"
" geometry: { \"type\": \"MultiPolygon\", \"coordinates\": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.315909, 34.052611 ], [ -118.323009, 34.054810 ], [ -118.319309, 34.061910 ], [ -118.314093, 34.062362 ], [ -118.313709, 34.076310 ], [ -118.286908, 34.076510 ] ] ] ] }\n"
" },\n")

matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for matchNum, match in enumerate(matches, start=1):
    # Report the full match and its position in the test string.
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Groups are 1-indexed; a group that did not take part in the match is reported as None.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
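To get from individual matches to one record per neighborhood, the same idea can be sketched with named groups. This is a variation, not the pattern above: FEATURE_RE, to_int, and rows are hypothetical names, and the pattern assumes that every feature lists name, label, population, and geometry in that order and that geometry contains no nested braces, as in the sample from the question.

```python
import json
import re

# Sample copied from the question (a single feature, truncated as in the original).
sample = '''var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: {
name: "Koreatown",
slug: "koreatown",
url: "/neighborhoods/neighborhood/koreatown/",
has_statistics: true,
label: 'Rank: 1<br>Population per Sqmi: 42,611',
population: "115,070",
stratum: "high"
},
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.315909, 34.052611 ], [ -118.323009, 34.054810 ], [ -118.319309, 34.061910 ], [ -118.314093, 34.062362 ], [ -118.313709, 34.076310 ], [ -118.286908, 34.076510 ] ] ] ] }
},
'''

# Assumption: fields appear in this order and geometry has no nested braces.
FEATURE_RE = re.compile(
    r'name:\s*"(?P<name>[^"]*)"'
    r'.*?Population per Sqmi:\s*(?P<density>[\d,]+)'
    r'.*?population:\s*"(?P<population>[\d,]+)"'
    r'.*?geometry:\s*(?P<geometry>\{[^{}]*\})',
    re.DOTALL,
)


def to_int(text):
    """Convert a string such as '42,611' to the integer 42611."""
    return int(text.replace(",", ""))


rows = []
for m in FEATURE_RE.finditer(sample):
    rows.append({
        "name": m.group("name"),
        "population_per_sqmi": to_int(m.group("density")),
        "population": to_int(m.group("population")),
        # The geometry value uses quoted keys, so it parses as ordinary JSON.
        "geometry": json.loads(m.group("geometry")),
    })

print(rows[0]["name"], rows[0]["population_per_sqmi"], rows[0]["population"])
```

If pandas is installed, pandas.DataFrame(rows).set_index("name") should then give the dataframe indexed by name that the question asks for. The pattern is worth checking against the full page source before relying on it, since only this one feature was available to test with.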

Regarding "javascript - Scraping a Javascript variable into Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56829816/
