python - JSON file won't read into pandas

Reposted. Author: 行者123. Updated: 2023-11-30 22:05:25

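I have a JSON file (about 1 GB) containing acoustic features of music tracks. I am trying to read it into my pandas notebook with

import json
dataf = "/home/work/my.json"
d = json.load(open(dataf, 'r'))

It always fails with this error: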

Extra data: line 2 column 1 (char 499)

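I know that character 499 is where the next track starts, but I have searched online and I am still not sure how to read the file. Here is a sample of the data.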

{"_id":{"$oid":"5b2cff21aecd2a723459cd65"},"id":1,"sp_id":"0XLOf9LhyazPX9Ld8jPiUq","danceability":0.7079999999999999627,"energy":0.60999999999999998668,"key":"2","loudness":-4.5220000000000002416,"mode":"1","speechiness":0.057399999999999999634,"acousticness":0.020400000000000001465,"instrumentalness":4.4499999999999997457e-06,"liveness":0.064100000000000004197,"valence":0.30499999999999999334,"tempo":123.0379999999999967,"time_signature":"4","track_uri":"spotify:track:0XLOf9LhyazPX9Ld8jPiUq"}
{"_id":{"$oid":"5b2cff21aecd2a723459cd66"},"id":2,"sp_id":"7aF09WaavZAmAWuUeYxlYD","danceability":0.59299999999999997158,"energy":0.86799999999999999378,"key":"1","loudness":-3.5729999999999999538,"mode":"0","speechiness":0.29499999999999998446,"acousticness":0.182999999999999996,"instrumentalness":0.0,"liveness":0.36499999999999999112,"valence":0.49599999999999999645,"tempo":104.98799999999999955,"time_signature":"4","track_uri":"spotify:track:7aF09WaavZAmAWuUeYxlYD"}
{"_id":{"$oid":"5b2cff21aecd2a723459cd67"},"id":3,"sp_id":"0tKcYR2II1VCQWT79i5NrW","danceability":0.5999999999999999778,"energy":0.81000000000000005329,"key":"0","loudness":-4.748999999999999666,"mode":"1","speechiness":0.047899999999999998135,"acousticness":0.0068300000000000001335,"instrumentalness":0.20999999999999999223,"liveness":0.15499999999999999889,"valence":0.29799999999999998712,"tempo":167.87999999999999545,"time_signature":"4","track_uri":"spotify:track:0tKcYR2II1VCQWT79i5NrW"}
{"_id":{"$oid":"5b2cff21aecd2a723459cd68"},"id":4,"sp_id":"6TWSVHx6z6E42JiwloGv1k","danceability":0.50300000000000000266,"energy":0.91800000000000003819,"key":"11","loudness":-5.0099999999999997868,"mode":"1","speechiness":0.046399999999999996803,"acousticness":0.016199999999999999123,"instrumentalness":0.024400000000000001549,"liveness":0.18599999999999999867,"valence":0.41799999999999998268,"tempo":140.0,"time_signature":"4","track_uri":"spotify:track:6TWSVHx6z6E42JiwloGv1k"}
{"_id":{"$oid":"5b2cff21aecd2a723459cd69"},"id":5,"sp_id":"5QqyRUZeBE04yJxsD1OC0I","danceability":0.76000000000000000888,"energy":0.56100000000000005418,"key":"1","loudness":-8.6969999999999991758,"mode":"1","speechiness":0.13400000000000000799,"acousticness":0.018499999999999999084,"instrumentalness":1.9400000000000000604e-05,"liveness":0.19900000000000001021,"valence":0.12099999999999999645,"tempo":134.98300000000000409,"time_signature":"4","track_uri":"spotify:track:5QqyRUZeBE04yJxsD1OC0I"}

Best answer

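Your file cannot be parsed because, taken as a whole, it is not valid JSON. The character the parser complains about sits right after the first newline. Evidently a series of objects was dumped to the file line by line, and together they do not form a single valid JSON value. See: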

>>> json.loads(s[:499])
{'_id': {'$oid': '5b2cff21aecd2a723459cd65'},
'id': 1,
'sp_id': '0XLOf9LhyazPX9Ld8jPiUq',
'danceability': 0.708,
'energy': 0.61,
'key': '2',
'loudness': -4.522,
'mode': '1',
'speechiness': 0.0574,
'acousticness': 0.0204,
'instrumentalness': 4.45e-06,
'liveness': 0.0641,
'valence': 0.305,
'tempo': 123.038,
'time_signature': '4',
'track_uri': 'spotify:track:0XLOf9LhyazPX9Ld8jPiUq'}
>>> json.loads(s[499:973])
{'_id': {'$oid': '5b2cff21aecd2a723459cd66'},
'id': 2,
'sp_id': '7aF09WaavZAmAWuUeYxlYD',
'danceability': 0.593,
'energy': 0.868,
'key': '1',
'loudness': -3.573,
'mode': '0',
'speechiness': 0.295,
'acousticness': 0.183,
'instrumentalness': 0.0,
'liveness': 0.365,
'valence': 0.496,
'tempo': 104.988,
'time_signature': '4',
'track_uri': 'spotify:track:7aF09WaavZAmAWuUeYxlYD'}

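(s is the sample input loaded into a string.) The objects were simply printed to the file one after another. You either have to change the syntax so that the file becomes a list of objects (add square brackets and commas), or parse the file line by line, calling json.loads on each line of input.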
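A third route, not mentioned in the answer, is to let the standard library decode one value at a time with `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it stopped. The following is a minimal sketch under the assumption that the whole text fits in memory; the helper name `iter_concatenated_json` and the tiny `sample` string are made up for illustration.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # Returns the decoded value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical stand-in for the real data: three objects separated
# by a newline and by a plain space.
sample = '{"id": 1, "key": "2"}\n{"id": 2, "key": "1"} {"id": 3, "key": "0"}'

try:
    json.loads(sample)  # fails the same way as in the question
except json.JSONDecodeError as exc:
    print(exc.msg)

records = list(iter_concatenated_json(sample))
print([r["id"] for r in records])
```

Because it only relies on where each value ends, this approach should also cope with objects separated by spaces rather than newlines.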

Now, don't quote me on this, but hacking your input into valid JSON is quite easy:

>>> len(json.loads('[' + s.replace('\n', ',') + ']'))
5
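One caveat with the one-liner: if the text ends with a newline, or contains blank lines, the replacement leaves a stray comma and the parse fails again. A slightly more defensive variant of the same idea might look like the sketch below; the function name `wrap_lines_as_array` and the sample string are illustrative assumptions, not part of the original answer.

```python
import json

def wrap_lines_as_array(text):
    """Turn one-object-per-line text into a single JSON array string."""
    # Dropping blank lines avoids a stray comma when the text ends with "\n".
    lines = [line for line in text.splitlines() if line.strip()]
    return "[" + ",".join(lines) + "]"

# Hypothetical input with a blank line and a trailing newline.
sample = '{"id": 1}\n{"id": 2}\n\n{"id": 3}\n'
objects = json.loads(wrap_lines_as_array(sample))
print(len(objects))
```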

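If the file is large, you probably don't want to apply the hack above and then parse everything in one go, because building the rewritten string and parsing it together cost a lot of memory. In that case I suggest parsing the file object by object. Assuming each line of your file holds exactly one object, all you need is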

with open(infile) as f:
    dat = [json.loads(line) for line in f]

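where infile is the path to the concatenated JSON file. For a large file this will take a long time and the result will still occupy a lot of memory, but I expect the extra overhead for parsing to be smaller this way.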
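If the parsed result itself is too large, one option is to read lazily and keep only the fields you need from each record. This is a rough sketch rather than the answerer's method: the helpers `iter_json_lines` and `slim` are hypothetical, the field names are taken from the sample data, and the temporary file with made-up track URIs merely stands in for the real 1 GB input.

```python
import json
import os
import tempfile

def iter_json_lines(path):
    """Yield one parsed object per non-empty line, reading the file lazily."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def slim(record, fields):
    """Keep only the requested fields to shrink each row."""
    return {name: record[name] for name in fields}

# Demo on a small temporary file (made-up rows, not the real data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"id": 1, "tempo": 123.038, "track_uri": "spotify:track:a"}\n')
    tmp.write('{"id": 2, "tempo": 104.988, "track_uri": "spotify:track:b"}\n')
    demo_path = tmp.name

rows = [slim(rec, ("id", "tempo")) for rec in iter_json_lines(demo_path)]
os.remove(demo_path)
print(rows)
```

A list of flat dictionaries like `rows` can then be handed to `pandas.DataFrame(rows)` if a DataFrame is the end goal.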

Regarding "python - JSON file won't read into pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53017795/
