
python - Accessing a JSON file with Python gives "Memory Error"

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:06:38
I am working with a JSON dataset (reddit data) that is 5 GB in size. A block of my JSON data looks like this.

{"subreddit":"languagelearning","parent_id":"t1_cn9nn8v","retrieved_on":1425123427,"ups":1,"author_flair_css_class":"","gilded":0,"author_flair_text":"Lojban (N)","controversiality":0,"subreddit_id":"t5_2rjsc","edited":false,"score_hidden":false,"link_id":"t3_2qulql","name":"t1_cnau2yv","created_utc":"1420074627","downs":0,"body":"I played around with the Japanese Duolingo for awhile and basically if you're not near Fluency you won't learn much of anything.\n\nAs was said below, the only one that really exists is Chineseskill.","id":"cnau2yv","distinguished":null,"archived":false,"author":"Pennwisedom","score":1}

I am using Python to list every "subreddit" in this data, but I get a memory error. My Python code and the error are below.

import json
data=json.loads(open('/media/RC_2015-01').read())
for item in data:
    name = item.get("subreddit")
    print name

Traceback (most recent call last):
  File "name_python.py", line 4, in <module>
    data=json.loads(open('/media/RC_2015-01').read())
MemoryError

As far as I understand, I am trying to load a very large amount of data at once, which is why I get the memory error. Can anyone suggest a workaround?

Best answer

You need to use an iterative parser such as ijson, which parses one record at a time instead of loading the whole file into memory.
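A minimal sketch of that approach, assuming the third-party ijson package is installed (pip install ijson) and that the file is a single top-level JSON array of records. The helper name iter_subreddits is hypothetical and used only for illustration.

```python
def iter_subreddits(path):
    """Yield the "subreddit" field of each record in a top-level JSON array."""
    # ijson is a third-party package; it is imported inside the function so
    # that the dependency is only needed when the function is actually called.
    import ijson

    # Open in binary mode and let ijson handle the decoding.
    with open(path, "rb") as f:
        # The prefix "item" selects each element of the top-level array,
        # so only one record is held in memory at a time.
        for record in ijson.items(f, "item"):
            yield record.get("subreddit")
```

It could then be used as `for name in iter_subreddits('/media/RC_2015-01'): print(name)`, which keeps memory use roughly proportional to the size of one record rather than the whole 5 GB file.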

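As for your error message, make sure your data is valid JSON and that the records are enclosed in square brackets. This structure will parse correctly: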

[
{...},
{...}
]

whereas the following structure will raise an "Extra data" exception:

{....}
{....}
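If the file turns out to have this second layout, one possible workaround is to parse it line by line with the standard json module. This is a sketch for Python 3 under the assumption that every record sits on its own line, with newlines inside strings escaped as \n, as in the sample record in the question. The helper name iter_records is hypothetical.

```python
import json


def iter_records(path):
    """Yield one parsed record per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory.
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

The subreddit names could then be printed with `for record in iter_records('/media/RC_2015-01'): print(record.get("subreddit"))`.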

Regarding python - Accessing a JSON file with Python gives "Memory Error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34520296/
