
python - Parsing multiple JSON objects from a text file with Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:32:30

I have a .json file in which every line is a separate object. For example, the first two lines are:

{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}

{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}

I tried to process it with the ijson library, like this:

import ijson

with open(filename, 'r') as f:
    objects = ijson.items(f, 'columns.items')
    columns = list(objects)

However, I get this error:

JSONError: Additional data

It seems that having multiple objects in the file is what causes this error.

What is the recommended way to analyze this kind of JSON file in Jupyter?

Thanks in advance.

Best answer

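If this is the complete file, it is not valid as a single JSON document: the objects must be separated by commas, and the whole content must start and end with square brackets, like this: [{...},{...}]. For your data, it would look like this: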

[{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...},
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}]
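To see where the error comes from, and an alternative that leaves the file untouched, here is a minimal sketch using only the standard library `json` module (a swapped-in technique, not ijson as in the question). Parsing the whole text as one document fails right after the first object, while parsing each non-empty line separately works, because every line is a complete JSON object (the layout often called "JSON Lines"). The two records below are shortened, hypothetical stand-ins for the real ones; the `...` fields are simply left out.

```python
import json

# Hypothetical shortened records standing in for the real lines of the file
text = (
    '{"review_id": "x7mDIiDB3jEiPGPHOmDzyw", "user_id": "msQe1u7Z_XuqjGoqhB0J5g"}\n'
    '{"review_id": "dDl8zu1vWPdKGihJrwQbpw", "user_id": "msQe1u7Z_XuqjGoqhB0J5g"}\n'
)

# Parsing the whole text as ONE JSON document fails: the parser finishes the
# first object and then rejects the second one as extra data.
error_msg = None
try:
    json.loads(text)
except json.JSONDecodeError as err:
    error_msg = err.msg
print("whole file:", error_msg)

# Parsing line by line works, since each non-empty line is a complete object.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(len(records), records[0]["review_id"])
```

With a real file, the same idea is `records = [json.loads(line) for line in f if line.strip()]` inside a `with open("yourfile.json") as f:` block, which needs no rewriting of the file.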

Here is some code to clean up the file:

# Collect the non-empty lines; each one is a complete JSON object
with open("yourfile.json", "r") as f:
    lines = [line.strip() for line in f if line.strip()]

# Write them back as one JSON array: "[" + objects separated by commas + "]"
with open("cleanfile.json", "w") as g:
    g.write("[\n")
    g.write(",\n".join(lines))
    g.write("\n]\n")
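If the file is too large to read into memory at once, the conversion can also be streamed. In the sketch below (the function name `jsonl_to_array` and the sample file names are hypothetical), the opening bracket is written first and the comma is written *before* every object except the first, so the code never needs to know which line is the last one; blank lines are skipped.

```python
import json
import os
import tempfile


def jsonl_to_array(src_path, dst_path):
    """Copy a one-object-per-line file into a single JSON array, line by line."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("[\n")
        first = True
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines between objects
            if not first:
                dst.write(",\n")  # separator goes before every object but the first
            dst.write(line)
            first = False
        dst.write("\n]\n")


# Usage on a small hypothetical input file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample_input.json")
    dst = os.path.join(tmp, "sample_output.json")
    with open(src, "w", encoding="utf-8") as f:
        f.write('{"review_id": "a"}\n\n{"review_id": "b"}\n')
    jsonl_to_array(src, dst)
    with open(dst, encoding="utf-8") as f:
        data = json.load(f)

print(data)  # [{'review_id': 'a'}, {'review_id': 'b'}]
```

Note that the lines are copied without being parsed, so a malformed line still produces an invalid array; an empty input produces `[]`.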

To read the JSON file properly, you could also consider using the pandas library (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html).

import pandas as pd

# Get a pandas DataFrame from a JSON file holding an array of objects (such as the cleaned file)
df = pd.read_json("path/to/your/filename.json")
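pandas can also read the original one-object-per-line file directly, skipping the cleanup step, by passing `lines=True` to `read_json`. A minimal sketch with a small hypothetical sample file (the name `reviews.json` and the `stars` field are illustrative assumptions, not taken from the question):

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reviews.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"review_id": "x7mDIiDB3jEiPGPHOmDzyw", "stars": 5}\n')
        f.write('{"review_id": "dDl8zu1vWPdKGihJrwQbpw", "stars": 4}\n')

    # lines=True: every line is parsed as a separate JSON object (one row each)
    df = pd.read_json(path, lines=True)

print(df.shape)  # (2, 2)
```

For very large files, `read_json` also accepts `chunksize` together with `lines=True`, which returns an iterator of smaller DataFrames instead of loading everything at once.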

If you are not familiar with pandas, here is a quick start on how to use a DataFrame object:

df.head()          # the first rows of the DataFrame (5 by default)
df["review_id"]    # the review_id column as a Series
df.iloc[1,:]       # the complete row at position 1 (the second row)
df.iloc[1,2]       # the value at row position 1, column position 2

For this question (python - Parsing multiple JSON objects from a text file with Python), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51752925/
