
python - Parsing unstructured JSON into CSV

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:39:27

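I have yearly app data for different applications in JSON format. Each application has 10 different JSON files. I am trying to merge them into a single CSV. First, here is the data structure: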

[{"date": "2017-10-23", "downloads": 15358985, "end": "2017-10-23", "data": {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}}, {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22", "data": {"2.7.3.4196-beta": 5,  "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538,  "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}}]

When I parse them into a pandas DataFrame, I get the following:

date        downloads  end         data
2017-10-23  15358985   2017-10-23  {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}
2017-10-22  12778233   2017-10-22  {"2.7.3.4196-beta": 5, "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538, "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}

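Note that not every version is downloaded every day. How can I create one column for each version of the application? If a version was not downloaded on a particular date, we can leave the cell blank or fill it with NaN.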

Best answer

I think you need the DataFrame constructor together with reindex to add the missing rows:

j = [{"date": "2017-10-25", "downloads": 15358985, "end": "2017-10-23", "data": {"2.7.3.4196-beta": 7, "1.0.1": 268, "1.0.2": 715, "2.9.0.4250-beta": 1, "2.7.3.4215-beta": 2, "2.7.2.4151-beta": 1, "2.2.3.1-signed": 9292}}, {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22", "data": {"2.7.3.4196-beta": 5,  "2.4.1": 842, "2.99.0.1872beta": 12, "2.99.0.1857beta": 4, "2.3.1.1-signed": 3, "2.6.10": 11538,  "2.6.4.1-signed": 8, "2.7.3.4198-beta": 4}}]

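import pandas as pd

# Build the frame and use the parsed dates as a DatetimeIndex.
df = pd.DataFrame(j).set_index('date')
df.index = pd.to_datetime(df.index)

# Reindex over the full daily range so that missing days show up as NaN rows.
df = df.reindex(pd.date_range(start=df.index.min(), end=df.index.max()))
print(df)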
                                                         data   downloads  \
2017-10-22  {'2.6.4.1-signed': 8, '2.99.0.1857beta': 4, '2...  12778233.0
2017-10-23                                                NaN         NaN
2017-10-24                                                NaN         NaN
2017-10-25  {'2.7.2.4151-beta': 1, '1.0.1': 268, '2.9.0.42...  15358985.0

                   end
2017-10-22  2017-10-22
2017-10-23         NaN
2017-10-24         NaN
2017-10-25  2017-10-23
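
In the frame above, the per-version counts still sit inside the data column as dicts. A minimal sketch of one way to spread them into one column per version without json_normalize, assuming every data value is a flat dict; the names records, versions and wide are illustrative, and the sample is a shortened subset of the question's data:

```python
import pandas as pd

# Shortened subset of the question's records (only a few versions kept).
records = [
    {"date": "2017-10-23", "downloads": 15358985, "end": "2017-10-23",
     "data": {"1.0.1": 268, "1.0.2": 715, "2.7.3.4196-beta": 7}},
    {"date": "2017-10-22", "downloads": 12778233, "end": "2017-10-22",
     "data": {"2.7.3.4196-beta": 5, "2.4.1": 842}},
]

df = pd.DataFrame(records)

# A list of dicts becomes one column per key; keys missing in a row become NaN.
versions = pd.DataFrame(df["data"].tolist(), index=df.index)

# Swap the dict column for the expanded version columns.
wide = pd.concat([df.drop(columns="data"), versions], axis=1)
wide["date"] = pd.to_datetime(wide["date"])
wide = wide.set_index("date").sort_index()
print(wide)
```

Unlike json_normalize, this keeps the version strings as column names without a data. prefix.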

A solution with json_normalize is also possible, but if the JSON objects have different keys you get a lot of NaN values:

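from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# Flatten the nested "data" dict into one "data.<version>" column per version.
df = json_normalize(j).set_index('date')
df.index = pd.to_datetime(df.index)

df = df.reindex(pd.date_range(start=df.index.min(), end=df.index.max()))
print(df)  # column order may differ between pandas versions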
            data.1.0.1  data.1.0.2  data.2.2.3.1-signed  data.2.3.1.1-signed  \
2017-10-22         NaN         NaN                  NaN                  3.0
2017-10-23         NaN         NaN                  NaN                  NaN
2017-10-24         NaN         NaN                  NaN                  NaN
2017-10-25       268.0       715.0               9292.0                  NaN

            data.2.4.1  data.2.6.10  data.2.6.4.1-signed  \
2017-10-22       842.0      11538.0                  8.0
2017-10-23         NaN          NaN                  NaN
2017-10-24         NaN          NaN                  NaN
2017-10-25         NaN          NaN                  NaN

            data.2.7.2.4151-beta  data.2.7.3.4196-beta  data.2.7.3.4198-beta  \
2017-10-22                   NaN                   5.0                   4.0
2017-10-23                   NaN                   NaN                   NaN
2017-10-24                   NaN                   NaN                   NaN
2017-10-25                   1.0                   7.0                   NaN

            data.2.7.3.4215-beta  data.2.9.0.4250-beta  data.2.99.0.1857beta  \
2017-10-22                   NaN                   NaN                   4.0
2017-10-23                   NaN                   NaN                   NaN
2017-10-24                   NaN                   NaN                   NaN
2017-10-25                   2.0                   1.0                   NaN

            data.2.99.0.1872beta   downloads         end
2017-10-22                  12.0  12778233.0  2017-10-22
2017-10-23                   NaN         NaN         NaN
2017-10-24                   NaN         NaN         NaN
2017-10-25                   NaN  15358985.0  2017-10-23
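
The question asks for a single CSV built from several JSON files, which the answer does not show. A rough sketch of that last step, assuming each file holds a list of records shaped like the sample and pandas >= 1.0 is available; the function name json_files_to_csv, the file names part1.json and part2.json, and the source_file column are hypothetical, and the demo records are a shortened subset of the question's data:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def json_files_to_csv(json_paths, csv_path):
    """Flatten each JSON file, stack the rows, and write one CSV."""
    frames = []
    for path in json_paths:
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)
        frame = pd.json_normalize(records)
        # Drop the "data." prefix so the columns are plain version strings.
        frame.columns = [
            c[len("data."):] if c.startswith("data.") else c
            for c in frame.columns
        ]
        # Hypothetical helper column: remember which file a row came from.
        frame["source_file"] = Path(path).name
        frames.append(frame)
    # concat aligns on column names; versions absent from a file become NaN.
    merged = pd.concat(frames, ignore_index=True, sort=False)
    merged.to_csv(csv_path, index=False)  # NaN is written as an empty field
    return merged


# Demo: write two small hypothetical files to a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
samples = {
    "part1.json": [{"date": "2017-10-22", "downloads": 12778233,
                    "end": "2017-10-22",
                    "data": {"2.4.1": 842, "2.7.3.4196-beta": 5}}],
    "part2.json": [{"date": "2017-10-23", "downloads": 15358985,
                    "end": "2017-10-23",
                    "data": {"1.0.1": 268, "2.7.3.4196-beta": 7}}],
}
paths = []
for name, records in samples.items():
    target = tmp_dir / name
    target.write_text(json.dumps(records), encoding="utf-8")
    paths.append(target)

merged = json_files_to_csv(paths, tmp_dir / "merged.csv")
print(merged)
```

Because to_csv writes NaN as an empty field by default, versions that were not downloaded on a given day end up blank in the CSV, which matches what the question asks for.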

Regarding python - parsing unstructured JSON into CSV, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46951917/
