
python - Normalizing a JSON array of records into a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-02 18:28:21

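I want to create a DataFrame from the OWID COVID-19 JSON data located here. The JSON holds an array of daily records in the data column, and that array, together with the country index, is what I am trying to turn into a DataFrame.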

{"AFG":{"continent":"Asia","location":"Afghanistan","population":39835428.0,"population_density":54.422,"median_age":18.6,"aged_65_older":2.581,"aged_70_older":1.337,"gdp_per_capita":1803.987,"cardiovasc_death_rate":597.029,"diabetes_prevalence":9.59,"handwashing_facilities":37.746,"hospital_beds_per_thousand":0.5,"life_expectancy":64.83,"human_development_index":0.511,"data":[{"date":"2020-02-24","total_cases":5.0,"new_cases":5.0,"total_cases_per_million":0.126,"new_cases_per_million":0.126,"stringency_index":8.33},{"date":"2020-02-25","total_cases":5.0,"new_cases":0.0,"total_cases_per_million":0.126,"new_cases_per_million":0.0,"stringency_index":8.33},

So far I have been loading the file directly into a DataFrame:

df = pd.read_json('owid-covid-data.json', orient='index')

and then normalizing the array:

data = pd.concat([pd.DataFrame(json_normalize(key)) for key in df['data']])

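This works fine, except that it drops the index, so the result has no identifier for joining back to the static values.

I also imagine there is a more efficient way to normalize than the one I used.

Any help is greatly appreciated!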

Best answer

This is not the most efficient way, but it works:

new_df = pd.DataFrame()
for index, row in df.iterrows():
    tmp = pd.json_normalize(row['data'])
    tmp['country_code'] = index
    new_df = pd.concat([new_df, tmp])
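
Part of the cost of a loop like this is probably that every pd.concat call copies the frame accumulated so far. A minimal sketch of a variant that builds one frame per country and concatenates only once, passing a dict so that the keys become an index level. The small df here is made-up sample data standing in for the OWID file, and the level names "country_code" and "row" are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Made-up sample with the same shape as the OWID frame: one row per country,
# with a `data` column holding a list of daily record dicts.
df = pd.DataFrame(
    {
        "location": ["Afghanistan", "Albania"],
        "data": [
            [{"date": "2020-02-24", "new_cases": 5.0},
             {"date": "2020-02-25", "new_cases": 0.0}],
            [{"date": "2020-03-09", "new_cases": 2.0}],
        ],
    },
    index=["AFG", "ALB"],
)

# Build one frame per country, keyed by country code, then concatenate once.
frames = {code: pd.DataFrame(records) for code, records in df["data"].items()}
new_df = (
    pd.concat(frames, names=["country_code", "row"])
    .reset_index(level="country_code")  # move the key into a regular column
    .reset_index(drop=True)             # discard the per-country row counter
)
```

The country_code column can then serve as the key for joining back to the static per-country columns in df.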

Edit:

I found a more efficient way, which normalizes all of the JSON at once:

country_codes = []
datas = []
for index, data in zip(df.index, df['data']):
    datas.extend(data)
    country_codes.extend(len(data)*[index])

new_df = pd.DataFrame(datas)
new_df['country_code'] = country_codes

This improved the runtime from 9.38 s ± 856 ms per loop to 1.37 s ± 12 ms per loop.
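
Another option, not part of the original answer, is to let pd.json_normalize expand the nested list itself through its record_path and meta parameters, which also carries the static values onto each daily row. A minimal sketch, assuming the raw JSON has been loaded into a dict (for example with json.load); the raw dict below is a made-up excerpt, and the choice of meta fields is only an illustration:

```python
import pandas as pd

# Made-up excerpt in the shape of the OWID JSON: country code -> static fields
# plus a `data` list of daily records.
raw = {
    "AFG": {
        "continent": "Asia",
        "location": "Afghanistan",
        "data": [
            {"date": "2020-02-24", "total_cases": 5.0, "new_cases": 5.0},
            {"date": "2020-02-25", "total_cases": 5.0, "new_cases": 0.0},
        ],
    },
    "ALB": {
        "continent": "Europe",
        "location": "Albania",
        "data": [{"date": "2020-03-09", "total_cases": 2.0, "new_cases": 2.0}],
    },
}

# Move the country code from the dict key into each record so it can be
# listed as a meta field.
countries = [{"country_code": code, **fields} for code, fields in raw.items()]

# record_path names the nested list to expand into rows; meta names the
# per-country fields to repeat on every expanded row.
flat = pd.json_normalize(
    countries,
    record_path="data",
    meta=["country_code", "continent", "location"],
)
```

On the real file some countries may lack some static fields; in that case passing errors="ignore" to pd.json_normalize fills the missing meta values with NaN instead of raising. This sketch has not been timed against the approaches above.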

Regarding "python - Normalizing a JSON array of records into a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69791253/
