
python - Building a DataFrame from a nested structure with pandas in Python

Reposted. Author: 行者123. Updated: 2023-11-30 22:11:45

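I want to apply machine learning to a dataset that is somewhat too complex for me. I would like to work with pandas and then use some of the built-in models in scikit-learn.

The data comes as a JSON file; a sample is shown below: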

{
    "demo_Profile": {
        "sex": "male",
        "age": 98,
        "height": 160,
        "weight": 139,
        "bmi": 5,
        "someinfo1": [
            "some_more_info1"
        ],
        "someinfo2": [
            "some_more_inf2"
        ],
        "someinfo3": [
            "some_more_info3"
        ]
    },
    "event": {
        "info_personal": {
            "info1": 219.59,
            "info2": 129.18,
            "info3": 41.15,
            "info4": 94.19
        },
        "symptoms": [
            {
                "name": "name1",
                "socrates": {
                    "associations": [
                        "associations1"
                    ],
                    "onsetType": "onsetType1",
                    "timeCourse": "timeCourse1"
                }
            },
            {
                "name": "name2",
                "socrates": {
                    "timeCourse": "timeCourse2"
                }
            },
            {
                "name": "name3",
                "socrates": {
                    "onsetType": "onsetType2"
                }
            },
            {
                "name": "name4",
                "socrates": {
                    "onsetType": "onsetType3"
                }
            },
            {
                "name": "name5",
                "socrates": {
                    "associations": [
                        "associations2"
                    ]
                }
            }
        ],
        "labs": [
            {
                "name": "name1 ",
                "value": "valuelab"
            }
        ]
    }
}

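I want to create a pandas DataFrame that takes this "nested data" into account, but I don't know how to build a DataFrame that handles "nested parameters" in addition to "single parameters".

For example, I don't know how to merge "demo_Profile", which contains "single parameters", with the symptoms, which are a list of dictionaries whose values are single values in some cases and lists in others.

Does anyone know a way to solve this?

Edit:

The JSON shown above is only an example. In other cases the number of values in the lists and the number of symptoms will differ, so the example above does not hold for every case.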

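Best answer

Consider pandas' json_normalize. However, because the nesting goes deeper than one level, consider normalizing each section of the data separately, concatenating the pieces, and then forward-filling the "normalized" columns so that their values repeat on every row.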
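
As a quick illustration of what json_normalize does by itself, here is a minimal sketch on two symptom records shaped like the ones in the question. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`; the `records` and `flat` names are made up for this example.

```python
import pandas as pd

# Two symptom records shaped like those in the question (illustrative data).
records = [
    {"name": "name1",
     "socrates": {"onsetType": "onsetType1", "timeCourse": "timeCourse1"}},
    {"name": "name2",
     "socrates": {"timeCourse": "timeCourse2"}},
]

# Nested dict keys become dot-separated column names; a key that is
# missing from a record shows up as NaN in that record's row.
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat)
```

Each record becomes one row, which is why the list of symptoms maps naturally onto DataFrame rows while the single-valued sections yield just one row each.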

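import json

import pandas as pd

# Load the nested JSON document
with open('myfile.json', 'r') as f:
    data = json.load(f)

# Normalize each section separately, then place the pieces side by side.
# pd.json_normalize (pandas 1.0 and later) replaces the deprecated
# pandas.io.json.json_normalize import.
final_df = pd.concat([pd.json_normalize(data['demo_Profile']),
                      pd.json_normalize(data['event']['symptoms']),
                      pd.json_normalize(data['event']['info_personal']),
                      pd.json_normalize(data['event']['labs'])], axis=1)

# FLATTEN NESTED LISTS (keeps only the first element of each list)
n_list = ['someinfo1', 'someinfo2', 'someinfo3', 'socrates.associations']

final_df[n_list] = final_df[n_list].apply(lambda col:
    col.apply(lambda x: x[0] if isinstance(x, list) and x else x))

# FILLING FORWARD
# The single-row sections only populate row 0 after the concat, so repeat
# their values on every symptom row.
norm_list = ['age', 'bmi', 'height', 'weight', 'sex', 'someinfo1', 'someinfo2', 'someinfo3',
             'info1', 'info2', 'info3', 'info4', 'name', 'value']

# Filling column by column also copes with the duplicate 'name' label
# (symptoms and labs each contribute one).
final_df = final_df.apply(lambda col: col.ffill() if col.name in norm_list else col)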

Output

print(final_df)

#     age  bmi  height   sex        someinfo1       someinfo2        someinfo3  weight   name  socrates.associations  socrates.onsetType  socrates.timeCourse   info1   info2  info3  info4   name     value
# 0  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name1          associations1          onsetType1          timeCourse1  219.59  129.18  41.15  94.19  name1  valuelab
# 1  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name2                    NaN                 NaN          timeCourse2  219.59  129.18  41.15  94.19  name1  valuelab
# 2  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name3                    NaN          onsetType2                  NaN  219.59  129.18  41.15  94.19  name1  valuelab
# 3  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name4                    NaN          onsetType3                  NaN  219.59  129.18  41.15  94.19  name1  valuelab
# 4  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name5          associations2                 NaN                  NaN  219.59  129.18  41.15  94.19  name1  valuelab
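
Since the question's edit says that the number of symptoms and the length of the lists vary between files, here is a hedged alternative sketch, not part of the original answer. It gives every list element in `socrates.associations` its own row with `DataFrame.explode` and repeats the single-row sections with a cross join instead of forward filling. The `flatten_record` helper, the `lab_` prefix, and the sample `record` are assumptions made for illustration; `how="cross"` needs pandas 1.2 or later, and the sketch assumes every section is present and non-empty.

```python
import pandas as pd


def flatten_record(record):
    """Return one row per (symptom, association) pair for one JSON record."""
    event = record["event"]

    # Single-row frames for the scalar sections.
    profile = pd.json_normalize(record["demo_Profile"])
    info = pd.json_normalize(event["info_personal"])
    # Prefix lab columns so they do not clash with the symptom 'name' column.
    labs = pd.json_normalize(event["labs"]).add_prefix("lab_")

    symptoms = pd.json_normalize(event["symptoms"])
    # Give each list element its own row; NaN entries stay as a single row.
    if "socrates.associations" in symptoms.columns:
        symptoms = symptoms.explode("socrates.associations").reset_index(drop=True)

    # A cross join repeats the one-row sections on every symptom row.
    # Several lab rows would multiply the symptom rows accordingly.
    out = symptoms
    for part in (profile, info, labs):
        out = out.merge(part, how="cross")
    return out


# Small made-up record with the same shape as the question's sample.
record = {
    "demo_Profile": {"sex": "male", "age": 98, "someinfo1": ["some_more_info1"]},
    "event": {
        "info_personal": {"info1": 219.59},
        "symptoms": [
            {"name": "name1",
             "socrates": {"associations": ["a1", "a2"], "onsetType": "onsetType1"}},
            {"name": "name2", "socrates": {"timeCourse": "timeCourse2"}},
        ],
        "labs": [{"name": "name1", "value": "valuelab"}],
    },
}

df = flatten_record(record)
print(df)
```

For several files, the per-record frames could be stacked with `pd.concat([flatten_record(r) for r in records], ignore_index=True)`. List-valued profile fields such as `someinfo1` are left as lists in this sketch.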

For "python - Building a DataFrame from a nested structure with pandas in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51327847/
