gpt4 book ai didi

Python Pandas - 处理具有嵌套字典(json)值的列

转载 作者:行者123 更新时间:2023-12-01 07:20:06 24 4
gpt4 key购买 nike

我有一列“data”,其中包含 json 对象作为值。我想把它们分开。

source = {'_id':['SE-DATA-BB3A','SE-DATA-BB3E','SE-DATA-BB3F'],  'pi':['BB3A_CAP_BMLS','BB3E_CAP_BMLS','BB3F_CAP_PAS'], 'Datetime':['190725-122500', '190725-122500', '190725-122500'], 'data': [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
] }

input_df = pd.DataFrame(source)

My input_df is as below:

enter image description here

I'm expecting the output_df as below:

enter image description here

我可以设法获取列volume_id volume_name volume_statename id state nodes 使用以下方法。

input_df['data'] = input_df['data'].apply(pd.Series) 

结果如下 enter image description here

Test_df=pd.concat([json_normalize(input_df['bb3a_bmls'][key], 'volumes', ['name','id','state','nodes'], record_prefix='volume_') for key in input_df.index if isinstance(input_df['bb3a_bmls'][key],list)]).reset_index(drop=True)

这将导致一个“服务器” - bb3a_bmls

enter image description here

现在,我不知道如何取回父列“_id”、“pi”、“Datetime”。

最佳答案

想法是按每个嵌套列表或按字典循环,并创建字典列表以传递给 DataFrame 构造函数:

out = []
zipped = zip(source['_id'], source['pi'], source['Datetime'], source['data'])

for a,b,c,d in zipped:
for k1, v1 in d.items():
for e in v1:
#get all values of dict with exlude volumes
di = {k2:v2 for k2, v2 in e.items() if k2 != 'volumes'}
#for each dict in volumes add volume_ to keys
for f in e['volumes']:
di1 = {f'volume_{k3}':v3 for k3, v3 in f.items()}
#create dict from previous values
di2 = {'_id':a, 'pi':b,'Datetime':c, 'SERVER':k1}
#add to list merged dictionaries
out.append({**di2, ** di1, **di})

df = pd.DataFrame(out)
print (df)

_id pi Datetime SERVER volume_state \
0 SE-DATA-BB3A BB3A_CAP_BMLS 190725-122500 bb3a_bmls available
1 SE-DATA-BB3A BB3A_CAP_BMLS 190725-122500 bb3a_bmls available
2 SE-DATA-BB3E BB3E_CAP_BMLS 190725-122500 bb3b_bmls unavailable
3 SE-DATA-BB3E BB3E_CAP_BMLS 190725-122500 bb3b_bmls unavailable
4 SE-DATA-BB3F BB3F_CAP_PAS 190725-122500 bb3c_bmls unavailable
5 SE-DATA-BB3F BB3F_CAP_PAS 190725-122500 bb3c_bmls unavailable

volume_id volume_name name id state nodes
0 330172 q_-4144d4e WAG 01 105F available 3
1 275192 p_3089d821ae WAG 01 105F available 3
2 830172 w_-4144d4e FEC 01 382E available 4
3 223192 g_3089d821ae FEC 01 382E available 4
4 930172 e_-4144d4e ASD 01 303F available 6
5 245192 h_3089d821ae ASD 01 303F available 6

关于Python Pandas - 处理具有嵌套字典(json)值的列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57740776/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com