
python - Convert a 4-level nested dictionary to a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:22:38

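I have a 4-level nested dictionary and want to convert it to a pandas DataFrame (or panel-data form) so I can export it to CSV. I want every cell to hold a value from the nested dictionary.

My nested dictionary looks like the one below, although the real data has many more keys and values.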

{2008: {'Barack Obama': {1: {'Author': 'Barack Obama',
                             'City': [],
                             'Title': 'Keynote Address at the 2004 Democratic National Convention',
                             'Type': 'address',
                             'Year': 2008},
                         2: {'Author': 'Barack Obama',
                             'City': ['Springfield'],
                             'Title': 'Remarks Announcing Candidacy for President in Springfield, Illinois',
                             'Type': 'remarks',
                             'Year': 2008},
                         3: {'Author': 'Barack Obama',
                             'City': ['Chicago'],
                             'Title': 'Remarks at the AIPAC Policy Forum in Chicago',
                             'Type': 'remarks',
                             'Year': 2008}},
        'Bill Richardson': {1: {'Author': 'Bill Richardson',
                                'City': [],
                                'Title': 'Iraq Speech to New Hampshire Democratic State Party State Central Committee',
                                'Type': 'speech',
                                'Year': 2008},
                            2: {'Author': 'Bill Richardson',
                                'City': [],
                                'Title': 'Address to the DNC Winter Meeting',
                                'Type': 'address',
                                'Year': 2008},
                            3: {'Author': 'Bill Richardson',
                                'City': [],
                                'Title': 'Speech: The New Realism and the Rebirth of American Leadership',
                                'Type': 'speech',
                                'Year': 2008}}},
 2012: {'Barack Obama': {1: {'Author': 'Barack Obama',
                             'City': ['Parma'],
                             'Title': '535 - Remarks at a Campaign Rally in Parma, Ohio',
                             'Type': 'remarks',
                             'Year': '2012'},
                         2: {'Author': 'Barack Obama',
                             'City': ['Sandusky'],
                             'Title': '534 - Remarks at a Campaign Rally in Sandusky, Ohio',
                             'Type': 'remarks',
                             'Year': '2012'},
                         3: {'Author': 'Barack Obama',
                             'City': [],
                             'Title': '533 - Remarks at a Campaign Rally in Maumee, Ohio',
                             'Type': 'remarks',
                             'Year': '2012'}}}}

I want to convert it to a DataFrame like this:

Year  Author1          No.  Author           City             Title  Type     Year
2008  Barack Obama     1    Barack Obama     []               ...    address  2008
2008  Barack Obama     2    Barack Obama     ['Springfield']  ...    remarks  2008
2008  Barack Obama     3    Barack Obama     ['Chicago']      ...    remarks  2008
...
2008  Bill Richardson  1    Bill Richardson  []               ...    speech   2008
2008  Bill Richardson  2    Bill Richardson  []               ...    address  2008
2008  Bill Richardson  3    Bill Richardson  []               ...    speech   2008
...
2012  Barack Obama     1    Barack Obama     ['Parma']        ...    remarks  2012
2012  Barack Obama     2    Barack Obama     ['Sandusky']     ...    remarks  2012
2012  Barack Obama     3    Barack Obama     []               ...    remarks  2012
...

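I have read some answers that build the DataFrame with for loops, but they produce a merged index in the first column, whereas I want every cell to contain the information from the dictionary. Any suggestions? Thanks!

I tried the code below, but it does not give what I want: it shows merged index cells in the first column, and it does not handle a 4-level nested dictionary. I added one more for loop, but the result was keyed by 3-tuples, which is not what I want either.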

import pandas as pd

pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')
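A possible fix for the 3-tuple problem, sketched here as an alternative and not taken from the answer below: when the comprehension is extended to all three key levels, pandas turns the tuple keys into a 3-level MultiIndex, which is what prints as "merged" cells. Naming the levels with rename_axis and then calling reset_index moves them into ordinary columns, so each row repeats its year and author. The sample data is a trimmed copy of the question's dictionary, and the column names Year1, Author1 and No. are assumptions chosen to match the desired output.

```python
import pandas as pd

# Trimmed copy of the question's nested dictionary (two records only).
user_dict = {
    2008: {'Barack Obama': {1: {'Author': 'Barack Obama', 'City': [],
                                'Title': 'Keynote Address at the 2004 Democratic National Convention',
                                'Type': 'address', 'Year': 2008}}},
    2012: {'Barack Obama': {1: {'Author': 'Barack Obama', 'City': ['Parma'],
                                'Title': '535 - Remarks at a Campaign Rally in Parma, Ohio',
                                'Type': 'remarks', 'Year': '2012'}}},
}

# Key each innermost record by a 3-tuple (year, author, number);
# with orient='index' the tuple keys become a 3-level MultiIndex.
df = pd.DataFrame.from_dict(
    {(year, author, no): record
     for year, authors in user_dict.items()
     for author, records in authors.items()
     for no, record in records.items()},
    orient='index',
)

# Name the index levels, then turn them into regular columns so no
# index cells are merged when the frame is displayed or exported.
df = df.rename_axis(['Year1', 'Author1', 'No.']).reset_index()
print(df)
```

The result has one row per innermost record, with the three outer keys as the first three columns.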

Best answer

Building the dictionary with meaningful variable names first makes it easier to see what is going on:

import pandas as pd

# data is the 4-level nested dictionary from the question
temp = {}
for year1, values1 in data.items():              # level 1: year
    for author1, values2 in values1.items():     # level 2: author
        for number, values3 in values2.items():  # level 3: record number
            temp.setdefault('Year1', []).append(year1)
            temp.setdefault('Author1', []).append(author1)
            temp.setdefault('No.', []).append(number)
            for key, value in values3.items():   # level 4: record fields
                temp.setdefault(key, []).append(value)
print(pd.DataFrame(temp))

Output:

            Author          Author1           City  No.  \
0     Barack Obama     Barack Obama             []    1
1     Barack Obama     Barack Obama  [Springfield]    2
2     Barack Obama     Barack Obama      [Chicago]    3
3  Bill Richardson  Bill Richardson             []    1
4  Bill Richardson  Bill Richardson             []    2
5  Bill Richardson  Bill Richardson             []    3
6     Barack Obama     Barack Obama        [Parma]    1
7     Barack Obama     Barack Obama     [Sandusky]    2
8     Barack Obama     Barack Obama             []    3

                                               Title     Type  Year  Year1
0  Keynote Address at the 2004 Democratic Nationa...  address  2008   2008
1  Remarks Announcing Candidacy for President in ...  remarks  2008   2008
2       Remarks at the AIPAC Policy Forum in Chicago  remarks  2008   2008
3  Iraq Speech to New Hampshire Democratic State ...   speech  2008   2008
4                  Address to the DNC Winter Meeting  address  2008   2008
5  Speech: The New Realism and the Rebirth of Ame...   speech  2008   2008
6   535 - Remarks at a Campaign Rally in Parma, Ohio  remarks  2012   2012
7  534 - Remarks at a Campaign Rally in Sandusky,...  remarks  2012   2012
8  533 - Remarks at a Campaign Rally in Maumee, Ohio  remarks  2012   2012
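The loop above accumulates one list per column with setdefault. An equivalent pattern, offered as a sketch and not as the answer's code, builds one flat dict per innermost record and lets pandas assemble the columns. The helper name flatten_records is hypothetical, and the sample is two records copied from the question.

```python
import pandas as pd


def flatten_records(data):
    """Hypothetical helper: turn {year: {author: {no: record}}} into a list
    of flat row dicts, one per innermost record."""
    rows = []
    for year1, authors in data.items():
        for author1, records in authors.items():
            for number, record in records.items():
                # Outer keys first, then the record's own fields.
                rows.append({'Year1': year1, 'Author1': author1,
                             'No.': number, **record})
    return rows


# Two of Bill Richardson's records from the question.
data = {
    2008: {'Bill Richardson': {
        2: {'Author': 'Bill Richardson', 'City': [],
            'Title': 'Address to the DNC Winter Meeting',
            'Type': 'address', 'Year': 2008},
        3: {'Author': 'Bill Richardson', 'City': [],
            'Title': 'Speech: The New Realism and the Rebirth of American Leadership',
            'Type': 'speech', 'Year': 2008}}},
}

df = pd.DataFrame(flatten_records(data))
print(df)
```

One practical difference: if a record is missing a field, this version fills the gap with NaN, while the per-column lists in the setdefault version end up with different lengths and pd.DataFrame raises a ValueError.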

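Then create the DataFrame with the columns in the order you want. Older pandas versions sort the dict keys alphabetically when building the DataFrame, which is why the output above is in that order; passing the columns argument fixes the order regardless of version: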

df = pd.DataFrame(temp, columns=['Year1', 'Author1', 'No.', 'Author',
                                 'City', 'Title', 'Type', 'Year'])
df  # in Jupyter/IPython a bare df displays the table; use print(df) in a script
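Because the question's goal is a CSV file, here is a hedged sketch of the last step. It assumes a frame shaped like the answer's df (two rows from the question are used here). List cells such as City would otherwise be written as their Python repr, for example "['Springfield']", so they are joined into plain strings first; that join is an assumption about what the CSV should contain.

```python
import pandas as pd

# Small frame with the same columns as the answer's df (two rows from the question).
df = pd.DataFrame({
    'Year1': [2008, 2008],
    'Author1': ['Barack Obama', 'Barack Obama'],
    'No.': [1, 2],
    'Author': ['Barack Obama', 'Barack Obama'],
    'City': [[], ['Springfield']],
    'Title': ['Keynote Address at the 2004 Democratic National Convention',
              'Remarks Announcing Candidacy for President in Springfield, Illinois'],
    'Type': ['address', 'remarks'],
    'Year': [2008, 2008],
})

# Join each City list into a plain string; an empty list becomes "".
out = df.assign(City=df['City'].apply(', '.join))

# index=False drops the 0..n-1 row labels. With no path, to_csv returns the
# CSV text; pass a path such as 'speeches.csv' (hypothetical) to write a file.
csv_text = out.to_csv(index=False)
print(csv_text)
```

Titles that contain commas, such as the Springfield one, are quoted automatically by to_csv.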


A similar question about converting a 4-level nested dictionary to a Pandas DataFrame can be found on Stack Overflow: https://stackoverflow.com/questions/48594148/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号