
python - Outer merge of DataFrames inside dictionary keys

Reposted. Author: 行者123. Updated: 2023-12-01 08:01:56

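I'm new to Python and have been searching online for a solution to this problem without finding one. I have a dictionary of pandas DataFrames where the keys are the 'Year' and the values are the pandas DataFrames for that year. Here is some sample data: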

import pandas as pd
import numpy as np
from collections import defaultdict

##Creating Dataframes
data1_2018 =[[1,2018,80], [2,2018,70]]
data2_2018 = [[1,2018,77], [3,2018,62]]
data3_2018 = [[1,2018,82], [2,2018,88], [4,2018,66]]

data1_2017 = [[1,2017,80], [5,2017,70]]
data2_2017 = [[1,2017,77], [3,2017,62]]
data3_2017 = [[1,2017,50], [2,2017,52], [4,2017,51]]

df1_2018 = pd.DataFrame(data1_2018, columns = ['ID', 'Year', 'Score_1'])
df2_2018 = pd.DataFrame(data2_2018, columns = ['ID', 'Year', 'Score_2'])
df3_2018 = pd.DataFrame(data3_2018, columns = ['ID', 'Year', 'Score_3'])


df1_2017 = pd.DataFrame(data1_2017, columns = ['ID', 'Year', 'Score_1'])
df2_2017 = pd.DataFrame(data2_2017, columns = ['ID', 'Year', 'Score_2'])
df3_2017 = pd.DataFrame(data3_2017, columns = ['ID', 'Year', 'Score_3'])

###Creating list of all dataframes
all_df_list = [df1_2018,df2_2018,df3_2018,df1_2017,df2_2017,df3_2017]

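I chose to start with a list containing all the DataFrames because that is how the data is imported in my real problem. Once I have the list of DataFrames, I create a dictionary of them.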

yearly_dfs = defaultdict(list)
# Build a dict whose keys are years and whose values are lists of that year's DataFrames
for df in all_df_list:
    for yr, yr_df in df.groupby('Year'):
        yearly_dfs[yr].append(yr_df)

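Now, my question is: can you loop over each group's DataFrames and combine them with an outer merge on 'ID'? The desired output is a list or dictionary containing a single DataFrame per year. Here is the expected result for each year: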

desired_output_2018 = df1_2018.merge(df2_2018, how = 'outer', on = ['ID', 'Year']).merge(df3_2018, how = 'outer', on = ['ID', 'Year']) 
desired_output_2017 = df1_2017.merge(df2_2017, how = 'outer', on = ['ID', 'Year']).merge(df3_2017, how = 'outer', on = ['ID', 'Year'])

print(desired_output_2018)
   ID  Year  Score_1  Score_2  Score_3
0   1  2018     80.0     77.0     82.0
1   2  2018     70.0      NaN     88.0
2   3  2018      NaN     62.0      NaN
3   4  2018      NaN      NaN     66.0

print(desired_output_2017)
   ID  Year  Score_1  Score_2  Score_3
0   1  2017     80.0     77.0     50.0
1   5  2017     70.0      NaN      NaN
2   3  2017      NaN     62.0      NaN
3   2  2017      NaN      NaN     52.0
4   4  2017      NaN      NaN     51.0

Any help would be greatly appreciated!

Thanks!

Best answer

Use pandas.concat, then DataFrame.groupby on 'ID' and 'Year' with the aggregation function first, and finally use groupby on 'Year' inside a dict comprehension:

df_all = (pd.concat(all_df_list, sort=False)
            .groupby(['ID', 'Year']).first().reset_index())

df_years = {yr: df for yr, df in df_all.groupby('Year')}
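It may help to see why the concat/groupby step above behaves like an outer merge. The following is a minimal sketch on two small frames (the names left, right, stacked and collapsed are made up for illustration): concat fills the columns a frame lacks with NaN, and first() picks the first non-null value per column within each group. This assumes each (ID, Year) pair appears at most once per input frame; with duplicate keys the result would differ from a true merge.

```python
import pandas as pd

# Two small frames that share the key (ID=1, Year=2018) but hold different score columns.
left = pd.DataFrame({'ID': [1, 2], 'Year': [2018, 2018], 'Score_1': [80, 70]})
right = pd.DataFrame({'ID': [1, 3], 'Year': [2018, 2018], 'Score_2': [77, 62]})

# concat stacks the rows; a column missing from one frame is filled with NaN there.
stacked = pd.concat([left, right], sort=False)

# first() returns the first non-null value per column within each (ID, Year) group,
# so the two partial rows for ID=1 collapse into a single complete row.
collapsed = stacked.groupby(['ID', 'Year']).first().reset_index()
print(collapsed)
```

IDs that occur in only one frame keep NaN in the other frame's column, which is what an outer merge would produce as well.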

Access the result like this:

df_years[2017]

   ID  Year  Score_1  Score_2  Score_3
0   1  2017     80.0     77.0     50.0
2   2  2017      NaN      NaN     52.0
4   3  2017      NaN     62.0      NaN
6   4  2017      NaN      NaN     51.0
8   5  2017     70.0      NaN      NaN

df_years[2018]

   ID  Year  Score_1  Score_2  Score_3
1   1  2018     80.0     77.0     82.0
3   2  2018     70.0      NaN     88.0
5   3  2018      NaN     62.0      NaN
7   4  2018      NaN      NaN     66.0
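If you would rather keep the yearly_dfs dictionary from the question and literally loop over each year's list, one alternative (not the approach of the answer above) is to fold each list with functools.reduce and an outer merge. This is only a sketch: the helper outer_merge and the name merged_by_year are made up for illustration, and the input is a shortened stand-in for all_df_list.

```python
from collections import defaultdict
from functools import reduce

import pandas as pd

# Shortened stand-in for all_df_list from the question (same column layout).
all_df_list = [
    pd.DataFrame([[1, 2018, 80], [2, 2018, 70]], columns=['ID', 'Year', 'Score_1']),
    pd.DataFrame([[1, 2018, 77], [3, 2018, 62]], columns=['ID', 'Year', 'Score_2']),
    pd.DataFrame([[1, 2017, 80], [5, 2017, 70]], columns=['ID', 'Year', 'Score_1']),
    pd.DataFrame([[1, 2017, 77], [3, 2017, 62]], columns=['ID', 'Year', 'Score_2']),
]

# Same grouping loop as in the question: year -> list of DataFrames for that year.
yearly_dfs = defaultdict(list)
for df in all_df_list:
    for yr, yr_df in df.groupby('Year'):
        yearly_dfs[yr].append(yr_df)


def outer_merge(left, right):
    """Hypothetical helper: outer-merge two frames on the shared key columns."""
    return left.merge(right, how='outer', on=['ID', 'Year'])


# Fold each year's list pairwise into a single DataFrame.
merged_by_year = {yr: reduce(outer_merge, dfs) for yr, dfs in yearly_dfs.items()}
print(merged_by_year[2018])
```

The row order of an outer merge can vary between pandas versions, so sort by 'ID' if you need a stable order.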

Regarding "python - Outer merge of DataFrames inside dictionary keys", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55695912/
