
python - Add unique groups to a DataFrame for each row, including sums from other columns


I have a DataFrame that looks like this:

ID  field_1     area_1  field_2     area_2  field_3     area_3  field_4     area_4
1   scoccer     500     basketball  200     swimming    100     basketball  50
2   volleyball  100     np.nan      np.nan  np.nan      np.nan  np.nan      np.nan
3   basketball  1000    football    10      np.nan      np.nan  np.nan      np.nan
4   swimming    280     swimming    200     basketball  320     np.nan      np.nan
5   volleyball  110     football    160     volleyball  30      np.nan      np.nan

The real DataFrame has the same structure but contains the columns field_1 through field_30 and area_1 through area_30.

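I want to add columns that hold row-wise groups: for each row, one group per distinct value found in the field_x columns, together with the sum of the corresponding areas. The added columns should look like this: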

ID  group_1     area_1  group_2     area_2  group_3   area_3
1   scoccer     500     basketball  250     swimming  100
2   volleyball  100
3   basketball  1000    football    10
4   swimming    480     basketball  320
5   volleyball  140     football    160

Is there a simple way to achieve this?

Best answer

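Use pd.wide_to_long to reshape the DataFrame into long format, which lets you group by ID and field and sum the areas. Then create column labels with cumcount and use pivot_table to return to wide format.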

import pandas as pd

# Reshape to long format, then sum the areas per (ID, field) pair.
df = (pd.wide_to_long(df, i='ID', j='num', stubnames=['field', 'area'], sep='_')
        .groupby(['ID', 'field'])['area'].sum()
        .reset_index())
# ID field area
#0 1 basketball 250.0
#1 1 scoccer 500.0
#2 1 swimming 100.0
#3 2 volleyball 100.0
#4 3 basketball 1000.0
#5 3 football 10.0
#6 4 basketball 320.0
#7 4 swimming 480.0
#8 5 football 160.0
#9 5 volleyball 140.0

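# Number the groups within each ID, then pivot back to wide format.
df['idx'] = df.groupby('ID').cumcount() + 1
df = (pd.pivot_table(df, index='ID', columns='idx', values=['field', 'area'],
                     aggfunc='first')
        .sort_index(axis=1, level=1))
# Flatten the MultiIndex columns, e.g. ('area', 1) -> 'area_1'.
df.columns = ['_'.join(map(str, tup)) for tup in df.columns]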

    area_1     field_1  area_2     field_2  area_3   field_3
ID
1    250.0  basketball   500.0     scoccer   100.0  swimming
2    100.0  volleyball     NaN         NaN     NaN       NaN
3   1000.0  basketball    10.0    football     NaN       NaN
4    320.0  basketball   480.0    swimming     NaN       NaN
5    160.0    football   140.0  volleyball     NaN       NaN
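
The result above lists each row's groups alphabetically and keeps the field_/area_ prefixes, while the question asks for group_x columns in order of first appearance. The following is a minimal sketch of one way to get that layout, not part of the original answer. The helper name add_groups and its id_col parameter are hypothetical, and it assumes the columns are named field_N and area_N as in the question.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The example frame from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "field_1": ["scoccer", "volleyball", "basketball", "swimming", "volleyball"],
    "area_1": [500, 100, 1000, 280, 110],
    "field_2": ["basketball", nan, "football", "swimming", "football"],
    "area_2": [200, nan, 10, 200, 160],
    "field_3": ["swimming", nan, nan, "basketball", "volleyball"],
    "area_3": [100, nan, nan, 320, 30],
    "field_4": ["basketball", nan, nan, nan, nan],
    "area_4": [50, nan, nan, nan, nan],
})


def add_groups(wide, id_col="ID"):
    """Hypothetical helper: per-row groups in order of first appearance."""
    tidy = (
        pd.wide_to_long(wide, stubnames=["field", "area"], i=id_col, j="num", sep="_")
        .reset_index()
        .sort_values([id_col, "num"])  # make the field_1, field_2, ... order explicit
    )
    # sort=False keeps groups in the order they first occur; NaN fields are dropped.
    summed = tidy.groupby([id_col, "field"], sort=False, as_index=False)["area"].sum()
    summed["idx"] = summed.groupby(id_col).cumcount() + 1
    # Each (ID, idx) pair is unique, so unstack can reshape without aggregating.
    out = summed.set_index([id_col, "idx"])[["field", "area"]].unstack("idx")
    n_groups = int(summed["idx"].max())
    # Interleave the columns: field 1, area 1, field 2, area 2, ...
    order = [(name, k) for k in range(1, n_groups + 1) for name in ("field", "area")]
    out = out.reindex(columns=pd.MultiIndex.from_tuples(order))
    out.columns = [f"{'group' if name == 'field' else 'area'}_{k}" for name, k in order]
    return out


result = add_groups(df)
print(result)
```

If the grouped columns should sit next to the original ones, something like df.join(result.add_prefix("sum_"), on="ID") would avoid the clash between the new area_x names and the existing area_x columns; the sum_ prefix is only an illustration.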

Just for fun, you could use the obscure pd.lreshape (undocumented when this answer was written) instead of wide_to_long:

# Here df is the original wide DataFrame.
# Change range to (1, 31) for your real data.
pd.lreshape(df, {'area': [f'area_{i}' for i in range(1, 5)],
                 'field': [f'field_{i}' for i in range(1, 5)]})

# ID area field
#0 1 500.0 scoccer
#1 2 100.0 volleyball
#2 3 1000.0 basketball
#3 4 280.0 swimming
#4 5 110.0 volleyball
#5 1 200.0 basketball
#....
#10 4 320.0 basketball
#11 5 30.0 volleyball
#12 1 50.0 basketball
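
The lreshape output is already in long format, so the same groupby step should apply to it directly. Below is a small sketch under that assumption, using only the first two rows of the question's data; the variable names wide, tidy and summed are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First two rows of the question's data.
wide = pd.DataFrame({
    "ID": [1, 2],
    "field_1": ["scoccer", "volleyball"], "area_1": [500, 100],
    "field_2": ["basketball", nan], "area_2": [200, nan],
    "field_3": ["swimming", nan], "area_3": [100, nan],
    "field_4": ["basketball", nan], "area_4": [50, nan],
})

nums = range(1, 5)
# lreshape stacks the listed columns; rows with a missing value are dropped by default.
tidy = pd.lreshape(wide, {"area": [f"area_{i}" for i in nums],
                          "field": [f"field_{i}" for i in nums]})
# Sum the areas per (ID, field) pair, as in the wide_to_long version.
summed = tidy.groupby(["ID", "field"])["area"].sum().reset_index()
print(summed)
```

From here, the cumcount and pivot steps of the main answer can be applied to summed unchanged.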

Regarding "python - Add unique groups to a DataFrame for each row, including sums from other columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62717796/
