
python - How to create a column with the difference between the last and first value within a group in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:02:36

I have data like the following:

{'grp': {0: 828893, 1: 828893, 2: 828893, 3: 828893, 4: 828893, 5: 828893, 6: 828893, 7: 828893, 8: 828893, 9: 828893, 10: 828893, 11: 828893, 12: 828893, 13: 828893, 14: 828893, 15: 828893, 16: 828893, 17: 828893, 18: 828893, 19: 828893, 20: 828893, 21: 828893, 22: 828893, 23: 828893, 24: 828893}, 'grp2': {0: nan, 1: nan, 2: nan, 3: nan, 4: '1', 5: '1', 6: '1', 7: '1', 8: '1', 9: '1', 10: nan, 11: nan, 12: '2', 13: '2', 14: '2', 15: '2', 16: nan, 17: nan, 18: nan, 19: '3', 20: nan, 21: '4', 22: '4', 23: '4', 24: '4'}, 'val1': {0: -50.0, 1: -50.0, 2: -50.0, 3: -50.0, 4: 7.600000000000001, 5: 54.599999999999994, 6: 38.599999999999994, 7: 50.599999999999994, 8: 91.0, 9: 100.80000000000001, 10: 19.200000000000003, 11: -50.0, 12: -50.0, 13: 69.6, 14: 42.0, 15: 90.19999999999999, 16: -50.0, 17: -50.0, 18: 47.599999999999994, 19: 98.80000000000001, 20: 27.599999999999994, 21: 11.799999999999997, 22: nan, 23: 13.0, 24: 0.0}, 'val2': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 30.1, 5: 21.5, 6: 20.7, 7: 4.2, 8: 5.0, 9: 21.6, 10: 85.1, 11: 0.0, 12: 0.0, 13: 36.4, 14: 56.6, 15: 51.2, 16: 0.0, 17: 0.0, 18: 58.5, 19: 42.2, 20: 76.1, 21: 68.7, 22: nan, 23: 90.3, 24: 95.3}}

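I want to group by the columns grp and grp2, then create new columns val1_b and val2_b, defined as the difference between the last and the first observation (within each group) of val1 and val2 respectively. The code in R would be something like: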

ex %>% 
group_by(grp, grp2) %>%
mutate(val1_b = last(val1) - first(val1),
val2_b = last(val2) - first(val2)) %>%
ungroup()

But I need to do this in Python. The closest I could get is:

pd.DataFrame(ex).groupby(['grp', 'grp2'])['val1'].apply(lambda x: x.iat[-1] - x.iat[0])

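But this only works for one column, and the result is aggregated rather than added to the existing DataFrame. So how can I compute the difference between the last and the first observation within a group for several columns, and add it to the existing DataFrame as new columns?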

Best answer

Use GroupBy.transform with GroupBy.first and GroupBy.last. One possible solution uses DataFrame.add_prefix and DataFrame.join to add the new columns:

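import pandas as pd

df = pd.DataFrame(ex)
# select the columns to process after the groupby; a list is required,
# since ['val1', 'val2'] without the inner brackets fails in pandas >= 2.0
g = df.groupby(['grp', 'grp2'])[['val1', 'val2']]
# broadcast each group's last and first value to every row, then subtract
out = df.join((g.transform('last') - g.transform('first')).add_prefix('new_'))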
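To try the idea without pasting the full dictionary, here is a minimal self-contained sketch on a small made-up frame. The data and the name `toy` are illustrative assumptions and do not come from the question; the grouping logic is the same as in the answer.

```python
import pandas as pd

# Small made-up frame; the values are illustrative only.
toy = pd.DataFrame({
    "grp":  [1, 1, 1, 1, 1],
    "grp2": ["a", "a", "a", "b", "b"],
    "val1": [1.0, 5.0, 4.0, 10.0, 7.0],
    "val2": [2.0, 2.5, 8.0, 3.0, 3.0],
})

# Select the value columns with a list so the result is a DataFrameGroupBy.
g = toy.groupby(["grp", "grp2"])[["val1", "val2"]]

# transform keeps the original index, so the difference lines up row by row.
diff = g.transform("last") - g.transform("first")

# Prefix the new columns and attach them to the original frame.
out = toy.join(diff.add_prefix("new_"))
print(out)
```

Because `transform` returns an object with the same index as the input, the `join` needs no key columns; it aligns on the row index.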

As @Wen-Ben mentioned in the comments, a possible alternative without join (thanks):

df[['new_val1',  'new_val2']] = g.transform('last') - g.transform('first')
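One caveat that may matter when porting from R: the pandas `'first'` and `'last'` group operations return the first and last non-null value in each group, whereas the `first()` and `last()` calls in the R snippet are positional. If strictly positional behaviour is wanted, a lambda with `iloc` is one option. The sketch below uses made-up data (`toy`, `key`, `val` are hypothetical names) to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up data: the last row of the single group is missing.
toy = pd.DataFrame({
    "key": ["a", "a", "a"],
    "val": [1.0, 4.0, np.nan],
})
g = toy.groupby("key")["val"]

# 'last' skips the missing value, so this computes 4.0 - 1.0 for every row.
non_null = g.transform("last") - g.transform("first")

# Positional variant: takes the literal last and first rows of the group,
# so a missing value at either end propagates as NaN.
positional = g.transform(lambda s: s.iloc[-1] - s.iloc[0])

print(non_null.tolist())
print(positional.tolist())
```

The lambda version is usually slower on large frames than the built-in `'first'`/`'last'` names, so it is probably worth using only when the positional semantics are actually required.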

print(out)
       grp  grp2   val1  val2  new_val1  new_val2
0   828893   NaN  -50.0   0.0       NaN       NaN
1   828893   NaN  -50.0   0.0       NaN       NaN
2   828893   NaN  -50.0   0.0       NaN       NaN
3   828893   NaN  -50.0   0.0       NaN       NaN
4   828893     1    7.6  30.1      93.2      -8.5
5   828893     1   54.6  21.5      93.2      -8.5
6   828893     1   38.6  20.7      93.2      -8.5
7   828893     1   50.6   4.2      93.2      -8.5
8   828893     1   91.0   5.0      93.2      -8.5
9   828893     1  100.8  21.6      93.2      -8.5
10  828893   NaN   19.2  85.1       NaN       NaN
11  828893   NaN  -50.0   0.0       NaN       NaN
12  828893     2  -50.0   0.0     140.2      51.2
13  828893     2   69.6  36.4     140.2      51.2
14  828893     2   42.0  56.6     140.2      51.2
15  828893     2   90.2  51.2     140.2      51.2
16  828893   NaN  -50.0   0.0       NaN       NaN
17  828893   NaN  -50.0   0.0       NaN       NaN
18  828893   NaN   47.6  58.5       NaN       NaN
19  828893     3   98.8  42.2       0.0       0.0
20  828893   NaN   27.6  76.1       NaN       NaN
21  828893     4   11.8  68.7     -11.8      26.6
22  828893     4    NaN   NaN     -11.8      26.6
23  828893     4   13.0  90.3     -11.8      26.6
24  828893     4    0.0  95.3     -11.8      26.6
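The rows where grp2 is NaN receive NaN in the new columns because `groupby` excludes missing keys by default. If those rows should instead be treated as a group of their own (which, as far as I know, is how dplyr's `group_by` handles NA), the `dropna=False` argument is one way to do it. This is a sketch on made-up data, assuming a reasonably recent pandas (the argument was added in 1.1); `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with missing group keys in the first and last rows.
toy = pd.DataFrame({
    "grp2": [np.nan, "1", "1", np.nan],
    "val":  [10.0, 1.0, 4.0, 25.0],
})

# Default behaviour: rows with a NaN key belong to no group and get NaN.
g_default = toy.groupby("grp2")["val"]
default = g_default.transform("last") - g_default.transform("first")

# dropna=False: all NaN keys form a single group (here rows 0 and 3).
g_keep = toy.groupby("grp2", dropna=False)["val"]
keep_nan = g_keep.transform("last") - g_keep.transform("first")

print(default.tolist())
print(keep_nan.tolist())
```

Note that with `dropna=False` all NaN-keyed rows fall into one group regardless of where they sit in the frame, so non-adjacent runs of NaN are not treated as separate groups.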

This post on creating a column with the difference between the last and first value within a group in pandas is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/55208852/
