Python Pandas difference between specific rows in a groupby

Reposted. Author: 行者123. Updated: 2023-12-01 13:10:16

I have a Pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'Firm': ['Firm1','Firm1','Firm1','Firm1','Firm1','Firm1','Firm2','Firm2','Firm2','Firm2','Firm2','Firm2'],'Location' : ['Country1', 'Country1', 'Country1', 'Country2', 'Country2', 'Country2','Country1', 'Country1', 'Country1', 'Country2', 'Country2', 'Country2'], 'Currency' : ['Curr1', 'Curr2', 'Curr3', 'Curr1', 'Curr2', 'Curr3','Curr1', 'Curr2', 'Curr3', 'Curr1', 'Curr2', 'Curr3'], 'Value' : [100, 105, 110, 100, 95, 120, 95, 110, 115, 105, 120, 90] })

It looks like this:

df:
     Firm  Location Currency  Value
0   Firm1  Country1    Curr1    100
1   Firm1  Country1    Curr2    105
2   Firm1  Country1    Curr3    110
3   Firm1  Country2    Curr1    100
4   Firm1  Country2    Curr2     95
5   Firm1  Country2    Curr3    120
6   Firm2  Country1    Curr1     95
7   Firm2  Country1    Curr2    110
8   Firm2  Country1    Curr3    115
9   Firm2  Country2    Curr1    105
10  Firm2  Country2    Curr2    120
11  Firm2  Country2    Curr3     90

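Now, for each Firm-Location group, I want to compute the difference between the Curr3 and Curr2 values (Value column) and replace the Curr3 value with the result. The resulting df should look like this: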
     Firm  Location Currency  Value
0   Firm1  Country1    Curr1    100
1   Firm1  Country1    Curr2    105
2   Firm1  Country1    Curr3      5
3   Firm1  Country2    Curr1    100
4   Firm1  Country2    Curr2     95
5   Firm1  Country2    Curr3     25
6   Firm2  Country1    Curr1     95
7   Firm2  Country1    Curr2    110
8   Firm2  Country1    Curr3      5
9   Firm2  Country2    Curr1    105
10  Firm2  Country2    Curr2    120
11  Firm2  Country2    Curr3    -30

I tried using .groupby.apply, which gives me the result, but I want to apply the change to the original DataFrame.
df2 = df.groupby(['Firm','Location']).apply(lambda g: g[g.Currency == 'Curr3'].Value.values[0] - g[g.Currency == 'Curr2'].Value.values[0])

df2:
Firm   Location    0
Firm1  Country1    5
Firm1  Country2   25
Firm2  Country1    5
Firm2  Country2  -30

I can't figure out how to do this in place in the original df. I also tried the same approach with .transform, but it raises an error:
df2 = df.groupby(['Firm','Location']).transform(lambda g: g[g.Currency == 'Curr3'].Value.values[0] - g[g.Currency == 'Curr2'].Value.values[0])

AttributeError: ("'Series' object has no attribute 'Currency'", 'occurred at index Currency')
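One way to get the apply-style per-group result back into the original rows is to join it on Firm and Location. A minimal sketch, not taken from the original thread: it swaps in DataFrame.pivot instead of groupby.apply to get one row per group, assumes each Firm/Location/Currency combination occurs only once, and uses a hypothetical column name Diff. Because the Curr2 and Curr3 values are looked up by label rather than by position, the row order inside each group does not matter.

```python
import pandas as pd

df = pd.DataFrame({
    'Firm': ['Firm1'] * 6 + ['Firm2'] * 6,
    'Location': (['Country1'] * 3 + ['Country2'] * 3) * 2,
    'Currency': ['Curr1', 'Curr2', 'Curr3'] * 4,
    'Value': [100, 105, 110, 100, 95, 120, 95, 110, 115, 105, 120, 90],
})

# One row per Firm/Location, one column per currency
# (assumes every Firm/Location/Currency combination is unique)
wide = df.pivot(index=['Firm', 'Location'], columns='Currency', values='Value')
diffs = (wide['Curr3'] - wide['Curr2']).rename('Diff')  # 'Diff' is a hypothetical name

# Left-join the per-group difference onto every original row;
# DataFrame.join with on= keeps df's original index
out = df.join(diffs, on=['Firm', 'Location'])

# Keep Value everywhere except the Curr3 rows, which get the difference
is_curr3 = out['Currency'] == 'Curr3'
df['Value'] = out['Value'].where(~is_curr3, out['Diff'])
print(df)
```

Since the lookup is by currency label, the same code should also handle the unsorted layout shown further down.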

---- Update, based on Erfan's solution:
newvals = (
df.where(df['Currency'].isin(['Curr2', 'Curr3']))
.groupby(['Firm', 'Location'])['Value'].diff()
)
df['Value'] = newvals.fillna(df['Value'])

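However, if the df looks like this (currencies not sorted within each group), the solution no longer works, because diff() only computes the difference from the previous row: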
     Firm  Location Currency  Value
0   Firm1  Country1    Curr2    100
1   Firm1  Country1    Curr1    105
2   Firm1  Country1    Curr3    110
3   Firm1  Country2    Curr3    100
4   Firm1  Country2    Curr2     95
5   Firm1  Country2    Curr1    120
6   Firm2  Country1    Curr1     95
7   Firm2  Country1    Curr2    110
8   Firm2  Country1    Curr3    115
9   Firm2  Country2    Curr2    105
10  Firm2  Country2    Curr3    120
11  Firm2  Country2    Curr1     90

-> Result:
     Firm  Location Currency  Value
0   Firm1  Country1    Curr2  100.0
1   Firm1  Country1    Curr1  105.0
2   Firm1  Country1    Curr3   10.0
3   Firm1  Country2    Curr3  100.0
4   Firm1  Country2    Curr2   -5.0
5   Firm1  Country2    Curr1  120.0
6   Firm2  Country1    Curr1   95.0
7   Firm2  Country1    Curr2  110.0
8   Firm2  Country1    Curr3    5.0
9   Firm2  Country2    Curr2  105.0
10  Firm2  Country2    Curr3   15.0
11  Firm2  Country2    Curr1   90.0

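So the result is no longer, in every group, Curr3 minus Curr2 written into the Curr3 row. For example, in Firm1/Country2 the Curr2 row becomes -5 while Curr3 stays at 100.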
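A minimal sketch of one possible fix, not from the original thread: sort the rows first so that Curr2 comes before Curr3 inside every Firm/Location group, then run the same where/diff/fillna steps. This assumes the currency labels sort lexicographically in the wanted order ('Curr2' < 'Curr3'); labels such as 'Curr10' would break that assumption.

```python
import pandas as pd

# The unsorted example from the question
df = pd.DataFrame({
    'Firm': ['Firm1'] * 6 + ['Firm2'] * 6,
    'Location': (['Country1'] * 3 + ['Country2'] * 3) * 2,
    'Currency': ['Curr2', 'Curr1', 'Curr3', 'Curr3', 'Curr2', 'Curr1',
                 'Curr1', 'Curr2', 'Curr3', 'Curr2', 'Curr3', 'Curr1'],
    'Value': [100, 105, 110, 100, 95, 120, 95, 110, 115, 105, 120, 90],
})

# Sort so that, inside each Firm/Location group, Curr2 sits before Curr3
# (assumes plain string labels where 'Curr2' < 'Curr3')
ordered = df.sort_values(['Firm', 'Location', 'Currency'])
newvals = (
    ordered.where(ordered['Currency'].isin(['Curr2', 'Curr3']))
    .groupby(['Firm', 'Location'])['Value'].diff()
)
# newvals keeps df's original index labels, so fillna and the assignment
# realign the values to the original (unsorted) row order
df['Value'] = newvals.fillna(df['Value'])
print(df)
```

The sort only changes the order diff() sees; the final frame keeps its original row order because pandas aligns on the index.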

Best answer

Use DataFrame.where, Series.isin, GroupBy.diff and Series.fillna:

First we turn all Curr1 rows into NaN with where, then we group by Firm and Location and compute the difference of Value.

newvals = (
df.where(df['Currency'].isin(['Curr2', 'Curr3']))
.groupby(['Firm', 'Location'])['Value'].diff()
)
df['Value'] = newvals.fillna(df['Value'])

     Firm  Location Currency  Value
0   Firm1  Country1    Curr1  100.0
1   Firm1  Country1    Curr2  105.0
2   Firm1  Country1    Curr3    5.0
3   Firm1  Country2    Curr1  100.0
4   Firm1  Country2    Curr2   95.0
5   Firm1  Country2    Curr3   25.0
6   Firm2  Country1    Curr1   95.0
7   Firm2  Country1    Curr2  110.0
8   Firm2  Country1    Curr3    5.0
9   Firm2  Country2    Curr1  105.0
10  Firm2  Country2    Curr2  120.0
11  Firm2  Country2    Curr3  -30.0
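Why the where step is enough can be checked on a single group (a hypothetical three-row frame built from the question's data): where blanks out the whole Curr1 row, including its Firm and Location keys, so groupby leaves that row out and diff() only sees Curr2 followed by Curr3. This is also why the approach depends on Curr2 appearing before Curr3 in each group.

```python
import pandas as pd

# Hypothetical single Firm/Location group taken from the question's data
df = pd.DataFrame({
    'Firm': ['Firm1'] * 3,
    'Location': ['Country1'] * 3,
    'Currency': ['Curr1', 'Curr2', 'Curr3'],
    'Value': [100, 105, 110],
})

# where() blanks out whole rows, not only the Value column
masked = df.where(df['Currency'].isin(['Curr2', 'Curr3']))
print(masked)

# The Curr1 row now has NaN group keys, so groupby drops it from every group;
# diff() then subtracts the previous remaining row (Curr2) from Curr3
diffs = masked.groupby(['Firm', 'Location'])['Value'].diff()
print(diffs.tolist())
```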

A similar question about Python Pandas difference between specific rows in groupby can be found on Stack Overflow: https://stackoverflow.com/questions/60562621/
