
python - Pandas groupby duplicates group when using apply twice

Reposted. Author: 行者123. Updated: 2023-12-04 15:35:32

Can pandas groupby use groupby.apply(func) with another .apply() call inside func, without duplicating and overwriting data?

In other words, the calls to .apply() are nested.

Python 3.7.3, pandas==0.25.1

import pandas as pd


def dummy_func_nested(row):
    row['new_col_2'] = row['value'] * -1
    return row


def dummy_func(df_group):
    df_group['new_col_1'] = None

    # apply dummy_func_nested
    df_group = df_group.apply(dummy_func_nested, axis=1)

    return df_group


def pandas_groupby():
    # initialize data
    df = pd.DataFrame([
        {'country': 'US', 'value': 100.00, 'id': 'a'},
        {'country': 'US', 'value': 95.00, 'id': 'b'},
        {'country': 'CA', 'value': 56.00, 'id': 'y'},
        {'country': 'CA', 'value': 40.00, 'id': 'z'},
    ])

    # group by country and apply first dummy_func
    new_df = df.groupby('country').apply(dummy_func)

    # new_df and df should have the same list of countries
    assert new_df['country'].tolist() == df['country'].tolist()
    print(df)


if __name__ == '__main__':
    pandas_groupby()

The code above should return

  country  value id new_col_1  new_col_2
0      US  100.0  a      None     -100.0
1      US   95.0  b      None      -95.0
2      CA   56.0  y      None      -56.0
3      CA   40.0  z      None      -40.0

However, the code returns

  country  value id new_col_1  new_col_2
0      US  100.0  a      None     -100.0
1      US   95.0  a      None      -95.0
2      US   56.0  a      None      -56.0
3      US   40.0  a      None      -40.0

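This behavior occurs only when both groups have the same number of rows. If one group has more rows than the other, the output is as expected.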

Best answer

Quoting the documentation:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
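The functions in the question write into the object they receive, which is the kind of side effect the quoted note warns about. Below is a minimal sketch of a row function that copies its input before changing it, so a repeated call cannot alter the data it was given. The name `negate_value` is hypothetical, the data is the sample from the question, and this has not been checked against pandas 0.25.1 specifically.

```python
import pandas as pd


def negate_value(row):
    # Work on a copy so the Series handed in by apply is never modified.
    row = row.copy()
    row['new_col_2'] = row['value'] * -1
    return row


df = pd.DataFrame([
    {'country': 'US', 'value': 100.00, 'id': 'a'},
    {'country': 'US', 'value': 95.00, 'id': 'b'},
    {'country': 'CA', 'value': 56.00, 'id': 'y'},
    {'country': 'CA', 'value': 40.00, 'id': 'z'},
])

# axis=1 passes each row to negate_value as a Series.
result = df.apply(negate_value, axis=1)
print(result)
```

The original `df` keeps its three columns; only `result` carries `new_col_2`.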

Try changing the following part of your code:

def dummy_func(df_group):
    df_group['new_col_1'] = None

    # apply dummy_func_nested
    df_group = df_group.apply(dummy_func_nested, axis=1)

    return df_group

to:

def dummy_func(df_group):
    df_group['new_col_1'] = None

    # call dummy_func_nested directly on the whole group
    df_group = dummy_func_nested(df_group)

    return df_group

You don't need the inner apply.
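The direct call works because indexing a DataFrame with `['value']` returns the whole column, so the multiplication covers every row of the group at once. A small sketch of that, using only the two US rows from the question as a stand-in for one group (the variable names `group` and `result` are illustrative):

```python
import pandas as pd


def dummy_func_nested(frame):
    # frame['value'] is an entire column here, not a single cell,
    # so the arithmetic is applied to all rows in one step.
    frame['new_col_2'] = frame['value'] * -1
    return frame


# Stand-in for one group as groupby would pass it to dummy_func.
group = pd.DataFrame([
    {'country': 'US', 'value': 100.00, 'id': 'a'},
    {'country': 'US', 'value': 95.00, 'id': 'b'},
])

# Pass a copy so that `group` itself is left unchanged.
result = dummy_func_nested(group.copy())
print(result)
```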

Of course, a more efficient approach is to skip groupby entirely:

df['new_col_1'] = None
df['new_col_2'] = -df['value']
print(df)

Or:

print(df.assign(new_col_1=None, new_col_2=-df['value']))
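The two variants differ in one respect that may matter: column assignment changes `df` itself, while `assign` returns a new DataFrame. A short sketch, assuming the sample data from the question (the names `assigned` and `in_place` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    {'country': 'US', 'value': 100.00, 'id': 'a'},
    {'country': 'US', 'value': 95.00, 'id': 'b'},
    {'country': 'CA', 'value': 56.00, 'id': 'y'},
    {'country': 'CA', 'value': 40.00, 'id': 'z'},
])

# assign builds a new DataFrame and leaves df as it was.
assigned = df.assign(new_col_1=None, new_col_2=-df['value'])

# Column assignment modifies the frame it is called on,
# so a copy is used here to keep df intact for comparison.
in_place = df.copy()
in_place['new_col_1'] = None
in_place['new_col_2'] = -in_place['value']

print(assigned)
print(list(df.columns))
```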

Regarding "python - Pandas groupby duplicates group when using apply twice", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59889744/
