
python - Pandas: add a column with a progressive count of elements meeting a condition

Reposted. Author: 行者123. Updated: 2023-11-28 21:35:53

Given the following DataFrame df:

import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'], 'B': ['no', 'yes', 'no', 'yes']})

      A    B
0  Tony   no
1  Mike  yes
2   Jen   no
3  Anna  yes

I want to add another column that progressively counts the rows where df['B'] == 'yes' (and is 0 on all other rows):

      A    B  C
0  Tony   no  0
1  Mike  yes  1
2   Jen   no  0
3  Anna  yes  2

How can I do this?

Best answer

You can use numpy.where together with the cumsum of a boolean mask:

import numpy as np

m = df['B'] == 'yes'
df['C'] = np.where(m, m.cumsum(), 0)
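To make the idea concrete, here is a small sketch that prints the intermediate values on the example data. The variable name `running` is just an illustrative name, not part of the answer: cumsum treats True as 1 and False as 0, so it produces a running count of 'yes' rows, and np.where then replaces that count with 0 on every 'no' row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

# Boolean mask: True where B == 'yes'
m = df['B'] == 'yes'
# Running total of True values (True counts as 1, False as 0)
running = m.cumsum()
# Keep the running total on 'yes' rows, put 0 on all other rows
df['C'] = np.where(m, running, 0)

print(m.tolist())        # [False, True, False, True]
print(running.tolist())  # [0, 1, 1, 2]
print(df['C'].tolist())  # [0, 1, 0, 2]
```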

Another solution is to filter the boolean mask down to its True values, count them with cumsum, and then add the 0 values back with reindex:

m = df['B']=='yes'
df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
print(df)
      A    B  C
0  Tony   no  0
1  Mike  yes  1
2   Jen   no  0
3  Anna  yes  2
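The reindex variant works because filtering keeps the original index labels. A minimal sketch of the intermediate steps on the same example data (the names `only_yes` and `counts` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

m = df['B'] == 'yes'
# Keep only the True entries; their original index labels (1 and 3) survive
only_yes = m[m]
# Running count over the 'yes' rows alone: 1, 2, ...
counts = only_yes.cumsum()
# Restore the full index; labels missing from `counts` get 0
df['C'] = counts.reindex(df.index, fill_value=0)

print(only_yes.index.tolist())  # [1, 3]
print(counts.tolist())          # [1, 2]
print(df['C'].tolist())         # [0, 1, 0, 2]
```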

Performance (timings depend on your real data, so it is best to check on that first):

import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000
L = ['yes', 'no']
df = pd.DataFrame({'B': np.random.choice(L, N)})
print(df)

In [150]: %%timeit
...: m = df['B']=='yes'
...: df['C'] = np.where(m, m.cumsum(), 0)
...:
1.57 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [151]: %%timeit
...: m = df['B']=='yes'
...: df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
...:
2.53 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [152]: %%timeit
...: df['C'] = df.groupby('B').cumcount() + 1
...: df['C'] = df['C'].where(df['B'] == 'yes', 0)

4.49 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
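The %%timeit magic only works in IPython/Jupyter. As a rough sketch for a plain Python script, the standard timeit module can time the same three approaches, and an assertion can confirm they return identical counts first. The function names below are hypothetical wrappers, and the groupby variant assigns the result of where instead of using chained inplace=True. Absolute timings will differ from the ones above depending on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000
df = pd.DataFrame({'B': np.random.choice(['yes', 'no'], N)})


def with_where(frame):
    m = frame['B'] == 'yes'
    return pd.Series(np.where(m, m.cumsum(), 0), index=frame.index)


def with_reindex(frame):
    m = frame['B'] == 'yes'
    return m[m].cumsum().reindex(frame.index, fill_value=0)


def with_cumcount(frame):
    # Position within each B group, starting at 1
    c = frame.groupby('B').cumcount() + 1
    # Zero out the 'no' rows by assigning the result, not via inplace=True
    return c.where(frame['B'] == 'yes', 0)


funcs = (with_where, with_reindex, with_cumcount)
results = [f(df) for f in funcs]
# All three approaches should produce the same counts
assert all((r.to_numpy() == results[0].to_numpy()).all() for r in results)

for f in funcs:
    n_calls = 20
    t = timeit.timeit(lambda: f(df), number=n_calls)
    print(f'{f.__name__}: {t / n_calls * 1000:.2f} ms per call')
```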

For more on python - Pandas: add a column with a progressive count of elements meeting a condition, see the similar question on Stack Overflow: https://stackoverflow.com/questions/51768947/
