
python-3.x - Pandas total count above threshold

Reposted. Author: 行者123, updated: 2023-12-03 16:47:15

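I have a DataFrame that I want to group. I would like to use df.agg to find how many values in each group are above 180.

Is there a way to write a small function for this?

I tried len(nice_numbers[nice_numbers > 180]), but it didn't work.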

import pandas as pd

df = pd.DataFrame(data={'nice_numbers': [60, 64, 67, 70, 73, 75, 130, 180, 184, 186, 187, 187, 188, 194, 199, 195, 200, 210, 220, 222, 224, 250, 70, 40, 30, 300],
                        'activity': ['sleeping', 'sleeping', 'sleeping', 'walking', 'walking', 'walking', 'working', 'working', 'working', 'working', 'working', 'restaurant', 'restaurant', 'restaurant', 'restaurant', 'walking', 'walking', 'walking', 'working', 'working', 'driving', 'driving', 'driving', 'home', 'home', 'home']})
df_gb = df.groupby('activity')
df_gb.agg({'count_frequency_over_180'})  # not valid pandas; describes the desired per-group count
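The asker's own len(...) expression can work once it is applied to each group's Series instead of to a bare name nice_numbers, which is not defined outside the DataFrame. A minimal sketch, assuming the question's data with the activity values wrapped in a list; the helper name count_over_180 is hypothetical:

```python
import pandas as pd

# Same data as in the question, with the activity values wrapped in a list.
df = pd.DataFrame({
    'nice_numbers': [60, 64, 67, 70, 73, 75, 130, 180, 184, 186, 187, 187, 188,
                     194, 199, 195, 200, 210, 220, 222, 224, 250, 70, 40, 30, 300],
    'activity': ['sleeping'] * 3 + ['walking'] * 3 + ['working'] * 5
                + ['restaurant'] * 4 + ['walking'] * 3 + ['working'] * 2
                + ['driving'] * 3 + ['home'] * 3,
})


def count_over_180(s):
    # Hypothetical helper: s is the 'nice_numbers' Series of one group,
    # so the asker's len(...) idea is applied to that Series.
    return len(s[s > 180])


# agg calls the function once per activity group.
counts = df.groupby('activity')['nice_numbers'].agg(count_over_180)
print(counts)
```

A named function like this is handy when the same rule is reused; for a one-off, an inline lambda does the same job.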

Thanks

Best answer

Create a boolean mask by comparing the column with gt, then aggregate with sum, which counts the True values:

df1 = (df['nice_numbers'].gt(180)
.groupby(df['activity'], sort=False)
.sum()
.astype(int)
.reset_index())
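To see why summing the mask gives a count: gt(180) returns a boolean Series, and sum treats True as 1 and False as 0. A minimal sketch on a few of the question's values (the subset is chosen only for illustration):

```python
import pandas as pd

# A few rows from the question's data, in their original order.
df = pd.DataFrame({
    'nice_numbers': [60, 64, 184, 186, 300],
    'activity': ['sleeping', 'sleeping', 'working', 'working', 'home'],
})

mask = df['nice_numbers'].gt(180)  # boolean Series: False, False, True, True, True
print(mask.sum())                  # True counts as 1, so this prints 3

# Grouping the mask by the activity column counts the True values per group;
# sort=False keeps the groups in order of first appearance.
per_group = mask.groupby(df['activity'], sort=False).sum()
print(per_group)
```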

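A similar solution that sums by the index created with set_index (grouping on index level 0, since the older sum(level=0) form no longer works in pandas 2.0):

df1 = df.set_index('activity')['nice_numbers'].gt(180).groupby(level=0, sort=False).sum().astype(int).reset_index()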
print (df1)
activity nice_numbers
0 sleeping 0
1 walking 3
2 working 5
3 restaurant 4
4 driving 2
5 home 1

Edit:

For more metrics on the nice_numbers column, use agg with a list of (name, function) pairs:

agg = [('above_180_count', lambda x: x.gt(180).sum()), ('average', 'mean')]
df1 = df.groupby('activity')['nice_numbers'].agg(agg).reset_index()
print (df1)
activity above_180_count average
0 driving 2 181.333333
1 home 1 123.333333
2 restaurant 4 192.000000
3 sleeping 0 63.666667
4 walking 3 137.166667
5 working 5 187.000000
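pandas (0.25 and later) also accepts the output column names as keyword arguments to agg ("named aggregation"), which avoids building a list of tuples. A sketch of the same two metrics, assuming the question's data; the column names above_180_count and average are just the ones used above:

```python
import pandas as pd

df = pd.DataFrame({
    'nice_numbers': [60, 64, 67, 70, 73, 75, 130, 180, 184, 186, 187, 187, 188,
                     194, 199, 195, 200, 210, 220, 222, 224, 250, 70, 40, 30, 300],
    'activity': ['sleeping'] * 3 + ['walking'] * 3 + ['working'] * 5
                + ['restaurant'] * 4 + ['walking'] * 3 + ['working'] * 2
                + ['driving'] * 3 + ['home'] * 3,
})

# Keyword = output column name, value = a function or an aggregation name.
df1 = (df.groupby('activity')['nice_numbers']
         .agg(above_180_count=lambda x: x.gt(180).sum(), average='mean')
         .reset_index())
print(df1)
```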

For multiple thresholds:

df1 = pd.DataFrame({'threshold':[180, 270, 60]})
print (df1.head())
threshold
0 180
1 270
2 60

#compare values by numpy broadcasting
arr = df['nice_numbers'].to_numpy()[:, None] > df1['threshold'].to_numpy()

#create new DataFrame and add column activity
df2 = (pd.DataFrame(arr, index=df.index, columns=df1['threshold'].tolist())
.assign(activity = df['activity']))
print (df2.head())
180 270 60 activity
0 False False False sleeping
1 False False True sleeping
2 False False True sleeping
3 False False True walking
4 False False True walking

#aggregate sum
df3 = df2.groupby('activity', as_index=False).sum()
print (df3)
activity 180 270 60
0 driving 2 0 3
1 home 1 1 1
2 restaurant 4 0 4
3 sleeping 0 0 2
4 walking 3 0 6
5 working 5 0 7
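The broadcasting step compares every value with every threshold at once: a column of shape (n, 1) compared against a 1-D array of shape (k,) yields an (n, k) boolean matrix with one column per threshold. A small sketch with four sample values (chosen only for illustration):

```python
import numpy as np

values = np.array([60, 200, 300, 65])   # shape (4,)
thresholds = np.array([180, 270, 60])   # shape (3,)

# values[:, None] has shape (4, 1); comparing it with shape (3,) broadcasts
# to a (4, 3) matrix: row i is value i, column j is threshold j.
arr = values[:, None] > thresholds
print(arr)

# Summing down the rows counts how many values exceed each threshold.
print(arr.sum(axis=0))
```

The groupby sum in the answer does the same column-wise count, only split per activity.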

For python-3.x - Pandas total count above threshold, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50772396/
