
python - Count total occurrences in a pandas DataFrame with a condition

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:08:33

I have this DataFrame:

cat_df.head()

  category  depth
0     food    0.0
1     food    1.0
2    sport    1.0
3     food    3.0
4   school    0.0
5   school    0.0
6   school    1.0
...

where depth = 0 means a root post and depth > 0 means a comment.

For each category, I want to count the number of root posts (depth=0) and the number of comments (depth>0).

I used value_counts() to count the unique values:

cat_df['category'].value_counts().head(15)

category    total number
food        44062
sport       38004
school      11080
life         8810
...

I thought I could use ['depth'] == 0 as a condition on the DataFrame, but it doesn't work:

cat_df[cat_df['depth'] == 0].value_counts().head(5)

How can I get the total number of occurrences of depth=0 and depth>0?

I would like to put it in a table like this:

category | total number | depth=0 | depth>0 
...
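The attempt above calls value_counts() on the whole filtered DataFrame instead of on the category column: older pandas versions have no DataFrame.value_counts at all, and since pandas 1.1 it counts unique rows across all columns rather than per category. A minimal sketch of the filter-then-count idea, assuming the seven sample rows shown above, selects the category column after each filter and aligns the three counts with pd.concat:

```python
import pandas as pd

# The seven sample rows from the question
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# Filter the rows first, then count values in the category column only
roots = cat_df.loc[cat_df['depth'].eq(0), 'category'].value_counts()
comments = cat_df.loc[cat_df['depth'].gt(0), 'category'].value_counts()
total = cat_df['category'].value_counts()

# concat aligns the three Series on category; a category missing from one
# subset (sport has no root posts) gets NaN there, which becomes 0
result = (pd.concat({'total number': total, 'depth=0': roots, 'depth>0': comments}, axis=1)
          .fillna(0)
          .astype(int)
          .sort_index())
print(result)
```

This makes three separate counting passes over the data, which is one reason a single groupby can be preferable on large frames.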

Best answer

You can do it with a single groupby, which is better for performance:

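df = (cat_df['depth'].ne(0)  # True = comment (depth != 0), False = root post
.groupby(cat_df['category'])
.value_counts()  # count False/True per category
.unstack(fill_value=0)  # False/True become columns; missing pairs -> 0
.rename(columns={0:'depth=0', 1:'depth>0'})  # False/True match the keys 0/1
.assign(total=lambda x: x.sum(axis=1))  # row-wise total of both counts
.reindex(columns=['total','depth=0','depth>0']))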

print(df)
depth     total  depth=0  depth>0
category
food          3        1        2
school        3        2        1
sport         1        0        1

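Explanation:

  1. First compare the depth column for inequality with Series.ne (!=), giving True for comments and False for root posts
  2. Group by category and count the True/False values with SeriesGroupBy.value_counts
  3. Reshape with unstack, filling missing combinations with 0
  4. Rename the columns with a dict
  5. Create the new total column with assign
  6. To customize the column order, add reindex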
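The six steps above can be run one at a time to inspect each intermediate result. A minimal sketch, assuming the seven sample rows from the question; the intermediate variable names (is_comment, counts, wide) are illustrative only:

```python
import pandas as pd

cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# 1. Boolean Series: True for comments (depth != 0), False for root posts
is_comment = cat_df['depth'].ne(0)

# 2. Count False/True per category; the result has a (category, depth) MultiIndex
counts = is_comment.groupby(cat_df['category']).value_counts()
print(counts)

# 3. Move the False/True level into columns; missing combinations become 0
wide = counts.unstack(fill_value=0)

# 4. Readable column names (the answer's {0: ..., 1: ...} also works,
#    because False == 0 and True == 1)
wide = wide.rename(columns={False: 'depth=0', True: 'depth>0'})

# 5. Row-wise total of the two count columns
wide = wide.assign(total=lambda x: x.sum(axis=1))

# 6. Put the columns in the desired order
wide = wide.reindex(columns=['total', 'depth=0', 'depth>0'])
print(wide)
```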

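Edit: to group by num_of_likes as well as category, pass both columns to groupby and flatten the result with reset_index: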

import pandas as pd
cat_df = pd.DataFrame({'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'], 'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0], 'num_of_likes': [10, 10, 10, 20, 20, 20, 20]})

print(cat_df)
  category  depth  num_of_likes
0     food    0.0            10
1     food    1.0            10
2    sport    1.0            10
3     food    3.0            20
4   school    0.0            20
5   school    0.0            20
6   school    1.0            20

df = (cat_df['depth'].ne(0)
.groupby([cat_df['num_of_likes'], cat_df['category']])
.value_counts()
.unstack(fill_value=0)
.rename(columns={0:'depth=0', 1:'depth>0'})
.assign(total=lambda x: x.sum(axis=1))
.reindex(columns=['total','depth=0','depth>0'])
.reset_index()
.rename_axis(None, axis=1)
)

print(df)
   num_of_likes category  total  depth=0  depth>0
0            10     food      2        1        1
1            10    sport      1        0        1
2            20     food      1        0        1
3            20   school      3        2        1
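The same two-key table can also be built with pd.crosstab instead of groupby, value_counts and unstack. This is an alternative sketch, not the answer's method, assuming the same three-column sample data:

```python
import pandas as pd

# Same sample data as in the edit
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
    'num_of_likes': [10, 10, 10, 20, 20, 20, 20],
})

# Rows: each observed (num_of_likes, category) pair; columns: False/True for depth != 0.
# crosstab fills combinations that never occur with 0.
ct = pd.crosstab([cat_df['num_of_likes'], cat_df['category']], cat_df['depth'].ne(0))
ct = ct.rename(columns={False: 'depth=0', True: 'depth>0'})
ct.insert(0, 'total', ct.sum(axis=1))  # total as the first column
ct = ct.reset_index().rename_axis(None, axis=1)
print(ct)
```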

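Edit 1: to keep grouping by category only and add the total num_of_likes per category, sum the likes first and map them onto the result: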

s = cat_df.groupby('category')['num_of_likes'].sum()
print(s)
category
food      40
school    60
sport     10
Name: num_of_likes, dtype: int64

df = (cat_df['depth'].ne(0)
.groupby(cat_df['category'])
.value_counts()
.unstack(fill_value=0)
.rename(columns={0:'depth=0', 1:'depth>0'})
.assign(total=lambda x: x.sum(axis=1))
.reindex(columns=['total','depth=0','depth>0'])
.reset_index()
.rename_axis(None, axis=1)
.assign(num_of_likes=lambda x: x['category'].map(s))
)
print(df)
  category  total  depth=0  depth>0  num_of_likes
0     food      3        1        2            40
1   school      3        2        1            60
2    sport      1        0        1            10
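As another alternative sketch (not the answer's approach), named aggregation in groupby.agg, available since pandas 0.25, can produce the counts and the likes sum in a single call. The helper columns root and comment are hypothetical, introduced here only so that summing booleans yields counts:

```python
import pandas as pd

cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
    'num_of_likes': [10, 10, 10, 20, 20, 20, 20],
})

# Hypothetical helper columns: booleans whose per-group sum is a count
helper = cat_df.assign(root=cat_df['depth'].eq(0), comment=cat_df['depth'].gt(0))

# The output names contain '=' and '>', so they are passed via ** unpacking
df = (helper.groupby('category')
            .agg(total=('depth', 'size'),
                 **{'depth=0': ('root', 'sum'), 'depth>0': ('comment', 'sum')},
                 num_of_likes=('num_of_likes', 'sum'))
            .reset_index())
print(df)
```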

Regarding "python - Count total occurrences in a pandas DataFrame with a condition", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49878814/
