
python - How to get a dense rank within each partition window in Pandas

Reposted. Author: 行者123. Updated: 2023-12-05 02:46:50

I have a Pandas DataFrame as follows:

Dominant_Topic  word            appearance
Topic 0         aaaawww         50
Topic 0         aacn            100
Topic 0         aaren           20
Topic 0         aarongoodwin    200
Topic 1         aaronjfentress  10
Topic 1         aaronrodger     20
Topic 1         aasmiitkap      30
Topic 2         aavqbketmh      10
Topic 2         ab              10
Topic 2         abandon         1

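I want a dense rank within each partition, where the partition column is Dominant_Topic. Within each partition, rows should be ranked by the word's appearance count in descending order, so the output would look like this: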

Dominant_Topic  word            appearance  dense_rank
Topic 0         aaaawww         50          3
Topic 0         aacn            100         2
Topic 0         aaren           20          4
Topic 0         aarongoodwin    200         1
Topic 1         aaronjfentress  10          3
Topic 1         aaronrodger     20          2
Topic 1         aasmiitkap      30          1
Topic 2         aavqbketmh      10          1
Topic 2         ab              10          1
Topic 2         abandon         1           2

How can I achieve this in Pandas?

The equivalent SQL would look like this:

select *, dense_rank() over( partition by dominant_topic order by appearance desc)
from table

Best answer

This is built into groupby:

df['dense_rank'] = (df.groupby('Dominant_Topic')['appearance']
                      .rank(method='dense', ascending=False)
                      .astype(int)
                   )

Output:

  Dominant_Topic            word  appearance  dense_rank
0        Topic 0         aaaawww          50           3
1        Topic 0            aacn         100           2
2        Topic 0           aaren          20           4
3        Topic 0    aarongoodwin         200           1
4        Topic 1  aaronjfentress          10           3
5        Topic 1     aaronrodger          20           2
6        Topic 1      aasmiitkap          30           1
7        Topic 2      aavqbketmh          10           1
8        Topic 2              ab          10           1
9        Topic 2         abandon           1           2
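
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question (that construction is an assumption about how the data is loaded; the original post does not show it), and the ranking step is the same groupby/rank call as in the answer.

```python
import pandas as pd

# Sample data copied from the question; building it inline is an
# assumption made only so the example runs on its own.
df = pd.DataFrame({
    "Dominant_Topic": ["Topic 0"] * 4 + ["Topic 1"] * 3 + ["Topic 2"] * 3,
    "word": [
        "aaaawww", "aacn", "aaren", "aarongoodwin",
        "aaronjfentress", "aaronrodger", "aasmiitkap",
        "aavqbketmh", "ab", "abandon",
    ],
    "appearance": [50, 100, 20, 200, 10, 20, 30, 10, 10, 1],
})

# Rank 'appearance' within each Dominant_Topic group.
# method="dense": tied values share a rank and no rank numbers are skipped.
# ascending=False: the largest appearance count gets rank 1.
df["dense_rank"] = (
    df.groupby("Dominant_Topic")["appearance"]
      .rank(method="dense", ascending=False)
      .astype(int)  # rank() returns floats; cast for integer ranks
)

print(df)
```

One caveat: rank() leaves missing values unranked (NaN) by default, so if appearance can contain NaN, the .astype(int) cast is likely to fail. In that case, casting to the nullable "Int64" dtype instead may be a safer choice.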

Regarding "python - How to get a dense rank within each partition window in Pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65345336/
