python - From a Pandas data frame, how do I find the number of duplicate comments per user?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:59:56

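I have a data frame containing a list of usernames and their comments; see the format below.

What is the fastest, most efficient way to find duplicate comments (spam) for each user?

Data frame format: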

Author | Comment
casy   | Nice picture!
linda  | I like this
casy   | Nice picture!
tom    | I disagree
bob    | Follow me
bob    | Follow me
bob    | Follow me
bob    | Follow me
casy   | Nice picture!
casy   | Wow!
linda  | Interesting post
linda  | Check my profile
bob    | Dissapointing
casy   | Wow!

I would like the result in the following format, so the result table would be:

Author | Number of dup. comments (descending) | Comment
bob    | 4                                    | Follow me
casy   | 3                                    | Nice picture!
casy   | 2                                    | Wow!
bob    | 1                                    | Dissapointing
linda  | 1                                    | I like this
linda  | 1                                    | Check my profile
linda  | 1                                    | Interesting post
tom    | 1                                    | I disagree

Best answer

Use groupby with size first, then sort_values, then reset_index to turn the counts into a column if needed, and finally reindex to change the order of the columns:

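# Count rows per (Author, Comment) pair; sort=False skips sorting the group keys
df = (df.groupby(['Author', 'Comment'], sort=False).size()
        .sort_values(ascending=False)                       # largest counts first
        .reset_index(name='Number')                         # counts become a 'Number' column
        .reindex(columns=['Author', 'Number', 'Comment']))  # reorder the columns
print(df)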
  Author  Number           Comment
0    bob       4         Follow me
1   casy       3     Nice picture!
2   casy       2              Wow!
3    bob       1     Dissapointing
4  linda       1  Check my profile
5  linda       1  Interesting post
6    tom       1        I disagree
7  linda       1       I like this
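
To try the answer without an existing data frame, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same groupby / size / sort_values / reset_index / reindex chain. The helper name count_comments and the final "more than once" filter are assumptions added for illustration and are not part of the original answer. The sketch does not rely on any particular order among rows with equal counts.

```python
import pandas as pd

# Sample data copied from the question (spelling kept as posted).
rows = [
    ("casy", "Nice picture!"),
    ("linda", "I like this"),
    ("casy", "Nice picture!"),
    ("tom", "I disagree"),
    ("bob", "Follow me"),
    ("bob", "Follow me"),
    ("bob", "Follow me"),
    ("bob", "Follow me"),
    ("casy", "Nice picture!"),
    ("casy", "Wow!"),
    ("linda", "Interesting post"),
    ("linda", "Check my profile"),
    ("bob", "Dissapointing"),
    ("casy", "Wow!"),
]
df = pd.DataFrame(rows, columns=["Author", "Comment"])


def count_comments(frame):
    """Hypothetical helper: count identical (Author, Comment) pairs, largest first."""
    return (
        frame.groupby(["Author", "Comment"], sort=False)
        .size()                                   # rows per (Author, Comment) pair
        .sort_values(ascending=False)             # largest counts first
        .reset_index(name="Number")               # counts become a 'Number' column
        .reindex(columns=["Author", "Number", "Comment"])
    )


counts = count_comments(df)

# Assumption: a comment counts as a duplicate when the same user posted it more than once.
repeated = counts[counts["Number"] > 1].reset_index(drop=True)
print(repeated)
```

With the sample data, repeated holds three rows: bob with "Follow me" (4), casy with "Nice picture!" (3), and casy with "Wow!" (2). Dropping the filter gives the full table asked for in the question.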

For this question (python - From a Pandas data frame, how do I find the number of duplicate comments per user?), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50552756/
