python - Sort an entire CSV by how often each value occurs in one column, and show each value only once

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 08:40:51

I have a CSV that looks like this:

CompanyName  HighPriority  QualityIssue
Customer1    Yes           User
Customer1    Yes           User
Customer2    No            User
Customer3    No            Equipment
Customer1    No            Neither
Customer3    No            User
Customer3    Yes           User
Customer3    Yes           Equipment
Customer4    No            User

I want to count how many times each CompanyName value appears in the whole file, sort by that count in descending order, and print each CompanyName only once.

For example, with this code:

import pandas as pd
# per-row count of each company ('size' counts the rows in each group)
df['count'] = df.groupby('CompanyName')['CompanyName'].transform('size')
df.sort_values('count', ascending=False)

I get:

Out:

   CompanyName  HighPriority  QualityIssue  count
5  Customer3    No            User          4
3  Customer3    No            Equipment     4
7  Customer3    Yes           Equipment     4
6  Customer3    Yes           User          4
0  Customer1    Yes           User          3
4  Customer1    No            Neither       3
1  Customer1    Yes           User          3
8  Customer4    No            User          1
2  Customer2    No            User          1
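If you would rather keep a per-row count column like the one above, one way to collapse it to a single row per company is to keep only the `CompanyName` and `count` columns and drop the duplicate rows. A minimal sketch, assuming the sample data is built inline instead of being read from the CSV file:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: not read from a file)
df = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2", "Customer3", "Customer1",
                    "Customer3", "Customer3", "Customer3", "Customer4"],
    "HighPriority": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No"],
    "QualityIssue": ["User", "User", "User", "Equipment", "Neither",
                     "User", "User", "Equipment", "User"],
})

# Per-row count of each company; transform keeps the original length of df
df["count"] = df.groupby("CompanyName")["CompanyName"].transform("size")

# Keep one row per (CompanyName, count) pair, most frequent company first
result = (
    df[["CompanyName", "count"]]
    .drop_duplicates()
    .sort_values("count", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Customer2 and Customer4 are tied at 1, so their relative order in the output should not be relied on.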

What I want is:

CompanyName  count
Customer3    4
Customer1    3
Customer4    1
Customer2    1

Any ideas?

Question 2: rows with an empty CompanyName:

CompanyName  HighPriority  QualityIssue
Customer1    Yes           User
Customer1    Yes           User
             No            User
Customer3    No            Equipment
Customer1    No            Neither
             No            User
Customer3    Yes           User
Customer3    Yes           Equipment
Customer4    No            User

Expected output:

CompanyName  count
Customer3    3
Customer1    3
             2
Customer4    1

Best answer

I think you can skip those two lines and simply write:

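import pandas as pd

# Single column: value_counts() already sorts by count, descending
df.CompanyName.value_counts()
# or
df['CompanyName'].value_counts()
# or via Sriram's solution (size() is ordered by name, so sort it)
df.groupby(['CompanyName']).size().sort_values(ascending=False)

# Multiple columns
df.groupby(['CompanyName', 'HighPriority']).size()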

See also: Python: get a frequency count based on two columns (variables) in pandas dataframe

This gives you one count per company directly, instead of appending the count as a column to every row.
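A runnable sketch of the `value_counts` approach on the question's sample data (built inline here as an assumption), including one way to turn the resulting Series into the two-column `CompanyName`/`count` table the question asks for, plus the `groupby` variants for comparison:

```python
import pandas as pd

# Sample data from the question (assumption: built inline, not read from the CSV)
df = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2", "Customer3", "Customer1",
                    "Customer3", "Customer3", "Customer3", "Customer4"],
    "HighPriority": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No"],
})

# value_counts() returns a Series indexed by company, sorted by count (descending)
counts = df["CompanyName"].value_counts()

# Turn the Series into a two-column DataFrame: CompanyName, count
table = counts.rename_axis("CompanyName").reset_index(name="count")
print(table)

# Same counts via groupby; size() is ordered by group key, so sort explicitly
by_group = df.groupby("CompanyName").size().sort_values(ascending=False)

# Multiple columns: one count per (CompanyName, HighPriority) pair
pairs = df.groupby(["CompanyName", "HighPriority"]).size()
print(pairs)
```

`rename_axis` plus `reset_index(name=...)` is used so the column names come out as `CompanyName` and `count` regardless of how the pandas version names the `value_counts` result.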

Edit

Replace the NaN values, then count:

df['CompanyName'] = df['CompanyName'].fillna('unknown')
# or, modifying df in place
df.fillna({'CompanyName': 'unknown'}, inplace=True)

Then count them with the code above.
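A minimal sketch of the NaN handling, assuming the file is comma-separated so that an empty `CompanyName` field is read as NaN (with a whitespace-separated file like the one shown, a missing first field would instead shift the remaining values to the left). The CSV text is inlined with `io.StringIO` for illustration. It also shows `value_counts(dropna=False)`, which counts NaN as its own value without filling it first:

```python
import io

import pandas as pd

# Question 2 data as comma-separated text (assumption); two rows lack a CompanyName
csv_text = """CompanyName,HighPriority,QualityIssue
Customer1,Yes,User
Customer1,Yes,User
,No,User
Customer3,No,Equipment
Customer1,No,Neither
,No,User
Customer3,Yes,User
Customer3,Yes,Equipment
Customer4,No,User
"""
df = pd.read_csv(io.StringIO(csv_text))

# Option 1: label the missing names, then count as before
filled = df["CompanyName"].fillna("unknown")
counts = filled.value_counts().rename_axis("CompanyName").reset_index(name="count")
print(counts)

# Option 2: keep NaN and let value_counts count it as its own group
with_nan = df["CompanyName"].value_counts(dropna=False)
print(with_nan)
```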

A similar question about sorting an entire CSV by the frequency of values in one column can be found on Stack Overflow: https://stackoverflow.com/questions/45021167/
