
python - How do I build a frequency count table of items in a pandas DataFrame?

Reposted. Author: 行者123. Updated: 2023-11-30 22:11:54

Suppose I have the following data in a csv file, example.csv:

Word    Score
Dog 1
Bird 2
Cat 3
Dog 2
Dog 3
Dog 1
Bird 3
Cat 1
Bird 1
Cat 3

I want to count how often each word occurs with each score. The expected output is:

        1   2   3
Dog     2   1   1
Bird    1   1   1
Cat     1   0   2

My code for doing this is as follows:

import pandas as pd

x1 = pd.read_csv(r'path\to\example.csv')

def getUniqueWords(allWords):
    uniqueWords = []
    for i in allWords:
        if i not in uniqueWords:
            uniqueWords.append(i)
    return uniqueWords

unique_words = getUniqueWords(x1['Word'])
unique_scores = getUniqueWords(x1['Score'])

scores_matrix = [[0 for x in range(len(unique_words))] for x in range(len(unique_scores)+1)]
# The '+1' is because Python indexing starts from 0; so if a score of 0 is present in the data, the 0 index will be used for that.

for i in range(len(unique_words)):
    temp = x1[x1['Word']==unique_words[i]]
    for j, word in temp.iterrows():
        scores_matrix[i][j] += 1 # Supposed to store the count for word i with score j

But this raises the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-123-141ab9cd7847> in <module>()
19 temp = x1[x1['Word']==unique_words[i]]
20 for j, word in temp.iterrows():
---> 21 scores_matrix[i][j] += 1

IndexError: list index out of range

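Also, even if I can fix this error, scores_matrix will not show headers (Dog, Bird, Cat as the row index and 1, 2, 3 as the column index). I would like to be able to access the count of each word for each score, along these lines: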

scores_matrix['Dog'][1]
>>> 2

scores_matrix['Cat'][2]
>>> 0

So, how can I solve both of these problems?

Best answer

Use groupby with sort=False, then either value_counts or size, followed by unstack:

df1 = df.groupby('Word', sort=False)['Score'].value_counts().unstack(fill_value=0)

Alternatively, using size:

df1 = df.groupby(['Word','Score'], sort=False).size().unstack(fill_value=0)

print (df1)
Score 1 2 3
Word
Dog 2 1 1
Bird 1 1 1
Cat 1 0 2
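
To try the two groupby variants above without the csv file, here is a minimal sketch that builds the same ten rows inline. The inline DataFrame and the name df2 are assumptions made for illustration; the answer itself only assumes a DataFrame called df with Word and Score columns.

```python
import pandas as pd

# Same ten rows as example.csv, built inline so the sketch is self-contained.
df = pd.DataFrame({
    "Word": ["Dog", "Bird", "Cat", "Dog", "Dog", "Dog", "Bird", "Cat", "Bird", "Cat"],
    "Score": [1, 2, 3, 2, 3, 1, 3, 1, 1, 3],
})

# Variant 1: count scores within each word, then pivot scores into columns.
df1 = df.groupby("Word", sort=False)["Score"].value_counts().unstack(fill_value=0)

# Variant 2: count each (Word, Score) pair, then pivot scores into columns.
df2 = df.groupby(["Word", "Score"], sort=False).size().unstack(fill_value=0)

# Label-based lookups do not depend on the row or column order.
print(df1.loc["Dog", 1])
print(df2.loc["Cat", 2])
```

Combinations that never occur in the data, such as Cat with score 2, are filled with 0 by fill_value=0 rather than left as NaN.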

If the row order does not matter, use crosstab:

df1 = pd.crosstab(df['Word'], df['Score'])
print (df1)
Score 1 2 3
Word
Bird 1 1 1
Cat 1 0 2
Dog 2 1 1
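
If the crosstab form is preferred but the original row order still matters, one possible approach is to reindex the result by the order in which the words first appear. This is only a sketch and not part of the original answer; the inline data and the names table and ordered are assumptions for illustration.

```python
import pandas as pd

# Same ten rows as example.csv, built inline so the sketch is self-contained.
df = pd.DataFrame({
    "Word": ["Dog", "Bird", "Cat", "Dog", "Dog", "Dog", "Bird", "Cat", "Bird", "Cat"],
    "Score": [1, 2, 3, 2, 3, 1, 3, 1, 1, 3],
})

# crosstab sorts the row labels alphabetically: Bird, Cat, Dog.
table = pd.crosstab(df["Word"], df["Score"])

# Series.unique() keeps first-appearance order, so reindexing by it
# restores Dog, Bird, Cat.
ordered = table.reindex(df["Word"].unique())
print(ordered)
```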

Finally, select by label with DataFrame.loc:

print (df1.loc['Cat', 2])
0
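
As for the IndexError in the question: the first value yielded by iterrows is the row's index label, not its score, so j runs over labels such as 3, 4 and 5 and goes past the end of a three-element inner list. The sketch below shows this and, as an alternative to the pandas solutions above, counts with nested dictionaries so that the scores_matrix['Dog'][1] style of access works. The inline data and the defaultdict approach are assumptions for illustration, not part of the original answer.

```python
from collections import defaultdict

import pandas as pd

# Same ten rows as example.csv, built inline so the sketch is self-contained.
x1 = pd.DataFrame({
    "Word": ["Dog", "Bird", "Cat", "Dog", "Dog", "Dog", "Bird", "Cat", "Bird", "Cat"],
    "Score": [1, 2, 3, 2, 3, 1, 3, 1, 1, 3],
})

# iterrows yields (index label, row); these labels are row positions in x1,
# not scores, which is why they overflow the inner lists of scores_matrix.
labels = [j for j, _ in x1[x1["Word"] == "Dog"].iterrows()]
print(labels)

# Count each (word, score) pair in nested dictionaries keyed by label.
scores_matrix = defaultdict(lambda: defaultdict(int))
for word, score in zip(x1["Word"], x1["Score"]):
    scores_matrix[word][score] += 1

print(scores_matrix["Dog"][1])
print(scores_matrix["Cat"][2])  # a missing pair defaults to 0
```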

Regarding "python - How do I build a frequency count table of items in a pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51280121/
