python - Create a dictionary of the words in a text

Reposted by: 太空宇宙. Last updated: 2023-11-04 10:19:01

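I want to build a dictionary containing all the distinct words in a text. The keys are the words and the values are the word frequencies.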

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
word_listT = str(' '.join(dtt)).split()
wordsT = {v:k for (k, v) in enumerate(word_listT)}
print(wordsT)

I expected something like this:

{'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2, 'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}

But I got this instead:

{'we': 14, 'is': 12, 'peace': 16, 'at': 17, 'want': 15, 'our': 10, 'home': 18, 'you': 0, 'went': 7, 'nice': 13}

Clearly I am misusing something or doing something wrong.

Any help is appreciated.

Best answer

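The problem is that you are storing the list index at which each word occurs (specifically, the index of its last occurrence), not the number of times the word appears.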
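To see why, here is a minimal sketch (Python 3 syntax; the names word_list, last_index and home_positions are illustrative, not from the question) of what the dict comprehension does. enumerate pairs each word with its position, and because dictionary keys are unique, every repeated word overwrites its earlier entry, so only the index of the last occurrence survives.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_list = ' '.join(dtt).split()

# enumerate yields (index, word) pairs; the comprehension maps word -> index.
last_index = {word: index for index, word in enumerate(word_list)}

# 'home' occurs at several positions, but a dict keeps one value per key,
# so each later occurrence replaces the previous index.
home_positions = [i for i, w in enumerate(word_list) if w == 'home']
print(home_positions)      # [2, 9, 11, 18]
print(last_index['home'])  # 18
```

This matches the 'home': 18 entry in the output shown in the question.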

To count the words instead, you can simply use collections.Counter:

from collections import Counter

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
counted_words = Counter(' '.join(dtt).split())
# print the Counter to inspect the word counts
print(counted_words)

Counter({'home': 4, 'our': 3, 'we': 2, 'peace': 2, 'at': 2, 'want': 2, 'is': 1, 'you': 1, 'went': 1, 'nice': 1})
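Since the question asks for a dictionary, it may help to note that Counter is a dict subclass. The following sketch (Python 3 syntax; the names word_freq and top_two are made up for illustration) shows ordinary lookups, conversion to a plain dict, and most_common.

```python
from collections import Counter

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
counted_words = Counter(' '.join(dtt).split())

# Lookups work like a normal dictionary.
print(counted_words['home'])    # 4
# A word that never appeared returns 0 instead of raising KeyError.
print(counted_words['garden'])  # 0

# Convert to a plain dict if the Counter type itself is not wanted.
word_freq = dict(counted_words)

# most_common(n) returns the n highest counts as (word, count) tuples.
top_two = counted_words.most_common(2)
print(top_two)                  # [('home', 4), ('our', 3)]
```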

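Some cleanup: as mentioned in the comments, the str() call in str(' '.join(dtt)).split() is unnecessary, because ' '.join(dtt) already returns a string.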

You can also drop the intermediate list assignment and build the Counter on a single line:

Counter(' '.join(dtt).split())
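For comparison, here is a sketch of the same count done by hand with a plain dict and dict.get. This is an alternative illustration rather than part of the original answer, and the name word_freq is made up; it only shows roughly what the counting amounts to.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

word_freq = {}
for sentence in dtt:
    for word in sentence.split():
        # get() returns the current count, or 0 the first time a word is seen.
        word_freq[word] = word_freq.get(word, 0) + 1

print(word_freq)
```

Counter is usually the shorter choice, but the loop makes the difference from the original code explicit: the value stored is a running count, not a position.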

Some more detail about your list indices. First, you need to understand what your code is doing.

dtt = [
    'you want home at our peace', 
    'we went our home', 
    'our home is nice', 
    'we want peace at home'
]

Note that there are 19 words here; printing len(word_listT) gives 19. On the next line, word_listT = str(' '.join(dtt)).split(), you build a list of all the words, which looks like this:

word_listT = [
    'you', 
    'want', 
    'home', 
    'at', 
    'our', 
    'peace', 
    'we', 
    'went', 
    'our', 
    'home', 
    'our', 
    'home', 
    'is', 
    'nice', 
    'we', 
    'want', 
    'peace', 
    'at', 
    'home'
]