python - Efficiently calculate word frequency in a string

Reposted. Author: 太空狗. Updated: 2023-10-29 19:31:31

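I am parsing a long string of text and counting how many times each word occurs, in Python. I have a function that works, but I am looking for advice on whether it can be made more efficient (in terms of speed), and whether there is a Python library function that would do this for me so that I don't reinvent the wheel.

Can you suggest a more efficient way to find the most common words in a long string (usually more than 1000 words)?

Also, what is the best way to sort the dictionary into a list where the first element is the most common word, the second element is the second most common word, and so on?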

test = """abc def-ghi jkl abc
abc"""

def calculate_word_frequency(s):
    # Post: return a list of words ordered from the most
    # frequent to the least frequent

    words = s.split()
    freq = {}
    for word in words:
        # dict.has_key() was removed in Python 3; use the `in` operator
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return sort(freq)

def sort(d):
    # Post: sort dictionary d into list of words ordered
    # from highest freq to lowest freq
    # eg: For {"the": 3, "a": 9, "abc": 2} should be
    # sorted into the following list ["a","the","abc"]

    # Dicts have no sort() method and cmp= is gone in Python 3,
    # so sort the keys by their counts, highest first
    return sorted(d, key=d.get, reverse=True)

print(calculate_word_frequency(test))

Best answer

Use collections.Counter:

>>> from collections import Counter
>>> test = 'abc def abc def zzz zzz'
>>> Counter(test.split()).most_common()
[('abc', 2), ('def', 2), ('zzz', 2)]
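
To connect the answer back to the question's second part (a list of words ordered from most to least frequent), here is a minimal sketch built on Counter. The function name words_by_frequency and the tokenizing rule are assumptions for illustration, not part of the original answer: it lowercases the text and treats every run of letters, digits, or underscores as a word, so "def-ghi" counts as two words, whereas str.split() would keep it as one.

```python
import re
from collections import Counter


def words_by_frequency(text):
    """Return the distinct words in text, most frequent first."""
    # Assumption: a "word" is a run of letters, digits, or underscores,
    # and case is ignored. Adjust the pattern if hyphenated words
    # should stay together.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # most_common() yields (word, count) pairs sorted by descending count;
    # on Python 3.7+ ties keep the order in which words were first seen.
    return [word for word, _ in counts.most_common()]


test = """abc def-ghi jkl abc
abc"""
print(words_by_frequency(test))  # ['abc', 'def', 'ghi', 'jkl']
```

If only the top few words are needed, most_common(n) accepts a count, for example counts.most_common(10), and avoids sorting the whole table.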

Regarding python - Efficiently calculate word frequency in a string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9919604/
