gpt4 book ai didi

python - 在 Python 中使用堆的前 K 个常用词

转载 作者:行者123 更新时间:2023-12-03 17:06:05 25 4
gpt4 key购买 nike

这个问题在这里已经有了答案:





The Most Efficient Way To Find Top K Frequent Words In A Big Word Sequence

(19 个回答)


10 个月前关闭。




我正在尝试解决 Top K Frequent Words Leetcode problem在 O(N log K) 时间内,得到了不希望的结果。我的 Python3 代码和控制台输出如下:

from collections import Counter
import heapq

class Solution:
def topKFrequent(self, words: List[str], k: int) -> List[str]:

counts = Counter(words)
print('Word counts:', counts)

result = []
for word in counts:
print('Word being added:', word)
if len(result) < k:
heapq.heappush(result, (-counts[word], word))
print(result)
else:
heapq.heappushpop(result, (-counts[word], word))
result = [r[1] for r in result]

return result

----------- Console output -----------

Word counts: Counter({'the': 3, 'is': 3, 'sunny': 2, 'day': 1})
Word being added: the
[(-3, 'the')]
Word being added: day
[(-3, 'the'), (-1, 'day')]
Word being added: is
[(-3, 'is'), (-1, 'day'), (-3, 'the')]
Word being added: sunny
[(-3, 'is'), (-2, 'sunny'), (-3, 'the'), (-1, 'day')]
当我运行测试用例时 ["the", "day", "is", "sunny", "the", "the", "sunny", "is", "is"]K = 4 ,我发现这个词 the移到列表末尾(在 day 之后)一次 is即使它们的计数都是 3,也会添加。这是有道理的,因为父级只需 <= 子级,而子级没有以任何方式排序。自 (-2, 'sunny')(-3, 'the')都是 > (-3, 'is') ,事实上,即使 (-3, 'the') 仍然保持堆不变量< (-2, 'sunny')并且是 (-3, 'is') 的右 child .预期结果是 ["is","the","sunny","day"]而我的代码的输出是 ["is","sunny","the","day"] .
我应该使用堆在 O(N log K) 时间内解决这个问题,如果是这样,我该如何修改我的代码以达到预期的结果?

最佳答案

您使用 heapq 走在正确的轨道上和 Counter您只需要对如何使用它们与 k 相关的方式稍作修改:(您需要在向 result 添加任何内容之前迭代整个计数):

from collections import Counter
import heapq

class Solution:
def topKFrequent(self, words: List[str], k: int) -> List[str]:
counts = collections.Counter(words)
max_heap = []
for key, val in counts.items():
heapq.heappush(max_heap, (-val, key))

result = []
while k > 0:
result.append(heapq.heappop(max_heap)[1])
k -= 1

return result

之前没有阅读 O(N log k) 的要求,这里是对上述解决方案的修改以实现:
from collections import Counter, deque
import heapq

class WordWithFrequency(object):
def __init__(self, word, frequency):
self.word = word
self.frequency = frequency

def __lt__(self, other):
if self.frequency == other.frequency:
return lt(other.word, self.word)
else:
return lt(self.frequency, other.frequency)

class Solution:
def topKFrequent(self, words: List[str], k: int) -> List[str]:
counts = collections.Counter(words)

max_heap = []
for key, val in counts.items():
heapq.heappush(max_heap, WordWithFrequency(key, val))
if len(max_heap) > k:
heapq.heappop(max_heap)

result = deque([]) # can also use a list and just reverse at the end
while k > 0:
result.appendleft(heapq.heappop(max_heap).word)
k -= 1

return list(result)

关于python - 在 Python 中使用堆的前 K 个常用词,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64778567/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com