gpt4 book ai didi

algorithm - Count-Min Sketch 和 Heavy-Hitters 问题

转载 作者:塔克拉玛干 更新时间:2023-11-03 06:39:28 25 4
gpt4 key购买 nike

我正在阅读 Count-Min Sketch 数据结构,它根据错误概率参数和容差参数为点和范围查询提供概率答案。例如,CM 可以回答“item x 在数据流中出现了多少次,概率为 10%”。

重击手的相关问题也出现了。在为 HH 问题实现最小堆时,我注意到各种研究论文都指出,只有当草图中某个项目的最小计数大于阈值时,我们才会插入堆中。

我的问题是,这是否意味着我们正在概率性地回答重击者问题?相应的问题是“有 10% 的概率,哪个项目在数据流中出现频率第二高?”

最佳答案

来自维基百科:

In the data stream model, the frequent elements problem is to output a set of elements that constitute more than some fixed fraction of the stream. A special case is the majority problem, which is to determine whether or not any value constitutes a majority of the stream.

More formally, fix some positive constant c > 1, let the length of the stream be m, and let fi denote the frequency of value i in the stream. The frequent elements problem is to output the set { i | fi > m/c }.

Some notable algorithms are:

  • Boyer–Moore majority vote algorithm
  • Karp-Papadimitriou-Shenker algorithm
  • Count-Min sketch
  • Sticky sampling
  • Lossy counting
  • Sample and Hold
  • Multi-stage Bloom filters
  • Count-sketch
  • Sketch-guided sampling

Event detection Detecting events in data streams is often done using a heavy hitters algorithm as listed above: the most frequent items and their frequency are determined using one of these algorithms, then the largest increase over the previous time point is reported as trend. This approach can be refined by using exponentially weighted moving averages and variance for normalization.

所以,是的。 CMS可用于确定频率(以近似的方式),可用于回答HH问题。

关于algorithm - Count-Min Sketch 和 Heavy-Hitters 问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53276616/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com