- iOS/Objective-C 元类和类别
- objective-c - -1001 错误,当 NSURLSession 通过 httpproxy 和/etc/hosts
- java - 使用网络类获取 url 地址
- ios - 推送通知中不播放声音
下面是 Knuth 的水库采样伪代码(如何从一组 k
数字中选择 n
数字,确保每个数字都具有相同的概率)。
初始化:一个大小为:k
的水库.
for i = k+1 to N
M = random(1, i);
if (M < k) // should this be if (M <= k)
SWAP the Mth value and ith value
end if
end for
从这段代码中,M < K
的概率是(k-1)/i
, 不是 k/i
,所以我认为 if
循环体中的语句应该是if (M < =k)
.我试图测试它们之间的区别,但我一无所获。
最佳答案
你是对的。但是,您的代码没有正确实现算法 R。错误是您的(或编写此代码的任何人),而不是 Knuth 的 ;-)
引自 Knuth(计算机编程艺术第 2 卷 3Ed 1998,第 144 页):
... A problem arises if we don't know the value of N in advance, since the precise value of N is crucial in Algorithm S. Suppose we want to select n items at random from a file, without knowing exactly how many are present in that file. We could first go through and count the records, then take a second pass to select them; but it is generally better to sample m > n of the original items on the first pass, where m is much less than N, so that only m items must be considered on the second pass. The trick, of course, is to do this in such a way that the final result is a truly random sample of the original file.
Since we don't know when the input is going to end, we must keep track of a random sample of the input records seen so far, thus always being prepared for the end. As we read the input we will construct a "reservoir" that contains only the records that have appeared among the previous samples. The first n records always go into the reservoir. When the (t + 1)st record is being input, for t>n, we will have in memory a table of n indices pointing to the records that we have chosen from among the first t. The problem is to maintain this situation with t increased by one, namely to find a new random sample from among the t + 1 records now known to be present. It is not hard to see that we should include the new record in the new sample with probability n/(t + 1), and in such a case it should replace a random element of the previous sample.
Thus, the following procedure does the job:
Algorithm R (Reservoir sampling). To select n records at random from a file of unknown size > n, given n > 0. An auxiliary file called the "reservoir" contains all records that are candidates for the final sample. The algorithm uses a table of distinct indices I[j] for 1 < j < n, each of which points to one of the records in the reservoir.
R1. [Initialize.] Input the first n records and copy them to the reservoir. Set I[j] ← j for 1 < j < n, and set t ← m ← n. (If the file being sampled has fewer than n records, it will of course be necessary to abort the algorithm and report failure. During this algorithm, indices I[1], ..., I[n] point to the records in the current sample; m is the size of the reservoir; and t is the number of input records dealt with so far.)
R2. [End of file?] If there are no more records to be input, go to step R6.
R3. [Generate and test.] Increase t by 1, then generate a random integer M between 1 and t (inclusive). If M > n, go to R5.
R4. [Add to reservoir.] Copy the next record of the input file to the reservoir, increase m by 1, and set I[M] ← m. (The record previously pointed to by I[M] is being replaced in the sample by the new record.) Go back to R2.
R5. [Skip.] Skip over the next record of the input file (do not include it in the reservoir), and return to step R2.
R6. [Second pass.] Sort the I table entries so that I[1] < ... < I[n]; then go through the reservoir, copying the records with these indices into the output file that is to hold the final sample.
算法 R 的伪代码如下所示:
for j= 1 to n
Reservoir[j]= File.GetNext()
I[j]= j
t=n // number of input records so far
m=n // size of the reservoir
while not File.EOF()
x= File.GetNext()
t++
M= Random(1..t)
if (M<=n)
m++
Reservoir[m]= x
I[M]= m
Sort(I[1..n])
for j= 1 to n
Output[j]= Reservoir[I[j]]
关于algorithm - Knuth 的水库采样伪代码中可能存在错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17638962/
谁能解释一下 Man Or Boy Test返回值 -67? 我徒劳地尝试写下结果,或者用调试器跟踪它。任何帮助将不胜感激。 可以找到不同实现的列表 here . 最佳答案 This is a nic
我一直在想弄清楚唐纳德·克努斯 (Donald Knuth) 的 WEB是的,但真的很矛盾。我从网页上了解到它类似于 doxygen,但我阅读的所有资料都坚持认为它是一种编程语言。但是,它看起来不像我
这是一个例子: [00] 2009 的二进制形式... [05]哪个字母... [10] 四位数量——半字节或十六进制数字... [15] 千字节... [M13]如果x是任意0和1的字符串... [
我附上了一段代码,它根据 cout 语句给出了奇怪的输出。该程序主要计算 Knuth 的排列。 输入是说:run1代码运行良好,第一次通过:调用跟踪将是: r un1 你的 n1 努尔 1 1nur
算法是这样的: def online_variance(data): n = 0 mean = 0 M2 = 0 for x in data: n =
下面是 Knuth 的水库采样伪代码(如何从一组 k 数字中选择 n 数字,确保每个数字都具有相同的概率)。 初始化:一个大小为:k的水库. for i = k+1 to N M = rand
我尝试通过 A = 2654435769 的位移位来实现 Knuth 的乘法算法和 2^p 个元素的散列大小但是非移位和移位算法给出不同的结果 我是如何尝试实现这两个算法的: template
我正在学习什么是 Knuth 优化。 相关信息可通过here查询 Knuth 优化基本上有两个假设。 一个是四边形不等式,另一个是单调性 我完全可以理解什么是四边形不等式。但是,由于没有例子解释Mon
我正在实现 D. E. Knuth 的计算机编程艺术第 2 卷第 4.3.2 节的算法 D。 在步骤 D3 中,我应该计算 q = floor(u[j+n]*BASE+u[j+n-1] / v[n-1
这是 Knuth 乘法哈希的正确实现吗。 int hash(int v) { v *= 2654435761; return v >> 32; } 乘法溢出会影响算法吗? 如何提高该方
我正在查看 Don Knuth 教授的一些代码,这些代码是用 CWEB 编写并转换为 C 语言的。具体示例是 dlx1.w,可从 Knuth's website 获取。 在某个阶段,struct nd
通读this question和 this blog post让我更多地思考类型代数,特别是如何滥用它。 基本上, 1) 我们可以将 Either A B 类型视为加法:A+B 2) 我们可以将有序对
我正在尝试编译 Donald Knuth 的程序之一 http://www-cs-faculty.stanford.edu/~uno/programs/grayspan.w 。 我使用的是 Ubunt
我正在考虑是否可以消除 Knuth 内存堆上的外部碎片?在尝试解决这个问题之前,我不确定我们是否可以在堆上移动 block 。如果我们可以移动 block ,那么我相信解决外部碎片是微不足道的。 我对
关闭。这个问题需要debugging details .它目前不接受答案。 编辑问题以包含 desired behavior, a specific problem or error, and th
我正在尝试实现一个除以两个大精度数字的程序(我将它们作为字符串)。来自 Stack Overflow 上其他问题的人建议实现 Donald Knuth 的 The Art of Computer Pr
我已经从 D.Knuth 的 website 下载了DLX算法。在 D.Knuth 概述问题的第一部分中,将列分隔为“主要”列和其他列。这些“主要”列是哪些?提前致谢。 最佳答案 这是对 Exact
最近,我阅读了一些有关 Ackermann 函数和 Knuth 的向上箭头表示法的内容。我知道该符号用于表示变化很大的数字。但是,我找不到这种表示法的任何实际用途——该表示法应用于某些算法或程序。那么
我正在实现 Knuth shuffle对于我正在处理的 C++ 项目。我试图从我的洗牌中获得最公正的结果(而且我不是(伪)随机数生成方面的专家)。我只是想确保这是最公正的洗牌实现。 draw_t 是字
我刚刚开始阅读 Knuth 的《编程艺术》第一卷,并阅读了第 4 页上他描述欧几里得算法的部分。他阐述了求两个数的最大公约数的步骤。您可以在这里阅读更多相关信息 https://en.wikipedi
我是一名优秀的程序员,十分优秀!