gpt4 book ai didi

algorithm - 从元素具有权重的列表中选择 k 个随机元素

转载 作者:塔克拉玛干 更新时间:2023-11-03 02:12:10 24 4
gpt4 key购买 nike

没有任何权重(等概率)的选择描述得很漂亮here .

我想知道是否有办法将这种方法转换为加权方法。

我也对其他方法感兴趣。

更新:采样替换

最佳答案

如果抽样是有放回的,您可以使用这个算法(在此处用 Python 实现):

import random

items = [(10, "low"),
(100, "mid"),
(890, "large")]

def weighted_sample(items, n):
total = float(sum(w for w, v in items))
i = 0
w, v = items[0]
while n:
x = total * (1 - random.random() ** (1.0 / n))
total -= x
while x > w:
x -= w
i += 1
w, v = items[i]
w -= x
yield v
n -= 1

这是 O(n + m),其中 m 是项目数。

为什么这样做?它基于以下算法:

def n_random_numbers_decreasing(v, n):
"""Like reversed(sorted(v * random() for i in range(n))),
but faster because we avoid sorting."""
while n:
v *= random.random() ** (1.0 / n)
yield v
n -= 1

weighted_sample 函数就是将此算法与 items 列表的遍历相结合,以挑选出由这些随机数选择的项目。

这又是有效的,因为 n 个随机数 0..v 都恰好小于 z 的概率是 P = (z/v)n。求解 z,得到 z = vP1/n。用一个随机数代替 P 会选择具有正确分布的最大数;我们可以重复这个过程来选择所有其他数字。

如果采样没有放回,您可以将所有项目放入一个二叉堆中,其中每个节点缓存该子堆中所有项目的权重总和。构建堆的时间复杂度为 O(m)。从堆中选择一个随机项目,考虑到权重,是 O(log m)。删除该项目并更新缓存的总计也是 O(log m)。因此,您可以在 O(m + n log m) 时间内挑选 n 项。

(注意:这里的“权重”是指每次选择一个元素时,选择剩余的可能性与其权重成正比的概率。并不是说元素出现在输出中的可能性与其权重成正比。)

这是它的一个实现,有大量评论:

import random

class Node:
# Each node in the heap has a weight, value, and total weight.
# The total weight, self.tw, is self.w plus the weight of any children.
__slots__ = ['w', 'v', 'tw']
def __init__(self, w, v, tw):
self.w, self.v, self.tw = w, v, tw

def rws_heap(items):
# h is the heap. It's like a binary tree that lives in an array.
# It has a Node for each pair in `items`. h[1] is the root. Each
# other Node h[i] has a parent at h[i>>1]. Each node has up to 2
# children, h[i<<1] and h[(i<<1)+1]. To get this nice simple
# arithmetic, we have to leave h[0] vacant.
h = [None] # leave h[0] vacant
for w, v in items:
h.append(Node(w, v, w))
for i in range(len(h) - 1, 1, -1): # total up the tws
h[i>>1].tw += h[i].tw # add h[i]'s total to its parent
return h

def rws_heap_pop(h):
gas = h[1].tw * random.random() # start with a random amount of gas

i = 1 # start driving at the root
while gas >= h[i].w: # while we have enough gas to get past node i:
gas -= h[i].w # drive past node i
i <<= 1 # move to first child
if gas >= h[i].tw: # if we have enough gas:
gas -= h[i].tw # drive past first child and descendants
i += 1 # move to second child
w = h[i].w # out of gas! h[i] is the selected node.
v = h[i].v

h[i].w = 0 # make sure this node isn't chosen again
while i: # fix up total weights
h[i].tw -= w
i >>= 1
return v

def random_weighted_sample_no_replacement(items, n):
heap = rws_heap(items) # just make a heap...
for i in range(n):
yield rws_heap_pop(heap) # and pop n items off it.

关于algorithm - 从元素具有权重的列表中选择 k 个随机元素,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/2140787/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com