
python - Weighted random sample in Python

Reposted. Author: 太空狗. Updated: 2023-10-29 21:43:54

I'm looking for a reasonable definition of a function weighted_sample that does not just return one random index for a given list of weights (like

```python
import random

def weighted_choice(weights, random=random):
    """ Given a list of weights [w_0, w_1, ..., w_n-1],
        return an index i in range(n) with probability proportional to w_i. """
    rnd = random.random() * sum(weights)
    for i, w in enumerate(weights):
        if w < 0:
            raise ValueError("Negative weight encountered.")
        rnd -= w
        if rnd < 0:
            return i
    raise ValueError("Sum of weights is not positive")
```

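which gives a categorical distribution with constant weights) but a random sample of size k, without replacement, just as random.sample behaves compared to random.choice.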
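For comparison: since Python 3.6 the standard library covers the with-replacement case directly, because random.choices accepts a weights argument. The sketch below is a stand-in for weighted_choice built on it; weighted_choice_stdlib is a hypothetical name. Note that random.choices samples with replacement, so it does not answer the without-replacement part of the question.

```python
import random

def weighted_choice_stdlib(weights, rng=random):
    """Hypothetical helper: return index i with probability proportional to weights[i].

    Built on random.choices (Python 3.6+), which samples WITH replacement.
    """
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Tally many draws; index 1 should come up roughly 10x as often as index 0.
rng = random.Random(0)
tally = [0, 0, 0]
for _ in range(13000):
    tally[weighted_choice_stdlib([1, 10, 2], rng)] += 1
print(tally)
```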

Just as weighted_choice can be written as

```python
lambda weights: random.choice([val for val, cnt in enumerate(weights)
                               for i in range(cnt)])
```
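To make the "unraveling" concrete: the list comprehension in these lambdas repeats each index as many times as its weight, so the list has sum(weights) entries. A small illustration with made-up weights [1, 3, 2]:

```python
weights = [1, 3, 2]

# Index val is repeated cnt times, so the list has sum(weights) entries.
unrolled = [val for val, cnt in enumerate(weights) for i in range(cnt)]
print(unrolled)                       # [0, 1, 1, 1, 2, 2]
print(len(unrolled) == sum(weights))  # True
```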

weighted_sample can be written as

```python
lambda weights, k: random.sample([val for val, cnt in enumerate(weights)
                                  for i in range(cnt)], k)
```

but I would like a solution that doesn't require me to unravel the weights into a (possibly huge) list.
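One simple way to avoid the unraveled list, assuming non-negative integer weights, is to draw one index at a time with probability proportional to its remaining weight and then decrement that weight by one. At every step each remaining copy is equally likely, which is what random.sample does on the unraveled list. This is a sketch, not from the question or answer; weighted_sample_seq is a hypothetical name, and each draw scans the weights, so a full sample costs O(n*k).

```python
import random

def weighted_sample_seq(weights, k, rng=random):
    """Hypothetical sketch: k indices drawn without replacement from the
    unraveled list in which index i appears weights[i] times (integer weights)."""
    remaining = list(weights)          # copy so the caller's list is untouched
    if any(w < 0 for w in remaining):
        raise ValueError("Negative weight encountered.")
    total = sum(remaining)
    if not 0 <= k <= total:
        raise ValueError("k must be between 0 and sum(weights)")
    result = []
    for _ in range(k):
        r = rng.randrange(total)       # uniform position among the remaining copies
        for i, w in enumerate(remaining):
            if r < w:                  # position r falls inside index i's block
                break
            r -= w
        result.append(i)
        remaining[i] -= 1              # one copy of index i is used up
        total -= 1
    return result
```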

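Edit: If there are any nice algorithms that return a histogram/frequency list (in the same format as the argument weights) instead of a sequence of indices, that would also be very useful.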
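For the histogram variant, one possible sketch (an assumption, not taken from the answer) is to sample k distinct positions from range(sum(weights)) with random.sample, map each position to its index with bisect over the cumulative weights, and count. random.sample accepts a range directly, and when sum(weights) is much larger than k it only tracks the chosen positions instead of building the unraveled list. The counts follow a multivariate hypergeometric distribution. weighted_histogram is a hypothetical name; non-negative integer weights are assumed.

```python
import bisect
import random
from itertools import accumulate

def weighted_histogram(weights, k, rng=random):
    """Hypothetical sketch: counts[i] = how often index i appears in a size-k
    sample without replacement from the unraveled list (integer weights)."""
    cumweights = list(accumulate(weights))     # e.g. [1, 10, 2] -> [1, 11, 13]
    total = cumweights[-1] if cumweights else 0
    counts = [0] * len(weights)
    for pos in rng.sample(range(total), k):    # k distinct positions
        counts[bisect.bisect_right(cumweights, pos)] += 1
    return counts
```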

Best answer

From your code: ..

```python
weight_sample_indexes = lambda weights, k: random.sample([val
        for val, cnt in enumerate(weights) for i in range(cnt)], k)
```

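.. I assume that the weights are positive integers and that by "without replacement" you mean without replacement for the unraveled sequence.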

Here is a solution based on random.sample and an O(log n) __getitem__:

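```python
import bisect
import random
from collections import Counter
from collections.abc import Sequence  # collections.Sequence was removed in Python 3.10

def weighted_sample(population, weights, k):
    return random.sample(WeightedPopulation(population, weights), k)

class WeightedPopulation(Sequence):
    """Read-only view of population in which item i is repeated weights[i]
    times, without materializing the repeated list."""
    def __init__(self, population, weights):
        assert len(population) == len(weights) > 0
        self.population = population
        self.cumweights = []
        cumsum = 0  # compute cumulative weight
        for w in weights:
            cumsum += w
            self.cumweights.append(cumsum)

    def __len__(self):
        # Total weight; len() must return an int, hence integer weights.
        return self.cumweights[-1]

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # O(log n): index of the first cumulative weight greater than i
        return self.population[bisect.bisect(self.cumweights, i)]
```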
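To see why __getitem__ works, here is the lookup it performs for the weights [1, 10, 2] from the example. bisect.bisect is an alias of bisect_right and returns how many cumulative weights are less than or equal to i, which is the index of the block containing position i. A standalone sketch:

```python
import bisect

cumweights = [1, 11, 13]   # cumulative sums of weights [1, 10, 2]
population = "abc"

# Rebuild the virtual unraveled sequence one position at a time.
mapping = "".join(population[bisect.bisect(cumweights, i)] for i in range(13))
print(mapping)  # 'a', then ten 'b's, then 'cc'
```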

Example

```python
total = Counter()
for _ in range(1000):
    sample = weighted_sample("abc", [1, 10, 2], 5)
    total.update(sample)
print(sample)  # the last sample drawn
print("Frequencies %s" % (dict(Counter(sample)),))

# Check that values are sane
print("Total " + ', '.join("%s: %.0f" % (val, count * 1.0 / min(total.values()))
                           for val, count in total.most_common()))
```

Output (from one run; the samples are random):

```
['b', 'b', 'b', 'c', 'c']
Frequencies {'c': 2, 'b': 3}
Total b: 10, c: 2, a: 1
```
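On Python 3.9 and later the standard library supports this directly. random.sample takes a keyword-only counts argument, and sample(population, k, counts=weights) is documented as equivalent to sampling from the population with element i repeated counts[i] times. CPython implements it with the same cumulative-weights-plus-bisect idea as the answer. The counts must be non-negative integers, one per population element.

```python
import random
from collections import Counter

# Python 3.9+: same semantics as sampling from "a" + "b" * 10 + "cc".
sample = random.sample("abc", 5, counts=[1, 10, 2])
print(sample)
print(Counter(sample))
```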

Regarding python - Weighted random sample in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13047806/
