
tensorflow - How do I sample from a custom probability distribution in TensorFlow?

Reposted. Author: 行者123. Updated: 2023-12-02 20:26:36

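I have a vector of N elements, e.g. V = [10, 30, 20, 50], and a probability vector P = [.2, .3, .1, .4]. In TensorFlow, how can I randomly sample K elements from V so that they follow the given probability distribution P? I want the sampling to be done with replacement.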

Best answer

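tf.nn.fixed_unigram_candidate_sampler does more or less what you want. The catch is that it only accepts int32 values for the unigrams argument (the probability distribution), because it was designed for large multi-class problems such as language processing. You can multiply the numbers in the probability distribution by a constant to get integers, but the precision is limited.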

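Put the number of samples you want in num_sampled and the probability weights in unigrams (which must be int32). The true_classes argument must contain num_true elements, but is otherwise irrelevant here, because you only use the returned indices (and then use them to extract the samples from V). unique can be changed to True if you need each index to be drawn at most once.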

Here is tested code (written against the TensorFlow 1.x API, hence tf.Session):

import tensorflow as tf
import numpy as np
sess = tf.Session()

V = tf.constant( np.array( [[ 10, 30, 20, 50 ]]), dtype=tf.int64)

sampled_ids, true_expected_count, sampled_expected_count = tf.nn.fixed_unigram_candidate_sampler(
true_classes = V,
num_true = 4,
num_sampled = 50,
unique = False,
range_max = 4,
unigrams = [ 20, 30, 10, 40 ] # this is P, times 100
)
sample = tf.gather( V[ 0 ], sampled_ids )
x = sess.run( sample )
print( x )

Output:

[50 20 10 30 30 30 10 30 20 50 50 50 10 50 10 30 50 50 30 30 50 10 20 30 50 50 50 50 30 50 50 30 50 50 50 50 50 50 50 10 50 30 50 10 50 50 10 30 50 50]
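The integer scaling used for unigrams above ("P, times 100") can be checked without TensorFlow. The following is a minimal sketch in plain Python. The helper names to_int_weights and effective_probs are hypothetical and not part of any library, and it assumes the sampler normalizes the integer weights by their sum. It shows that P = [0.2, 0.3, 0.1, 0.4] survives the scaling exactly, while a distribution such as [1/3, 2/3] gets rounded.

```python
from fractions import Fraction


def to_int_weights(probs, scale=100):
    """Hypothetical helper: scale float probabilities to integer weights."""
    return [int(round(p * scale)) for p in probs]


def effective_probs(weights):
    """Probabilities implied by integer weights, assuming normalization by the sum."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


P = [0.2, 0.3, 0.1, 0.4]
unigrams = to_int_weights(P)             # same weights as in the answer's code
thirds = to_int_weights([1 / 3, 2 / 3])  # cannot be represented exactly at scale=100

print(unigrams)                 # [20, 30, 10, 40]
print(thirds)                   # [33, 67]
print(effective_probs(thirds))  # [Fraction(33, 100), Fraction(67, 100)]
```

A larger scale reduces the rounding error but never removes it for probabilities that are not exact multiples of 1/scale, which is the precision limit mentioned in the answer.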

If you really want to use float32 probability values, you have to build the sampler from several pieces (no single op exists for this), like so (tested code):

import tensorflow as tf
import numpy as np
sess = tf.Session()

k = 50 # number of samples you want
V = tf.constant( [ 10, 30, 20, 50 ], dtype = tf.float32 ) # values
P = tf.constant( [ 0.2, 0.3, 0.1, 0.4 ], dtype = tf.float32 ) # prob dist

cum_dist = tf.cumsum( P ) # create cumulative probability distribution

# get random values between 0 and the max of cum_dist
# we'll determine where it is in the cumulative distribution
rand_unif = tf.random_uniform( shape=( k, ), minval = 0.0, maxval = tf.reduce_max( cum_dist ), dtype = tf.float32 )

# create boolean to signal where the random number is greater than the cum_dist
# take advantage of broadcasting to create Cartesian product
greater = tf.expand_dims( rand_unif, axis = -1 ) > tf.expand_dims( cum_dist, axis = 0 )

# we get the indices by counting how many are greater in any given row
idxs = tf.reduce_sum( tf.cast( greater, dtype = tf.int64 ), 1 )

# then just gather the sample from V by the indices
sample = tf.gather( V, idxs )

# run, output
print( sess.run( sample ) )

Output:

[20. 10. 50. 50. 20. 30. 10. 20. 30. 50. 20. 50. 30. 50. 30. 50. 50. 50. 50. 50. 50. 30. 20. 20. 20. 10. 50. 30. 30. 10. 50. 50. 50. 20. 30. 50. 30. 10. 50. 20. 30. 50. 30. 10. 10. 50. 50. 20. 50. 30.]
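To see why counting the "greater than" comparisons yields the right index, here is a small plain-Python walk-through of the same idea (inverse-CDF sampling). It is only a sketch: the function name sample_indices is hypothetical, and the uniform draws are fixed by hand rather than generated randomly, so that each step can be followed.

```python
from itertools import accumulate


def sample_indices(probs, uniforms):
    """For each uniform draw, count how many cumulative probabilities it exceeds.

    That count is the index of the bucket the draw falls into, which mirrors
    the tf.reduce_sum over the boolean `greater` matrix in the code above.
    """
    cum_dist = list(accumulate(probs))  # running sums of the probabilities
    return [sum(u > c for c in cum_dist) for u in uniforms]


V = [10, 30, 20, 50]
P = [0.2, 0.3, 0.1, 0.4]

# Hand-picked stand-ins for the output of tf.random_uniform, one per bucket.
uniforms = [0.05, 0.25, 0.55, 0.95]

idxs = sample_indices(P, uniforms)
sample = [V[i] for i in idxs]

print(idxs)    # [0, 1, 2, 3]
print(sample)  # [10, 30, 20, 50]
```

Because each draw is strictly below the last cumulative value, it can exceed at most the first N - 1 entries, so the resulting index always stays within the bounds of V.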

A similar question on "tensorflow - How do I sample from a custom probability distribution in TensorFlow?" can be found on Stack Overflow: https://stackoverflow.com/questions/49713210/
