
python - stats.rv_continuous is slow when using a custom pdf


Ultimately, I am trying to visualize the copula between two PDFs that were estimated from data (both via KDE). Suppose that, for one of the KDEs, the discrete x, y data are stored in a tuple called data. I need to generate random variables with this distribution so that I can perform the probability integral transform (and ultimately obtain uniform values). My approach to generating the random variables is as follows:

import scipy.stats as st
from scipy import interpolate, integrate

pdf1 = interpolate.interp1d(data[0], data[1])

class pdf1_class(st.rv_continuous):
    def _pdf(self, x):
        return pdf1(x)

pdf1_rv = pdf1_class(a=data[0][0], b=data[0][-1], name='pdf1_class')

pdf1_samples = pdf1_rv.rvs(size=10000)

However, this approach is very slow. I also get the following warnings:

IntegrationWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used. warnings.warn(msg, IntegrationWarning)

IntegrationWarning: The occurrence of roundoff error is detected, which prevents the requested tolerance from being achieved. The error may be underestimated. warnings.warn(msg, IntegrationWarning)

Is there a better way to generate the random variables?

Best Answer

Following @unutbu's suggestion, I implemented _cdf and _ppf, which makes drawing 10000 samples essentially instantaneous. By default, rv_continuous builds the CDF by numerically integrating _pdf and then inverts it by root-finding for every sample, which is what makes rvs so slow and triggers the integration warnings. To avoid that, I added the following to the code above:

# cumtrapz drops the first grid point, hence the data[0][1:] slices below
discrete_cdf1 = integrate.cumtrapz(y=data[1], x=data[0])
cdf1 = interpolate.interp1d(data[0][1:], discrete_cdf1)
ppf1 = interpolate.interp1d(discrete_cdf1, data[0][1:])

Then I added the following two methods to pdf1_class:

def _cdf(self, x):
    return cdf1(x)

def _ppf(self, q):
    return ppf1(q)
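For reference, here is a minimal, self-contained sketch of the whole approach, with a synthetic Gaussian grid standing in for the KDE output described above. The grid, sample size, and the bounds_error/fill_value guards on the interpolators are illustrative additions rather than part of the original answer, and integrate.cumulative_trapezoid is the current name of cumtrapz in recent SciPy:

import numpy as np
import scipy.stats as st
from scipy import interpolate, integrate

# Synthetic stand-in for the KDE output: a grid of x values and pdf values.
x_grid = np.linspace(-5, 5, 501)
pdf_vals = np.exp(-0.5 * x_grid**2) / np.sqrt(2 * np.pi)
data = (x_grid, pdf_vals)

# Interpolated pdf, discrete cdf via trapezoidal integration, and its inverse (ppf).
pdf1 = interpolate.interp1d(data[0], data[1])
discrete_cdf1 = integrate.cumulative_trapezoid(y=data[1], x=data[0])
cdf1 = interpolate.interp1d(data[0][1:], discrete_cdf1,
                            bounds_error=False, fill_value=(0.0, 1.0))
ppf1 = interpolate.interp1d(discrete_cdf1, data[0][1:],
                            bounds_error=False, fill_value=(data[0][1], data[0][-1]))

class pdf1_class(st.rv_continuous):
    def _pdf(self, x):
        return pdf1(x)

    def _cdf(self, x):
        return cdf1(x)

    def _ppf(self, q):
        return ppf1(q)

pdf1_rv = pdf1_class(a=data[0][0], b=data[0][-1], name='pdf1_class')

# With _cdf and _ppf supplied, rvs no longer integrates and root-finds per sample.
pdf1_samples = pdf1_rv.rvs(size=10000)

# Probability integral transform: pushing the samples through the cdf gives
# approximately uniform values on (0, 1).
uniform_samples = pdf1_rv.cdf(pdf1_samples)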

Regarding "python - stats.rv_continuous is slow when using a custom pdf", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51946345/
