
python - Fast Python/Numpy frequency-severity distribution simulation


I am looking for ways to simulate the classic frequency-severity distribution, i.e. X = sum(i = 1..N, Y_i), where the claim count N is, for example, Poisson distributed and the severities Y_i are lognormal.

A simple, naive numpy script is:

import numpy as np
SIM_NUM = 3

X = []
for _i in range(SIM_NUM):
    nr_claims = np.random.poisson(1)
    temp = []
    for _j in range(nr_claims):
        temp.append(np.random.lognormal(0, 1))
    X.append(sum(temp))

Now I am trying to vectorize this for performance. Slightly better is:

N = np.random.poisson(1, SIM_NUM)
X = []
for n in N:
    X.append(sum(np.random.lognormal(0, 1, n)))

My question is whether I can vectorize the second loop as well, for example by simulating all losses at once:

N = np.random.poisson(1, SIM_NUM)
# print(N) would lead to something like [1 3 0]
losses = np.random.lognormal(0, 1, sum(N))
# print(losses) would lead to something like
# [ 0.56750244  0.84161871  0.41567216  1.04311742]

# X should now be [ 0.56750244, 0.84161871 + 0.41567216 + 1.04311742, 0]

My ideas are "smart slicing" or a matrix multiplication with A = [[1, 0, 0, 0], [0, 1, 1, 1], [0, 0, 0, 0]], but I couldn't turn these ideas into anything clever.
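For illustration, the matrix-multiplication idea can be spelled out directly; a minimal sketch (it gives the right result, but allocates a SIM_NUM-by-total-claims indicator matrix, so it will not scale to large simulations):

import numpy as np

SIM_NUM = 3
N = np.random.poisson(1, SIM_NUM)
losses = np.random.lognormal(0, 1, N.sum())

# Label each loss with the index of the simulation it belongs to,
# e.g. N = [1, 3, 0] gives ids = [0, 1, 1, 1]
ids = np.repeat(np.arange(SIM_NUM), N)

# Indicator matrix: A[i, j] = 1 if loss j belongs to simulation i
A = (ids == np.arange(SIM_NUM)[:, None]).astype(float)

X = A @ losses  # rows with N[i] == 0 correctly sum to 0.0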

I am looking for the fastest possible way to compute X.

Best Answer

We can use np.bincount, which is pretty efficient for such interval/ID-based summing operations, especially when working with 1D arrays. The implementation would look something like this -

# Generate all poisson distribution values in one go
pv = np.random.poisson(1, SIM_NUM)

# Use the poisson counts to work out how many lognormal draws are needed,
# and generate all of those random numbers in one vectorized call
rand_arr = np.random.lognormal(0, 1, pv.sum())

# Finally create IDs using pv as extents, for use with bincount to do
# ID-based and thus effectively interval-based summing
out = np.bincount(np.arange(pv.size).repeat(pv), rand_arr, minlength=SIM_NUM)
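
To see how the ID construction maps claims back to simulations, here is a tiny worked example (the numbers are made up purely for illustration):

pv = np.array([1, 3, 0])                    # claim counts per simulation
ids = np.arange(pv.size).repeat(pv)         # -> [0, 1, 1, 1]
rand_arr = np.array([0.5, 0.8, 0.4, 1.0])   # one lognormal draw per claim
out = np.bincount(ids, weights=rand_arr, minlength=pv.size)
# out -> [0.5, 2.2, 0.0]; minlength keeps the trailing zero for the empty simulation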

Runtime test -

Function definitions:

def original_app1(SIM_NUM):
    X = []
    for _i in range(SIM_NUM):
        nr_claims = np.random.poisson(1)
        temp = []
        for _j in range(nr_claims):
            temp.append(np.random.lognormal(0, 1))
        X.append(sum(temp))
    return X

def original_app2(SIM_NUM):
    N = np.random.poisson(1, SIM_NUM)
    X = []
    for n in N:
        X.append(sum(np.random.lognormal(0, 1, n)))
    return X

def vectorized_app1(SIM_NUM):
    pv = np.random.poisson(1, SIM_NUM)
    r = np.random.lognormal(0, 1, pv.sum())
    return np.bincount(np.arange(pv.size).repeat(pv), r, minlength=SIM_NUM)

Timings on larger datasets:

In [199]: SIM_NUM = 1000

In [200]: %timeit original_app1(SIM_NUM)
100 loops, best of 3: 2.6 ms per loop

In [201]: %timeit original_app2(SIM_NUM)
100 loops, best of 3: 6.65 ms per loop

In [202]: %timeit vectorized_app1(SIM_NUM)
1000 loops, best of 3: 252 µs per loop

In [203]: SIM_NUM = 10000

In [204]: %timeit original_app1(SIM_NUM)
10 loops, best of 3: 26.1 ms per loop

In [205]: %timeit original_app2(SIM_NUM)
10 loops, best of 3: 77.5 ms per loop

In [206]: %timeit vectorized_app1(SIM_NUM)
100 loops, best of 3: 2.46 ms per loop

So, we are looking at a 10x+ speedup there.
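As a side note, on NumPy 1.17 and later the same bincount approach can also be written against the newer Generator API; a minimal sketch (the function name vectorized_app1_rng and the seed argument are just illustrative):

def vectorized_app1_rng(SIM_NUM, seed=None):
    # Same algorithm as vectorized_app1, using np.random.default_rng;
    # statistically equivalent, and the seed makes runs reproducible.
    rng = np.random.default_rng(seed)
    pv = rng.poisson(1, SIM_NUM)
    r = rng.lognormal(0, 1, pv.sum())
    return np.bincount(np.arange(pv.size).repeat(pv), r, minlength=SIM_NUM)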

For the question python - Fast Python/Numpy frequency-severity distribution simulation, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39860431/
