
python - How to create a 2D array with numpy random.choice for each row?

Reposted. Author: 行者123. Updated: 2023-11-28 20:35:02

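I'm trying to create a 2D array (six columns and many rows) with numpy random choice, where the values, between 1 and 50, are unique within each row rather than across the whole array.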

np.sort(np.random.choice(np.arange(1,50),size=(100,6),replace=False))

But this raises an error.

ValueError: Cannot take a larger sample than population when 'replace=False'
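
The error follows from how `replace=False` is applied: it covers the whole requested sample, not each row separately. With `size=(100, 6)` the call asks for 600 distinct values out of the 49 that `np.arange(1, 50)` produces. A minimal sketch reproducing this (the smaller `(8, 6)` shape is an assumption chosen only for illustration):

```python
import numpy as np

population = np.arange(1, 50)          # 49 values: 1..49

# One row of 6 distinct values is fine: 6 <= 49.
row = np.random.choice(population, size=6, replace=False)

# 100 x 6 = 600 distinct values cannot come from 49 candidates.
try:
    np.random.choice(population, size=(100, 6), replace=False)
    raised = False
except ValueError:
    raised = True

# A 2D request that fits (8 x 6 = 48 <= 49) succeeds, but the values are
# then unique across the whole array, not merely within each row.
small = np.random.choice(population, size=(8, 6), replace=False)
```

So a single `np.random.choice` call cannot express "unique per row"; the uniqueness constraint has to be applied row by row in some other way.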

Is it possible to do this with a one-liner, without a loop?

Edit

OK, I got the answer.

These are the results from the Jupyter %time cell magic:

#@James' solution
np.stack([np.random.choice(np.arange(1,50),size=6,replace=False) for i in range(1_000_000)])
Wall time: 25.1 s



#@Divakar's solution
np.random.rand(1_000_000, 50).argpartition(6,axis=1)[:,:6]+1
Wall time: 1.36 s



#@CoryKramer's solution
np.array([np.random.choice(np.arange(1, 50), size=6, replace=False) for _ in range(1_000_000)])
Wall time: 25.5 s

I changed the dtype of np.empty and np.random.randint in @Paul Panzer's solution, because the original did not work on my PC.

3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]

The fastest is

import numpy as np

def pp(n):
    draw = np.empty((n, 6), dtype=np.int64)
    # generating random numbers is expensive, so draw a large one and
    # make six out of one
    draw[:, 0] = np.random.randint(0, 50*49*48*47*46*45, (n,), dtype=np.uint64)
    draw[:, 1:] = np.arange(50, 45, -1)
    draw = np.floor_divide.accumulate(draw, axis=-1)
    draw[:, :-1] -= draw[:, 1:] * np.arange(50, 45, -1)
    # map the shorter ranges (:49, :48, :47) to the non-occupied
    # positions; this amounts to incrementing for each number on the
    # left that is not larger. the nasty bit: if due to incrementing
    # new numbers on the left are "overtaken" then for them we also
    # need to increment.
    for i in range(1, 6):
        coll = np.sum(draw[:, :i] <= draw[:, i, None], axis=-1)
        collidx = np.flatnonzero(coll)
        if collidx.size == 0:
            continue
        coll = coll[collidx]
        tot = coll
        while True:
            draw[collidx, i] += coll
            coll = np.sum(draw[collidx, :i] <= draw[collidx, i, None], axis=-1)
            relidx = np.flatnonzero(coll > tot)
            if relidx.size == 0:
                break
            coll, tot = coll[relidx]-tot[relidx], coll[relidx]
            collidx = collidx[relidx]

    return draw + 1

#@Paul Panzer's solution
pp(1_000_000)
Wall time: 557 ms
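
The "make six out of one" comment in pp can be illustrated on a single integer. The sketch below splits one number below 50*49*48*47*46*45 into six mixed-radix digits (ranges 50, 49, ..., 45) and turns each digit into a value that has not been used yet. `decode_draw` and its parameters are hypothetical names for this illustration; it is a scalar version that keeps a list of remaining values instead of pp's vectorized collision-increment loop, so it is not guaranteed to reproduce pp's mapping digit for digit.

```python
def decode_draw(x, population=50, k=6):
    """Turn one integer into k distinct values from 1..population."""
    remaining = list(range(1, population + 1))
    picked = []
    # The radix shrinks by one per step: 50, 49, ..., population - k + 1.
    for radix in range(population, population - k, -1):
        x, digit = divmod(x, radix)      # digit is in range(radix)
        # radix == len(remaining) here, so the index is always valid.
        picked.append(remaining.pop(digit))
    return picked

first = decode_draw(0)                   # every digit is 0
last = decode_draw(50*49*48*47*46*45 - 1)
```

Because every integer in the range decodes to a different ordered selection, one uniform draw of the large integer yields a uniform ordered sample without replacement, which is why a single random number per row suffices.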

Thanks, everyone.

Best answer

Here's a vectorized approach using the rand + argsort/argpartition trick:

np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1

Sample run:

In [41]: rows = 10

In [42]: np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
Out[42]:
array([[ 1,  9,  3, 26, 14, 44],
       [32, 20, 27, 13, 25, 45],
       [40, 12, 47, 16, 10, 29],
       [ 6, 36, 32, 16, 18,  4],
       [42, 46, 24,  9,  1, 31],
       [15, 25, 47, 42, 34, 24],
       [ 7, 16, 49, 31, 40, 20],
       [28, 17, 47, 36,  8, 44],
       [ 7, 42, 14,  4, 17, 35],
       [39, 19, 37,  7,  8, 36]])
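
The one-liner works because each row gets 50 independent uniform keys, and the column indices of the 6 smallest keys form a uniformly random set of 6 distinct indices. A small sketch that wraps it in a function and checks per-row uniqueness (`sample_rows`, `population` and `k` are names made up for this illustration, and it assumes `k < population`):

```python
import numpy as np

def sample_rows(rows, population=50, k=6):
    """Return a (rows, k) array; each row holds k distinct values in 1..population."""
    keys = np.random.rand(rows, population)
    # After argpartition(k), the first k positions of each row hold the
    # indices of the k smallest keys, in no particular order.
    idx = keys.argpartition(k, axis=1)[:, :k]
    return idx + 1                       # shift 0..population-1 to 1..population

out = sample_rows(1000)
rows_are_unique = all(len(set(r.tolist())) == 6 for r in out)
```

Note that this draws from 1..50, whereas `np.arange(1, 50)` in the question covers only 1..49.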

Just to demonstrate the randomness:

In [56]: rows = 1000000

In [57]: out = np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1

In [58]: np.bincount(out.ravel())[1:]
Out[58]:
array([120048, 120026, 119942, 119838, 119885, 119669, 119965, 119491,
       120280, 120108, 120293, 119399, 119917, 119974, 120195, 119796,
       119887, 119505, 120235, 119857, 119499, 120560, 119891, 119693,
       120081, 120369, 120011, 119714, 120218, 120581, 120111, 119867,
       119791, 120265, 120457, 120048, 119813, 119702, 120266, 120445,
       120016, 120190, 119576, 119737, 120153, 120215, 120144, 120196,
       120218, 119863])
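
As a rough reading of these counts: each value can appear at most once per row, with probability 6/50, so its count over 1,000,000 rows should be about 1,000,000 * 6 / 50 = 120000, with a binomial standard deviation of roughly 325. The counts above all lie within about two standard deviations of that. A sketch of the same check, using fewer rows than the run above purely to save memory (an assumption of this example):

```python
import numpy as np

rows, population, k = 100_000, 50, 6
out = np.random.rand(rows, population).argpartition(k, axis=1)[:, :k] + 1

counts = np.bincount(out.ravel(), minlength=population + 1)[1:]
p = k / population                       # chance a given value lands in a row
expected = rows * p                      # 12000 here; 120000 for 1_000_000 rows
sigma = np.sqrt(rows * p * (1 - p))      # binomial standard deviation
z = (counts - expected) / sigma          # deviation of each count in sigmas
```

Large absolute values in `z` would suggest a bias toward or against particular values.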

Timings on one million rows:

In [43]: rows = 1000000

In [44]: %timeit np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
1 loop, best of 3: 1.07 s per loop
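
Outside IPython the `%timeit` magic is not available; a comparable measurement can be sketched with the standard `timeit` module. The function names and the reduced row count are assumptions for this example, and the loop variant uses `np.arange(1, 51)` so that both functions draw from 1..50.

```python
import timeit
import numpy as np

rows = 10_000   # kept small so the loop-based variant finishes quickly

def loop_choice():
    # One np.random.choice call per row, stacked into a 2D array.
    return np.stack([np.random.choice(np.arange(1, 51), size=6, replace=False)
                     for _ in range(rows)])

def rand_argpartition():
    # Vectorized variant: random keys, then the indices of the 6 smallest.
    return np.random.rand(rows, 50).argpartition(6, axis=1)[:, :6] + 1

for fn in (loop_choice, rand_argpartition):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```

Absolute numbers depend on the machine and NumPy version, so only the ratio between the two is meaningful.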

Regarding "python - How to create a 2D array with numpy random.choice for each row?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47675003/
