
python - numpy: cumulative multiplicity count

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:51:32

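I have a sorted array of integers that may contain repeats. I would like to count consecutive equal values, restarting from zero whenever a value differs from the previous one. This is the expected result, implemented with a simple Python loop: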

import numpy as np

def count_multiplicities(a):
    # r[i] is the number of earlier elements in the same run as a[i].
    r = np.zeros(a.shape, dtype=a.dtype)
    for i in range(1, len(a)):
        if a[i] == a[i-1]:
            r[i] = r[i-1] + 1
        else:
            r[i] = 0
    return r

# 20 random integers in [0, 5), sorted in place.
a = (np.random.rand(20)*5).astype(dtype=int)
a.sort()

print("given sorted array: ", a)
print("multiplicity count: ", count_multiplicities(a))

Output:

given sorted array:  [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
multiplicity count: [0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3]

How can I get the same result efficiently with numpy? The array is very long, but each value repeats only a few times (say, no more than ten).

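In my particular case I also know that the values start at zero and that the difference between consecutive values is either 0 or 1 (there are no gaps between values).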

Best answer

Here is a vectorized approach based on cumsum -

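def count_multiplicities_cumsum_vectorized(a):
    # Step of +1 everywhere, so the running sum counts upward within a run.
    out = np.ones(a.size, dtype=int)
    if a.size == 0:
        return out
    out[0] = 0
    # Indices at which a new run of equal values starts.
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    if idx.size:  # skip when the whole array is a single run
        # At each run start, step down so the running sum returns to 0.
        out[idx[0]] = -idx[0] + 1
        out[idx[1:]] = idx[:-1] - idx[1:] + 1
    np.cumsum(out, out=out)
    return out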
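
To see why the cumsum trick resets the count, it can help to trace the intermediate step array on a small input. The sketch below is only an illustration: the names `steps` and `run_starts` are mine, and the `np.diff` form is a rewriting of the two index assignments above that should be equivalent to them.

```python
import numpy as np

# Small sorted example: runs of length 3, 2 and 4.
a = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# Positions where a new run starts (position 0 is excluded).
idx = np.flatnonzero(a[1:] != a[:-1]) + 1    # [3, 5]

# Step array: +1 inside a run, 0 at the very first element.
steps = np.ones(a.size, dtype=int)
steps[0] = 0

# At a run start the running sum has reached (previous run length - 1),
# so a step of 1 - (previous run length) brings it back to 0.
run_starts = np.concatenate(([0], idx))      # [0, 3, 5]
steps[idx] = 1 - np.diff(run_starts)         # 1 - [3, 2] = [-2, -1]

print(steps)             # [ 0  1  1 -2  1 -1  1  1  1]
print(np.cumsum(steps))  # [0 1 2 0 1 0 1 2 3]
```

The negative entries sit exactly at the run starts, which is what lets a single cumsum pass produce the restarting counter.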

Sample run -

In [58]: a
Out[58]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [59]: count_multiplicities(a) # Original approach
Out[59]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

In [60]: count_multiplicities_cumsum_vectorized(a)
Out[60]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

Runtime test -

In [66]: a = (np.random.rand(200000)*1000).astype(dtype=int)
...: a.sort()
...:

In [67]: a
Out[67]: array([ 0, 0, 0, ..., 999, 999, 999])

In [68]: %timeit count_multiplicities(a)
10 loops, best of 3: 87.2 ms per loop

In [69]: %timeit count_multiplicities_cumsum_vectorized(a)
1000 loops, best of 3: 739 µs per loop
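
Two other ways to express the same idea, offered as sketches rather than as part of the answer above: the count at position i is i minus the index where that value's run begins. The function names are hypothetical. The first version assumes only that `a` is sorted; the second additionally assumes non-negative integers with a small range, which the question's special case (values start at zero, no gaps) guarantees. Neither was timed here, so whether they beat the cumsum version would need measuring.

```python
import numpy as np

def count_multiplicities_searchsorted(a):
    # For a sorted array, searchsorted(a, a, side="left") returns, for each
    # element, the index of the first occurrence of its value.
    first = np.searchsorted(a, a, side="left")
    return np.arange(a.size) - first

def count_multiplicities_bincount(a):
    # Assumes a is sorted and holds non-negative integers.
    counts = np.bincount(a)                 # run length of each value
    starts = np.cumsum(counts) - counts     # first index of each value's run
    return np.arange(a.size) - starts[a]

a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4])
print(count_multiplicities_searchsorted(a))
print(count_multiplicities_bincount(a))
```

Both calls should reproduce the output of the sample run above for this array.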


Regarding python - numpy: cumulative multiplicity count, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45320903/
