
python - numpy 'isin' performance improvement

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:30:55

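I have a matrix with 383 million rows, and I need to filter it based on a list of values (index_to_remove). This function is executed multiple times during one iteration. Is there a faster alternative to: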

def remove_from_result(matrix, index_to_remove, inv=True):
    return matrix[np.isin(matrix, index_to_remove, invert=inv)]

Best answer

A faster implementation

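This is a compiled version of @Matt Messersmith's solution, which uses a set inside a list comprehension. It is essentially a drop-in replacement for the slower np.isin approach. I ran into some problems when index_to_remove is a scalar value, so I implemented a separate version for that case.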
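For context, the uncompiled idea that the answer builds on can be sketched roughly as follows. This is a minimal illustration, not the referenced author's exact code: the function name isin_set_listcomp and its invert parameter are assumptions made here, chosen to mirror np.isin.

```python
import numpy as np

def isin_set_listcomp(matrix, index_to_remove, invert=True):
    # Hypothetical helper (not from the original answer).
    # Build the lookup set once; membership tests are then O(1) on average.
    lookup = set(np.asarray(index_to_remove).ravel().tolist())
    flat = np.asarray(matrix).ravel()
    # With invert=True the mask is True where the element is NOT in the set.
    mask = np.array([(x in lookup) != invert for x in flat.tolist()], dtype=bool)
    return mask.reshape(np.shape(matrix))

data = np.array([[80, 1, 12], [160, 2, 12]])
mask = isin_set_listcomp(data, [80, 12])
```

The mask can be used just like the result of np.isin(..., invert=True), for example data[mask]. The Numba version in the answer compiles the same loop-over-a-set idea and runs it in parallel.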

Code

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def in1d_vec_nb(matrix, index_to_remove):
    # matrix and index_to_remove have to be numpy arrays;
    # if index_to_remove is a list with different dtypes this
    # function will fail

    out = np.empty(matrix.shape[0], dtype=nb.boolean)
    index_to_remove_set = set(index_to_remove)

    # out[i] is True where matrix[i] is NOT in the set (like invert=True)
    for i in nb.prange(matrix.shape[0]):
        if matrix[i] in index_to_remove_set:
            out[i] = False
        else:
            out[i] = True

    return out

@nb.njit(parallel=True)
def in1d_scal_nb(matrix, index_to_remove):
    # matrix has to be a numpy array;
    # index_to_remove is a single scalar value here

    out = np.empty(matrix.shape[0], dtype=nb.boolean)
    # out[i] is True where matrix[i] differs from the scalar (like invert=True)
    for i in nb.prange(matrix.shape[0]):
        if (matrix[i] == index_to_remove):
            out[i] = False
        else:
            out[i] = True

    return out


def isin_nb(matrix_in, index_to_remove):
    # both matrix_in and index_to_remove have to be a np.ndarray
    # even if index_to_remove is actually a single number
    shape = matrix_in.shape
    if index_to_remove.shape == ():
        # 0-d array: extract the scalar and use the scalar version
        res = in1d_scal_nb(matrix_in.reshape(-1), index_to_remove.take(0))
    else:
        res = in1d_vec_nb(matrix_in.reshape(-1), index_to_remove)

    # restore the original shape of the input
    return res.reshape(shape)

Example

data = np.array([[80,1,12],[160,2,12],[240,3,12],[80,4,11]])
test_elts= np.array((80))

data[isin_nb(data[:,0],test_elts),:]
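As a cross-check, the selection above should match plain np.isin with invert=True, since isin_nb returns True for elements that are not in index_to_remove. The sketch below uses only NumPy; the variable names keep and filtered are introduced here for illustration.

```python
import numpy as np

data = np.array([[80, 1, 12], [160, 2, 12], [240, 3, 12], [80, 4, 11]])
test_elts = np.array(80)  # 0-d array, as in the example above

# True for rows whose first column is NOT equal to 80
keep = np.isin(data[:, 0], test_elts, invert=True)
filtered = data[keep, :]
print(filtered)
```

Only the rows whose first column is 160 or 240 remain, because both rows starting with 80 are filtered out.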

Timings

test_elts = np.arange(12345)
data=np.arange(1000*1000)

#The first call has compilation overhead of about 300ms
#which is not included in the timings
#remove_from_result: 52ms
#isin_nb: 1.59ms
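The answer does not show how the timings were taken. One plausible way to reproduce the baseline measurement is sketched below, assuming the standard-library timeit module; the helper best_time_ms is hypothetical. It makes one untimed warm-up call first, which is also how a JIT-compiled function's one-off compilation cost would be kept out of the result.

```python
import timeit
import numpy as np

def remove_from_result(matrix, index_to_remove, inv=True):
    # Baseline from the question
    return matrix[np.isin(matrix, index_to_remove, invert=inv)]

def best_time_ms(fn, *args, repeat=5):
    # Hypothetical helper: best wall-clock time of fn(*args) in milliseconds.
    fn(*args)  # warm-up call, excluded from the timing
    times = timeit.repeat(lambda: fn(*args), number=1, repeat=repeat)
    return min(times) * 1000.0

test_elts = np.arange(12345)
data = np.arange(1000 * 1000)

baseline_ms = best_time_ms(remove_from_result, data, test_elts)
print(f"remove_from_result: {baseline_ms:.2f} ms")
```

The same helper could be called with isin_nb in place of remove_from_result. Absolute numbers will vary with hardware and library versions.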

For more on python - numpy 'isin' performance improvement, see the similar question on Stack Overflow: https://stackoverflow.com/questions/53046473/
