python - Test whether a Numpy array contains a given row

Reposted. Author: 行者123. Updated: 2023-11-28 19:12:30

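Is there a Pythonic and efficient way to check whether a Numpy array contains at least one instance of a given row? By "efficient" I mean that it terminates upon finding the first matching row, rather than iterating over the entire array even after a result has been found.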

With Python lists this can be done very cleanly with if row in array:, but this does not work as I would expect for Numpy arrays, as illustrated below.

With Python lists:

>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False

But Numpy arrays give different and rather odd-looking results. (The __contains__ method of ndarray seems to be undocumented.)

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False
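
The results above are consistent with `in` on an ndarray behaving like `(a == row).any()`: the row is broadcast against the array and a single matching element anywhere is enough. The following is a minimal sketch of that reading; the helper name `contains_like_in` is hypothetical and only illustrates the comparison, it is not part of numpy.

```python
import numpy as np

a = np.array([[1, 2], [10, 20], [100, 200]])

def contains_like_in(arr, row):
    """Return True if ANY element matches after broadcasting (hypothetical helper)."""
    return bool((arr == row).any())

# Print the elementwise reading next to the built-in `in` for the rows used above.
for row in ([1, 2], [1, 20], [1, 42], [42, 1]):
    print(row, contains_like_in(a, row), np.array(row) in a)
```

Under this reading, [1, 42] counts as "contained" because its first element matches column 0 of the first row, while [42, 1] matches nothing in either column.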

Best answer

You can use .tolist():

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> [1,2] in a.tolist()
True
>>> [1,20] in a.tolist()
False
>>> [1,42] in a.tolist()
False
>>> [42,1] in a.tolist()
False

Or use a view:

>>> any((a[:]==[1,2]).all(1))
True
>>> any((a[:]==[1,20]).all(1))
False

Or use a generator over the numpy array (potentially VERY slow):

any(([1,2] == x).all() for x in a)     # stops on first occurrence 

Or use numpy logic functions:

any(np.equal(a,[1,2]).all(1))
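
The one-liners above can be wrapped in a small function so the row-wise test is spelled out once. This is only a sketch: the name `contains_row` and the shape check are assumptions added for illustration, not something from the original answer.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if some row of the 2-D array `arr` equals `row` elementwise."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    # Guard against silent broadcasting of a wrongly shaped row (assumed policy).
    if arr.ndim != 2 or row.shape != (arr.shape[1],):
        return False
    # all(axis=1): every column matches within a row; any(): at least one such row.
    return bool((arr == row).all(axis=1).any())

a = np.array([[1, 2], [10, 20], [100, 200]])
print(contains_row(a, [1, 2]))
print(contains_row(a, [1, 20]))
```

Note that this still compares every row; it fixes the semantics of the test, not the lack of early termination.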

If you time these:

import numpy as np
import time

n = 300000
a = np.arange(n*3).reshape(n, 3)
b = a.tolist()

# first-column values of an early, a middle, and a late row
t1, t2, t3 = a[n//100][0], a[n//2][0], a[-10][0]

tests = [('early hit', [t1, t1+1, t1+2]),
         ('middle hit', [t2, t2+1, t2+2]),
         ('late hit', [t3, t3+1, t3+2]),
         ('miss', [0, 2, 0])]

fmt = '\t{:20}{:.5f} seconds and is {}'

for test, tgt in tests:
    print('\n{}: {} in {:,} elements:'.format(test, tgt, n))

    name = 'view'
    start = time.time()
    result = (a[...] == tgt).all(1).any()
    stop = time.time()
    print(fmt.format(name, stop-start, result))

    name = 'python list'
    start = time.time()
    result = tgt in b
    stop = time.time()
    print(fmt.format(name, stop-start, result))

    name = 'gen over numpy'
    start = time.time()
    result = any((tgt == x).all() for x in a)
    stop = time.time()
    print(fmt.format(name, stop-start, result))

    name = 'logic equal'
    start = time.time()
    # assign the result here; otherwise the previous test's value is printed
    result = np.equal(a, tgt).all(1).any()
    stop = time.time()
    print(fmt.format(name, stop-start, result))

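You can see that, hit or miss, the numpy routines take the same time to search the array. The Python in operator is potentially much faster for an early hit, and the generator is just bad news if you have to go all the way through the array.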
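
One possible middle ground between these trade-offs, not part of the original answer, is to run the vectorized comparison on blocks of rows and stop at the first block that contains a match. The sketch below assumes a hypothetical helper `contains_row_chunked`; the default `chunk_rows=4096` is an arbitrary illustrative value and would need tuning, and no timings are claimed for it.

```python
import numpy as np

def contains_row_chunked(arr, row, chunk_rows=4096):
    """Search `arr` block by block, returning as soon as a block holds `row`."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    for start in range(0, arr.shape[0], chunk_rows):
        block = arr[start:start + chunk_rows]  # a view, no copy of the data
        if (block == row).all(axis=1).any():
            return True  # early exit: later blocks are never compared
    return False

a = np.arange(30).reshape(10, 3)
print(contains_row_chunked(a, [0, 1, 2], chunk_rows=4))
print(contains_row_chunked(a, [0, 2, 0], chunk_rows=4))
```

An early hit then costs one block comparison, while a miss still scans the whole array with some extra Python-level loop overhead per block.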

Here are the results of the timing script for a 300,000 x 3 element array:

early hit: [9000, 9001, 9002] in 300,000 elements:
view 0.01002 seconds and is True
python list 0.00305 seconds and is True
gen over numpy 0.06470 seconds and is True
logic equal 0.00909 seconds and is True

middle hit: [450000, 450001, 450002] in 300,000 elements:
view 0.00915 seconds and is True
python list 0.15458 seconds and is True
gen over numpy 3.24386 seconds and is True
logic equal 0.00937 seconds and is True

late hit: [899970, 899971, 899972] in 300,000 elements:
view 0.00936 seconds and is True
python list 0.30604 seconds and is True
gen over numpy 6.47660 seconds and is True
logic equal 0.00965 seconds and is True

miss: [0, 2, 0] in 300,000 elements:
view 0.00936 seconds and is False
python list 0.01287 seconds and is False
gen over numpy 6.49190 seconds and is False
logic equal 0.00965 seconds and is False

For a 3,000,000 x 3 array:

early hit: [90000, 90001, 90002] in 3,000,000 elements:
view 0.10128 seconds and is True
python list 0.02982 seconds and is True
gen over numpy 0.66057 seconds and is True
logic equal 0.09128 seconds and is True

middle hit: [4500000, 4500001, 4500002] in 3,000,000 elements:
view 0.09331 seconds and is True
python list 1.48180 seconds and is True
gen over numpy 32.69874 seconds and is True
logic equal 0.09438 seconds and is True

late hit: [8999970, 8999971, 8999972] in 3,000,000 elements:
view 0.09868 seconds and is True
python list 3.01236 seconds and is True
gen over numpy 65.15087 seconds and is True
logic equal 0.09591 seconds and is True

miss: [0, 2, 0] in 3,000,000 elements:
view 0.09588 seconds and is False
python list 0.12904 seconds and is False
gen over numpy 64.46789 seconds and is False
logic equal 0.09671 seconds and is False

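This seems to indicate that np.equal is the fastest pure numpy way to do this, although the == comparison runs at essentially the same speed in the timings above.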
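
For ndarrays the == operator dispatches to np.equal, so any gap between the two is likely measurement noise from single time.time() readings. A hedged way to check this on your own machine is to repeat each measurement with timeit and keep the best run; the sketch below reuses the 300,000 x 3 setup and the "miss" target from the script above, and the repeat counts are arbitrary choices.

```python
import timeit
import numpy as np

n = 300_000
a = np.arange(n * 3).reshape(n, 3)
tgt = [0, 2, 0]  # a miss, so both approaches scan the whole array

candidates = {
    "==": lambda: (a == tgt).all(1).any(),
    "np.equal": lambda: np.equal(a, tgt).all(1).any(),
}

for name, fn in candidates.items():
    # best of 5 repeats, each averaging 10 calls, to damp timer noise
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f"{name:10}{best:.5f} seconds per call")
```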

Regarding python - Test whether a Numpy array contains a given row, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38303381/
