
python - What is the most efficient way to find the position of the first np.nan value?

Reposted. Author: 太空狗. Updated: 2023-10-29 17:24:44

Consider the array a

a = np.array([3, 3, np.nan, 3, 3, np.nan])

I can do

np.isnan(a).argmax()

But this requires testing every element for np.nan just to find the first one.
Is there a more efficient way?
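
To make the cost concrete, here is a small sketch of what the one-liner does. `np.isnan(a)` builds a full boolean mask before `argmax` looks at it, and `argmax` returns 0 when the mask contains no True at all, so a caller usually needs an extra check on the result. The helper name `first_nan_masked` and the `-1` "not found" convention are assumptions made for illustration, not part of the question.

```python
import numpy as np

def first_nan_masked(a):
    """Index of the first NaN in a, or -1 if there is none (hypothetical helper)."""
    if a.size == 0:
        return -1                # argmax raises ValueError on an empty array
    mask = np.isnan(a)           # allocates a boolean array as long as a
    idx = int(mask.argmax())     # first True; also 0 when the mask is all False
    return idx if mask[idx] else -1

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(first_nan_masked(a))                      # 2
print(first_nan_masked(np.array([1.0, 2.0])))   # -1
```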


I've been trying to figure out whether I can pass a parameter to np.argpartition so that np.nan gets sorted first instead of last.


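EDIT regarding [dup].
There are several reasons this question is different.

  1. That question and its answers address equality of values. This one is about isnan.
  2. Those answers all suffer from the same issue my own approach has. Note that I provided a perfectly valid solution but highlighted its inefficiency. I'm looking to fix that inefficiency.

EDIT regarding the second [dup].

It still addresses equality, and the question and answers are old and quite possibly outdated.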

Best answer

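It might also be worth looking into numba.jit. Without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but once the code is compiled, the plain search takes the lead, at least in my testing: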

In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])

In [70]: %paste
import numba

def naive(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

def short(a):
    return np.isnan(a).argmax()

@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()
## -- End pasted text --

In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop

In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop

In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop

In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
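
The session above depends on IPython's %timeit and %paste. As a rough way to repeat the comparison from a plain script, the sketch below uses the standard `timeit` module instead. It makes a few assumptions that are not in the original answer: numba is treated as optional (without it, `naive_jit` is simply the uncompiled loop), `numba.njit` is used in place of `numba.jit`, and the loop returns `-1` when no NaN is found. Absolute timings will differ by machine and library version.

```python
import timeit
import numpy as np

try:
    import numba  # optional here; only needed for the compiled variant
except ImportError:
    numba = None

def naive(a):
    # Plain loop: stops at the first NaN, but each iteration is slow in CPython.
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i
    return -1  # assumption: explicit "not found" value, not in the original

def short(a):
    # Vectorized: fast per element, but always scans the whole array.
    return int(np.isnan(a).argmax())

# Same loop, compiled when numba is available; otherwise a plain alias.
naive_jit = numba.njit(naive) if numba is not None else naive

# Same layout as the answer's test array: NaN at 9999, 19999, ...
a = np.full(100_000, 3.0)
a[9999::10000] = np.nan

for name, fn in {"naive": naive, "short": short, "naive_jit": naive_jit}.items():
    fn(a)  # warm-up call so JIT compilation is not included in the timing
    best = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{name:10s} {best * 1e6:10.1f} us per call")
```

The warm-up call matters for the compiled variant: the first call includes compilation time, which is consistent with the "slowest run took 6821.16 times longer" warning in the session above.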

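EDIT: As pointed out by @hpaulj in their answer, numpy actually ships with an optimized short-circuiting search whose performance is comparable to the JITted search above: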

In [26]: %paste
def plain(a):
    return a.argmax()

@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --

In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop

In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop

In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop

In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
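
`a.argmax()` finds the first NaN because NumPy treats NaN as the maximum of a float array, so the result lands on the first NaN it encounters. One caveat: when the array contains no NaN at all, `argmax` returns the position of the ordinary maximum, so the result has to be checked before it is trusted. A minimal sketch of that check, where the helper name `first_nan_argmax` and the `-1` return value are assumptions made for illustration:

```python
import numpy as np

def first_nan_argmax(a):
    """Index of the first NaN in a float array, or -1 if none (hypothetical helper)."""
    if a.size == 0:
        return -1                  # argmax raises ValueError on an empty array
    idx = int(a.argmax())          # NaN counts as the maximum, first one wins
    # Without a NaN, idx points at the largest ordinary value instead.
    return idx if np.isnan(a[idx]) else -1

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(first_nan_argmax(a))                           # 2
print(first_nan_argmax(np.array([1.0, 5.0, 2.0])))   # -1
```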

Regarding "python - What is the most efficient way to find the position of the first np.nan value?", the original question is on Stack Overflow: https://stackoverflow.com/questions/41320568/
