
python - Euclidean norm using numexpr

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:42:33

I need to rewrite this code using numexpr. It computes the squared Euclidean norm between each row of a matrix data [rows x cols] and a vector vec [1 x cols].

d = ((data-vec)**2).sum(axis=1)

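How can this be done? Or is there perhaps another, faster way?

My problem is that I use hdf5 and read the data matrix from it. For example, this code gives the error: objects are not aligned.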

# Naive NumPy solution; can it be parallelized?
# Assumes fileName, rows, batches, vec and k are defined elsewhere.
import time
import numpy as np
import tables

def test_bruteforce_knn():
    h5f = tables.open_file(fileName)

    t0 = time.time()
    d = np.empty((rows*batches,))
    for i in range(batches):
        d[i*rows:(i+1)*rows] = ((h5f.root.carray[i*rows:(i+1)*rows]-vec)**2).sum(axis=1)
    print(time.time()-t0)
    ndx = d.argsort()
    print(ndx[:k])

    h5f.close()

# Using some tricks (does not work; error: objects are not aligned)
def test_bruteforce_knn():
    h5f = tables.open_file(fileName)

    t0 = time.time()
    d = np.empty((rows*batches,))
    for i in range(batches):
        d[i*rows:(i+1)*rows] = (np.einsum('ij,ij->i', h5f.root.carray[i*rows:(i+1)*rows],
                                          h5f.root.carray[i*rows:(i+1)*rows])
                                + np.dot(vec, vec)
                                - 2 * np.dot(h5f.root.carray[i*rows:(i+1)*rows], vec))
    print(time.time()-t0)
    ndx = d.argsort()
    print(ndx[:k])

    h5f.close()

Using numexpr: it seems that numexpr does not understand h5f.root.carray[i*rows:(i+1)*rows]. Does the slice have to be reassigned to a variable first?

import numexpr as ne

def test_bruteforce_knn():
    h5f = tables.open_file(fileName)

    t0 = time.time()
    d = np.empty((rows*batches,))
    for i in range(batches):
        ne.evaluate("sum((h5f.root.carray[i*rows:(i+1)*rows] - vec) ** 2, axis=1)")
    print(time.time()-t0)
    ndx = d.argsort()
    print(ndx[:k])

    h5f.close()

Best answer

There is a potentially fast way (for very large arrays) that uses only NumPy; it is the approach used in scikit-learn:

def squared_row_norms(X):
    # From http://stackoverflow.com/q/19094441/166749
    return np.einsum('ij,ij->i', X, X)

def squared_euclidean_distances(data, vec):
    data2 = squared_row_norms(data)
    vec2 = squared_row_norms(vec)
    d = np.dot(data, vec.T).ravel()
    d *= -2
    d += data2
    d += vec2
    return d

This is based on the fact that (x - y)² = x² + y² - 2xy, which holds for vectors as well.
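Written out for vectors, the identity reads as follows. The symbols x (one row of data) and y (vec) are only labels chosen for this note, and the dot denotes the inner product:

```latex
\|x - y\|^2 = (x - y)\cdot(x - y)
            = x\cdot x - 2\,x\cdot y + y\cdot y
            = \|x\|^2 + \|y\|^2 - 2\,x\cdot y
```

In squared_euclidean_distances above, data2 corresponds to \|x\|^2 for every row, vec2 to \|y\|^2, and np.dot(data, vec.T) to x·y.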

Test:

>>> data = np.random.randn(10, 40)
>>> vec = np.random.randn(1, 40)
>>> ((data - vec) ** 2).sum(axis=1)
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154,
       84.8832107 , 82.28910021, 67.48309433, 81.94813371,
       64.68162331, 77.43265692])
>>> squared_euclidean_distances(data, vec)
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154,
       84.8832107 , 82.28910021, 67.48309433, 81.94813371,
       64.68162331, 77.43265692])
>>> from sklearn.metrics.pairwise import euclidean_distances
>>> euclidean_distances(data, vec, squared=True).ravel()
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154,
       84.8832107 , 82.28910021, 67.48309433, 81.94813371,
       64.68162331, 77.43265692])

Profiling:

>>> data = np.random.randn(1000, 40)
>>> vec = np.random.randn(1, 40)
>>> %timeit ((data - vec)**2).sum(axis=1)
10000 loops, best of 3: 114 us per loop
>>> %timeit squared_euclidean_distances(data, vec)
10000 loops, best of 3: 52.5 us per loop

Using numexpr is also possible, but it does not seem to give any speedup for 1000 points (and at 10000 it is not much better either):

>>> %timeit ne.evaluate("sum((data - vec) ** 2, axis=1)")
10000 loops, best of 3: 142 us per loop
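For the batched read in the question, the same expansion can probably be applied chunk by chunk. The following is a minimal sketch rather than a tested PyTables recipe: the function name chunked_squared_distances is made up for illustration, an in-memory array stands in for h5f.root.carray, and it assumes the "objects are not aligned" error comes from calling np.dot on two (1, cols) arrays, which flattening vec avoids. Rounding in the expansion can produce tiny negative values, so the result is clipped at zero.

```python
import numpy as np

def chunked_squared_distances(carray, vec, rows):
    """Squared Euclidean distance from each row of `carray` to `vec`,
    reading `rows` rows at a time (hypothetical helper for illustration)."""
    v = np.asarray(vec, dtype=float).ravel()  # (1, cols) -> (cols,) so np.dot is aligned
    v2 = np.dot(v, v)                         # ||y||^2, computed once
    n = carray.shape[0]
    d = np.empty(n)
    for start in range(0, n, rows):
        # Slice once per batch; with an HDF5 array this is the only disk read
        chunk = np.asarray(carray[start:start + rows])
        d[start:start + rows] = (np.einsum('ij,ij->i', chunk, chunk)
                                 - 2.0 * np.dot(chunk, v)
                                 + v2)
    np.maximum(d, 0.0, out=d)  # clip small negative values caused by rounding
    return d

# In-memory stand-in for h5f.root.carray (assumption: any 2-D array-like with .shape)
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 40))
vec = rng.standard_normal((1, 40))

d = chunked_squared_distances(data, vec, rows=128)

# k nearest rows without sorting the whole array
k = 5
nearest = np.argpartition(d, k)[:k]
nearest = nearest[np.argsort(d[nearest])]
```

np.argpartition only partially orders d, so for small k it can be cheaper than the full d.argsort() used in the question; the final np.argsort sorts just the k selected entries.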

Regarding python - Euclidean norm using numexpr, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19653951/
