gpt4 book ai didi

python - NumPy 中的外部产品 : Vectorizing six nested loops

转载 作者:太空狗 更新时间:2023-10-30 02:08:40 25 4
gpt4 key购买 nike

在一篇研究论文中,作者在两个 (3*3) 矩阵 AB 之间引入了一个外积,得到了 C :

C(i, j) = sum(k=1..3, l=1..3, m=1..3, n=1..3) eps(i,k,l)*eps(j,m,n)*A(k,m)*B(l,n)

其中 eps(a, b, c)Levi-Civita symbol .

我想知道如何在 Numpy 中向量化这样一个数学运算符,而不是天真地实现 6 个嵌套循环(for i, j, k, l, m, n)。

最佳答案

它看起来像是一个纯粹基于求和归约的问题,不需要在输入之间保持任何轴对齐。因此,我建议使用 np.tensordot 为张量提供基于矩阵乘法的解决方案。 .

因此,一个解决方案可以分三步实现 -

# Matrix-multiplication between first eps and A.
# Thus losing second axis from eps and first from A : k
parte1 = np.tensordot(eps,A,axes=((1),(0)))

# Matrix-multiplication between second eps and B.
# Thus losing third axis from eps and second from B : n
parte2 = np.tensordot(eps,B,axes=((2),(1)))

# Finally, we are left with two products : ilm & jml.
# We need to lose lm and ml from these inputs respectively to get ij.
# So, we need to lose last two dims from the products, but flipped .
out = np.tensordot(parte1,parte2,axes=((1,2),(2,1)))

运行时测试

方法-

def einsum_based1(eps, A, B):  # @unutbu's soln1
return np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)

def einsum_based2(eps, A, B): # @unutbu's soln2
return np.einsum('ilm,jml->ij',
np.einsum('ikl,km->ilm', eps, A),
np.einsum('jmn,ln->jml', eps, B))

def tensordot_based(eps, A, B):
parte1 = np.tensordot(eps,A,axes=((1),(0)))
parte2 = np.tensordot(eps,B,axes=((2),(1)))
return np.tensordot(parte1,parte2,axes=((1,2),(2,1)))

时间 -

In [5]: # Setup inputs
...: N = 20
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:

In [6]: %timeit einsum_based1(eps, A, B)
1 loops, best of 3: 773 ms per loop

In [7]: %timeit einsum_based2(eps, A, B)
1000 loops, best of 3: 972 µs per loop

In [8]: %timeit tensordot_based(eps, A, B)
1000 loops, best of 3: 214 µs per loop

更大的数据集-

In [12]: # Setup inputs
...: N = 100
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:

In [13]: %timeit einsum_based2(eps, A, B)
1 loops, best of 3: 856 ms per loop

In [14]: %timeit tensordot_based(eps, A, B)
10 loops, best of 3: 49.2 ms per loop

关于python - NumPy 中的外部产品 : Vectorizing six nested loops,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41808217/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com