
python - Can this Python function be vectorized?

Reposted. Author: 太空狗. Updated: 2023-10-30 01:20:24

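I have been working on this function, which generates some parameters needed by a simulation code I am developing, and I have hit a wall trying to improve its performance.

Profiling the code shows that this function is the main bottleneck, so any improvement I can make to it, however small, would be great.

I would like to try vectorizing parts of this function, but I am not sure whether that is possible.

The main challenge is that the values stored in my array params depend on the indices of params. The only straightforward solution I see is np.ndenumerate, but that seems to be quite slow.

Is it possible to vectorize this type of operation, where the values stored in an array depend on where they are stored? Or would it be smarter/faster to create a generator that just gives me the tuples of array indices?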
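To make the question concrete, here is a minimal sketch on a smaller, hypothetical 3-D analogue of the problem. The 4x4 array `table` is an assumption standing in for the lookup dictionaries; it is not part of the original code. It compares a loop over index tuples (using `np.ndindex`) with a vectorized lookup built from `np.indices`:

```python
import numpy as np

# Hypothetical symmetric 4x4 lookup table standing in for the nn dictionary.
rng = np.random.default_rng(0)
table = rng.random((4, 4))
table = (table + table.T) / 2

# Loop version: each stored value depends on its own index (i, j, k).
looped = np.zeros((4, 4, 4))
for (i, j, k) in np.ndindex(looped.shape):
    looped[i, j, k] = table[i, j] + table[i, k]

# Vectorized version: np.indices returns one index array per axis, and
# fancy indexing evaluates the lookup for every position at once.
i, j, k = np.indices((4, 4, 4))
vectorized = table[i, j] + table[i, k]

print(np.allclose(looped, vectorized))
```

Both versions fill the same array. The second one moves the iteration into NumPy, which is usually where the speedup comes from.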

import numpy as np
from scipy.sparse import linalg as LA

def get_params(num_bonds, energies):
    """
    Returns the interaction parameters of different pairs of atoms.

    Parameters
    ----------
    num_bonds : ndarray, shape = (M, 20)
        Sparse array containing the number of nearest neighbor bonds for
        different pairs of atoms (denoted by their column) and
        next-nearest neighbor bonds. Columns 0-9 contain nearest
        neighbors, 10-19 contain next-nearest neighbors

    energies : ndarray, shape = (M, )
        Energy vector corresponding to each atomic system stored in each
        row of num_bonds.
    """

    # -- Compute the bond energies
    x = LA.lsqr(num_bonds, energies, show=False)[0]

    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])

    nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
          (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
          (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
          (3,2): x[9]}

    nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
           (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
           (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
           (3,2): x[19]}

    # params contains the energy contribution of each site due to its
    # local environment. The shape is given by the number of possible atom
    # types and the number of sites in the lattice.
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):

        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]

    return np.ascontiguousarray(params)

Best answer

Here is a vectorized approach using broadcasted summations:

# Gather the elements sorted by the keys in (row,col) order of a dense
# 2D array for both nn and nnn.
# list(...) is required on Python 3, where dict.keys() and dict.values()
# return views rather than lists.
sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T,(4,4)).argsort()
a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)

sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T,(4,4)).argsort()
a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)

# Perform the summations, keeping the first axis aligned for the nn and nnn parts
parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
         a0[:,None,None,:,None] + a0[:,None,None,None,:]

parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
         a1[:,None,None,:,None] + a1[:,None,None,None,:]

# Finally add up the sums from nn and nnn for the final output
out = parte0[...,None,None,None,None] + parte1[:,None,None,None,None]
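The dense tables a0 and a1 can also be filled directly from the (row, col) keys instead of sorting them. The following is a sketch of that alternative, not the answer's own method. The helper name `dict_to_table` and the sample values in `x` are hypothetical. The last line shows a two-term version of the broadcasted sum, to make the axis alignment easier to see:

```python
import numpy as np

def dict_to_table(pairs, size=4):
    """Fill a dense (size, size) array from a {(row, col): value} dict."""
    keys = np.array(list(pairs.keys()))      # shape (n_pairs, 2)
    values = np.array(list(pairs.values()))  # same order as the keys
    table = np.zeros((size, size))
    table[keys[:, 0], keys[:, 1]] = values
    return table

# Hypothetical values standing in for the fitted bond energies x[0:10].
x = np.arange(10, dtype=float)
nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
      (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
      (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
      (3,2): x[9]}

a0 = dict_to_table(nn)

# Two-term broadcasted sum: part[i, j, k] == a0[i, j] + a0[i, k]
part = a0[:, :, None] + a0[:, None, :]
```

Each `None` inserts a length-1 axis, so the first axis (the site's own atom type) stays shared while every neighbor gets its own axis.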

Runtime test

Function definitions:

def vectorized_approach(nn,nnn):
    # list(...) is required on Python 3 (dict views are not sequences)
    sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T,(4,4)).argsort()
    a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)
    sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T,(4,4)).argsort()
    a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)
    parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
             a0[:,None,None,:,None] + a0[:,None,None,None,:]
    parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
             a1[:,None,None,:,None] + a1[:,None,None,None,:]
    return parte0[...,None,None,None,None] + parte1[:,None,None,None,None]

def original_approach(nn,nnn):
    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):
        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]
    return params

Setting up the inputs:

# Setup inputs
x = np.random.rand(30)
nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
(1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
(1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
(3,2): x[9]}

nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
(1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
(1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
(3,2): x[19]}

Timings:

In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn))
Out[98]: True

In [99]: %timeit original_approach(nn,nnn)
1 loops, best of 3: 884 ms per loop

In [100]: %timeit vectorized_approach(nn,nnn)
1000 loops, best of 3: 708 µs per loop

That is a 1000x+ speedup!


For a system with a generic number of such outer sums, here is a generic solution that iterates through those dimensions:

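m,n = a0.shape # size of output array along each axis
N = 4 # Order of system
out = a0.copy()
for i in range(1,N):
    out = out[...,None] + a0.reshape((m,)+(1,)*i+(n,))

# The a1 terms go after the N axes already used by a0, so the number of
# singleton axes is i+N (the same as i+n only when n == N, as it is here)
for i in range(N):
    out = out[...,None] + a1.reshape((m,)+(1,)*(i+N)+(n,))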
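The generic loop can be wrapped in a function and checked against a brute-force loop on a small case. This is a sketch under stated assumptions: the name `outer_sum` and its `order` parameter are hypothetical, a0 and a1 are assumed to share the same 2-D shape, and the test sizes (3x3 tables, order 2) are picked so that the table size differs from the order:

```python
import numpy as np

def outer_sum(a0, a1, order):
    """Return out[i, j1..j_order, k1..k_order] =
    sum of a0[i, j_p] over p, plus sum of a1[i, k_p] over p."""
    m, n = a0.shape
    out = a0.copy()
    # Append one new axis per additional a0 term.
    for i in range(1, order):
        out = out[..., None] + a0.reshape((m,) + (1,) * i + (n,))
    # Append one new axis per a1 term, after the `order` axes used by a0.
    for i in range(order):
        out = out[..., None] + a1.reshape((m,) + (1,) * (i + order) + (n,))
    return out

rng = np.random.default_rng(0)
a0 = rng.random((3, 3))
a1 = rng.random((3, 3))
out = outer_sum(a0, a1, order=2)

# Brute-force reference for the same small case.
expected = np.zeros((3,) * 5)
for (i, j, k, jj, kk) in np.ndindex(expected.shape):
    expected[i, j, k, jj, kk] = a0[i, j] + a0[i, k] + a1[i, jj] + a1[i, kk]

print(out.shape, np.allclose(out, expected))
```

With `order=4` and 4x4 tables this should give the same nine-dimensional array as the explicit broadcasted expression.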

Regarding "python - Can this Python function be vectorized?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40006169/
