gpt4 book ai didi

multithreading - 使用 MKL BLAS 时,scipy 是否支持稀疏矩阵乘法的多线程?

转载 作者:行者123 更新时间:2023-12-04 14:14:22 26 4
gpt4 key购买 nike

根据 MKL BLAS 文档
“所有矩阵-矩阵操作(第 3 级)都针对密集和稀疏 BLAS 进行了线程化。”
http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library

我用 MKL BLAS 构建了 Scipy。使用下面的测试代码,我看到了密集但不是稀疏矩阵乘法的预期多线程加速。 Scipy 是否有任何更改以启用多线程稀疏操作?

# test dense matrix multiplication
from numpy import *
import time
x = random.random((10000,10000))
t1 = time.time()
foo = dot(x.T, x)
print time.time() - t1

# test sparse matrix multiplication
from scipy import sparse
x = sparse.rand(10000,10000)
t1 = time.time()
foo = dot(x.T, x)
print time.time() - t1

最佳答案

据我所知,答案是否定的。但是,您可以围绕 MKL 稀疏乘法例程构建自己的包装器。您询问了两个稀疏矩阵相乘的问题。下面是一些我用来将一个稀疏矩阵乘以一个密集向量的包装代码,所以它应该不难适应(查看 mkl_cspblas_dcsrgemm 的英特尔 MKL 引用)。另外,请注意 scipy 数组的存储方式:默认为 coo,但 csr(或 csc)可能是更好的选择。我选择了 csr,但 MKL 支持大多数类型(只需调用适当的例程)。

据我所知,scipy 的默认值和 MKL 都是多线程的。通过更改 OMP_NUM_THREADS我可以看到性能上的差异。

要使用下面的功能,如果您有最新版本的 MKL,只需确保您有 LD_LIBRARY_PATHS设置为包括相关的 MKL 目录。对于旧版本,您需要构建一些特定的库。我的信息来自 IntelMKL in python

def SpMV_viaMKL( A, x ):
"""
Wrapper to Intel's SpMV
(Sparse Matrix-Vector multiply)
For medium-sized matrices, this is 4x faster
than scipy's default implementation
Stephen Becker, April 24 2014
stephen.beckr@gmail.com
"""

import numpy as np
import scipy.sparse as sparse
from ctypes import POINTER,c_void_p,c_int,c_char,c_double,byref,cdll
mkl = cdll.LoadLibrary("libmkl_rt.so")

SpMV = mkl.mkl_cspblas_dcsrgemv
# Dissecting the "cspblas_dcsrgemv" name:
# "c" - for "c-blas" like interface (as opposed to fortran)
# Also means expects sparse arrays to use 0-based indexing, which python does
# "sp" for sparse
# "d" for double-precision
# "csr" for compressed row format
# "ge" for "general", e.g., the matrix has no special structure such as symmetry
# "mv" for "matrix-vector" multiply

if not sparse.isspmatrix_csr(A):
raise Exception("Matrix must be in csr format")
(m,n) = A.shape

# The data of the matrix
data = A.data.ctypes.data_as(POINTER(c_double))
indptr = A.indptr.ctypes.data_as(POINTER(c_int))
indices = A.indices.ctypes.data_as(POINTER(c_int))

# Allocate output, using same conventions as input
nVectors = 1
if x.ndim is 1:
y = np.empty(m,dtype=np.double,order='F')
if x.size != n:
raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))
elif x.shape[1] is 1:
y = np.empty((m,1),dtype=np.double,order='F')
if x.shape[0] != n:
raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))
else:
nVectors = x.shape[1]
y = np.empty((m,nVectors),dtype=np.double,order='F')
if x.shape[0] != n:
raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))

# Check input
if x.dtype.type is not np.double:
x = x.astype(np.double,copy=True)
# Put it in column-major order, otherwise for nVectors > 1 this FAILS completely
if x.flags['F_CONTIGUOUS'] is not True:
x = x.copy(order='F')

if nVectors == 1:
np_x = x.ctypes.data_as(POINTER(c_double))
np_y = y.ctypes.data_as(POINTER(c_double))
# now call MKL. This returns the answer in np_y, which links to y
SpMV(byref(c_char("N")), byref(c_int(m)),data ,indptr, indices, np_x, np_y )
else:
for columns in xrange(nVectors):
xx = x[:,columns]
yy = y[:,columns]
np_x = xx.ctypes.data_as(POINTER(c_double))
np_y = yy.ctypes.data_as(POINTER(c_double))
SpMV(byref(c_char("N")), byref(c_int(m)),data,indptr, indices, np_x, np_y )

return y

关于multithreading - 使用 MKL BLAS 时,scipy 是否支持稀疏矩阵乘法的多线程?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17158893/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com