
python - Why is MATLAB/Numpy/Scipy performance slow and nowhere near the CPU's capabilities (FLOPS)?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:05:56

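First of all, I know there are several threads that touch on this question, but I could not get a straight answer from them, and I ran into some miscalculations of the FLOPs.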

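I have prepared a simple MATLAB and Python benchmark of element-wise multiplication. It is the simplest and most straightforward case, and one for which the number of FLOPs is easy to calculate.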

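It uses NxN arrays (matrices) but performs element-wise multiplication, not matrix multiplication. This matters because with matrix multiplication the number of operations is not N^3!

The lower-level algorithm that performs matrix multiplication does it in fewer than N^3 operations.

Element-wise multiplication of randomly generated numbers, however, must take exactly N^2 operations.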

I have an Intel i7-4770 (I believe it has 4 physical cores and 8 virtual cores) @ 3.5GHz. So, assuming 4 FLOPs per cycle, each core should deliver 14 GFLOPS!
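The estimate above is plain arithmetic. As a quick check, here is a minimal sketch that uses only the figures assumed in the question (3.5 GHz clock, 4 FLOPs per cycle, 4 physical cores); the variable names are illustrative.

```python
# Peak-throughput estimate from the figures assumed in the question.
clock_hz = 3.5e9        # i7-4770 clock quoted in the question
flops_per_cycle = 4     # the question's assumption, not a measured value
physical_cores = 4      # as the question believes

peak_per_core_gflops = clock_hz * flops_per_cycle / 1e9
peak_all_cores_gflops = peak_per_core_gflops * physical_cores

print(f"per core:  {peak_per_core_gflops:.1f} GFLOPS")
print(f"all cores: {peak_all_cores_gflops:.1f} GFLOPS")
```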

MATLAB/Numpy/Scipy are nowhere near that.

Why?

MATLAB:

%element wise multiplication benchmark
N = 10^4;
nOps = N^2;
m1 = randn(N);
m2 = randn(size(m1));
m = randn(size(m1));

m1 = single(m1);
m2 = single(m2);

% clear m
tic
m1 = m1 .* m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]

% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]


% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]

version('-blas')
version('-lapack')

The results are:

t_gflops =

0.0978 1.0226


t_gflops =

0.0743 1.3458


t_gflops =

0.0731 1.3682


ans =

Intel(R) Math Kernel Library Version 11.1.1 Product Build 20131010 for Intel(R) 64 architecture applications



ans =

Intel(R) Math Kernel Library Version 11.1.1 Product Build 20131010 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1

Now Python:

import numpy as np
# import gnumpy as gnp
import scipy as sp
import scipy.linalg as la
import time

if __name__ == '__main__':
    N = 10**4
    nOps = N**2
    a = np.random.randn(N,N).astype(np.float32)
    b = np.random.randn(N,N).astype(np.float32)

    t = time.time()
    c = a*b
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = np.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    a = np.random.randn(N,1).astype(np.float32)
    b = np.random.randn(1,N).astype(np.float32)

    t = time.time()
    c1 = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = la.blas.dgemm(1.0,a,b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = la._fblas.dgemm(1.0,a,b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    print("numpy config")
    np.show_config()
    print("scipy config")
    sp.show_config()
# numpy

The results are:

dt =  0.16301608085632324 , gflops =  0.6134364136022663
dt = 0.16701674461364746 , gflops = 0.5987423610209003
dt = 0.1770176887512207 , gflops = 0.5649152957845881
dt = 0.188018798828125 , gflops = 0.5318617107612401
dt = 0.151015043258667 , gflops = 0.6621856858903415
dt = 0.17201733589172363 , gflops = 0.5813367558659613
dt = 0.3080308437347412 , gflops = 0.3246428142959423
dt = 0.39503931999206543 , gflops = 0.253139358385916

numpy config

mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']

lapack_mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']

lapack_opt_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']

blas_opt_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']

openblas_lapack_info:

NOT AVAILABLE

blas_mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']

scipy config

mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']

lapack_mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']

lapack_opt_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']

blas_opt_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']

openblas_lapack_info:

NOT AVAILABLE

blas_mkl_info:

define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']

Process finished with exit code 0

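Best answer

Well, in this case you are limited by memory bandwidth, not by CPU capability. Assuming:

  • PC3-12800 RAM in dual-channel mode;
  • each single-precision multiplication requires 12 bytes to be transferred between the CPU and RAM;

the theoretical maximum sustained performance is about 2 GFLOPS. I calculated this number as peak DDR3 transfer rate * number of RAM channels / bytes transferred per FLOP.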
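The formula above can be worked through numerically. This is a minimal sketch under the answer's assumptions: 12.8 GB/s is the nominal peak rate of one PC3-12800 channel, and the 12 bytes per FLOP come from reading two float32 operands and writing one float32 result. The variable names are illustrative.

```python
# Bandwidth-bound ceiling for single-precision element-wise multiplication,
# using the answer's assumptions.
channel_bandwidth = 12.8e9   # bytes/s, nominal peak of one PC3-12800 channel
channels = 2                 # dual-channel mode
bytes_per_flop = 12          # two float32 reads plus one float32 write

max_gflops = channel_bandwidth * channels / bytes_per_flop / 1e9
print(f"bandwidth-bound ceiling: {max_gflops:.2f} GFLOPS")
```

For comparison, the best MATLAB run in the question (1.3682 GFLOPS) is roughly 64% of this ceiling, which is consistent with the benchmark being limited by memory traffic and not by arithmetic.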

By the way, in numpy, element-wise operations are not accelerated by BLAS. I'm not sure about MATLAB.
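To compare a measurement against the memory ceiling, it helps to report effective bandwidth alongside GFLOPS. The following is a minimal sketch, not the asker's benchmark: `bench_multiply` is a hypothetical helper, the output array is preallocated so that the allocation performed by `c = a*b` is not timed, and the 12-bytes-per-element accounting follows the answer's assumption and ignores cache effects.

```python
import time
import numpy as np

def bench_multiply(n, repeats=3):
    """Time out = a * b on n x n float32 arrays; return (seconds, GFLOPS, GB/s)."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    out = np.empty_like(a)           # preallocate so allocation is not timed
    np.multiply(a, b, out=out)       # warm-up pass touches every page of out

    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(a, b, out=out)
        best = min(best, time.perf_counter() - t0)

    flops = n * n                    # one multiplication per element
    bytes_moved = 3 * a.nbytes       # two reads plus one write, caches ignored
    return best, flops / best / 1e9, bytes_moved / best / 1e9

if __name__ == "__main__":
    dt, gflops, gbps = bench_multiply(2000)
    print(f"dt = {dt:.4f} s, {gflops:.2f} GFLOPS, {gbps:.2f} GB/s")
```

If the reported GB/s is close to what the memory system can sustain, the GFLOPS figure cannot rise much further no matter how fast the cores are. Arrays small enough to fit in cache will report higher numbers, so `n` should be large when the goal is to measure RAM-bound behaviour.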

Regarding "python - Why is MATLAB/Numpy/Scipy performance slow and nowhere near the CPU's capabilities (FLOPS)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39483409/
