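First of all, I know there are several threads on this topic, but I couldn't get a straight answer, and I ran into some miscalculated FLOP counts.
I prepared MATLAB and Python benchmarks of element-wise multiplication. It is the simplest, most straightforward case, where the FLOP count is easy to work out.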
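It uses N×N arrays (matrices), but instead of matrix multiplication it does element-wise multiplication. This matters because with matrix multiplication the number of operations is not necessarily N^3!!! There are algorithms (such as Strassen's) that perform matrix multiplication in fewer than N^3 operations.
Element-wise multiplication of randomly generated numbers, however, has to take exactly N^2 operations.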
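To make the operation counts above concrete, here is a small sketch that counts floating-point operations for both cases. It assumes the textbook triple-loop matrix multiplication (n multiplies and n - 1 adds per output element), not whatever algorithm a BLAS library actually uses; the function names are hypothetical.

```python
def elementwise_flops(n: int) -> int:
    """FLOPs for C = A .* B with A, B of shape (n, n): one multiply per element."""
    return n * n


def naive_matmul_flops(n: int) -> int:
    """FLOPs for the textbook triple-loop C = A @ B with A, B of shape (n, n).

    Each of the n*n outputs needs n multiplies and n - 1 adds.
    """
    return n * n * (2 * n - 1)


if __name__ == "__main__":
    n = 10**4
    print(f"element-wise: {elementwise_flops(n):.3e} FLOPs")
    print(f"naive matmul: {naive_matmul_flops(n):.3e} FLOPs")
```

For N = 10^4 the element-wise case is 10^8 FLOPs, while the naive matrix product is roughly 2 × 10^12, which is why the two benchmarks cannot share one FLOP formula.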
I have an Intel i7-4770 (I believe it has 4 physical cores and 8 logical cores) @ 3.5 GHz. So, assuming 4 FLOPs per cycle, each core should deliver 14 GFLOPS!
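The 14 GFLOPS figure is simple arithmetic: clock rate × FLOPs per cycle × cores. A minimal sketch, where 4 FLOPs per cycle is the question's own assumption rather than a measured property of the i7-4770, and `peak_gflops` is a hypothetical helper:

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak = clock rate x FLOPs retired per cycle x number of cores."""
    return clock_ghz * flops_per_cycle * cores


if __name__ == "__main__":
    # Figures from the question: 3.5 GHz, 4 FLOPs/cycle, 4 physical cores.
    per_core = peak_gflops(3.5, 4)
    all_cores = peak_gflops(3.5, 4, cores=4)
    print(f"per core: {per_core:.1f} GFLOPS")   # 14.0
    print(f"4 cores:  {all_cores:.1f} GFLOPS")  # 56.0
```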
MATLAB/NumPy/SciPy come nowhere close to that.
Why?
MATLAB:
%element wise multiplication benchmark
N = 10^4;
nOps = N^2;
m1 = randn(N);
m2 = randn(size(m1));
m = randn(size(m1));
m1 = single(m1);
m2 = single(m2);
% clear m
tic
m1 = m1 .* m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
% clear m
tic
m1 = m1.*m2;
t = toc;
gflops = nOps/t*1e-9;
t_gflops = [t gflops]
version('-blas')
version('-lapack')
The results:
t_gflops =
0.0978 1.0226
t_gflops =
0.0743 1.3458
t_gflops =
0.0731 1.3682
ans =
Intel(R) Math Kernel Library Version 11.1.1 Product Build 20131010 for Intel(R) 64 architecture applications
ans =
Intel(R) Math Kernel Library Version 11.1.1 Product Build 20131010 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1
Now Python:
import numpy as np
# import gnumpy as gnp
import scipy as sp
import scipy.linalg as la
import time

if __name__ == '__main__':
    N = 10**4
    nOps = N**2  # one multiply per element

    a = np.random.randn(N, N).astype(np.float32)
    b = np.random.randn(N, N).astype(np.float32)

    # Element-wise multiply via the * operator
    t = time.time()
    c = a*b
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # Same operation via np.multiply
    t = time.time()
    c = np.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # sp.multiply is SciPy's re-export of np.multiply (deprecated in newer SciPy releases)
    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = sp.multiply(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # Outer product of an (N,1) column and a (1,N) row: also N^2 multiplies
    a = np.random.randn(N, 1).astype(np.float32)
    b = np.random.randn(1, N).astype(np.float32)

    t = time.time()
    c1 = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    t = time.time()
    c = np.dot(a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # dgemm is double precision, so the float32 inputs are converted to float64 first
    t = time.time()
    c = la.blas.dgemm(1.0, a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    # la._fblas is a private module, not part of SciPy's public API
    t = time.time()
    c = la._fblas.dgemm(1.0, a, b)
    dt = time.time()-t
    gflops = nOps/dt*1e-9
    print("dt = ", dt, ", gflops = ", gflops)

    print("numpy config")
    np.show_config()
    print("scipy config")
    sp.show_config()
The results:
dt = 0.16301608085632324 , gflops = 0.6134364136022663
dt = 0.16701674461364746 , gflops = 0.5987423610209003
dt = 0.1770176887512207 , gflops = 0.5649152957845881
dt = 0.188018798828125 , gflops = 0.5318617107612401
dt = 0.151015043258667 , gflops = 0.6621856858903415
dt = 0.17201733589172363 , gflops = 0.5813367558659613
dt = 0.3080308437347412 , gflops = 0.3246428142959423
dt = 0.39503931999206543 , gflops = 0.253139358385916
numpy config
mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
lapack_mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
lapack_opt_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
blas_opt_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
openblas_lapack_info:
NOT AVAILABLE
blas_mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library\\include']
scipy config
mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
lapack_mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
lapack_opt_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_lapack95_lp64', 'mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
blas_opt_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
openblas_lapack_info:
NOT AVAILABLE
blas_mkl_info:
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:\\Minonda\\envs\\_build\\Library\\lib']
include_dirs = ['C:\\Minonda\\envs\\_build\\Library', 'C:\\Minonda\\envs\\_build\\Library\\include', 'C:\\Minonda\\envs\\_build\\Library\\lib']
Process finished with exit code 0
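The Python timings above call time.time() once per operation, and every `c = a*b` allocates a fresh 400 MB float32 result, so the cost of allocating and first touching that memory is likely included in each measurement. A minimal sketch of a more careful timing loop, assuming NumPy 1.17+ for `np.random.default_rng`: it uses `time.perf_counter`, preallocates the output with `out=`, and repeats the measurement. `time_elementwise` is a hypothetical helper name.

```python
import time

import numpy as np


def time_elementwise(n: int = 10**4, repeats: int = 5) -> list:
    """Time repeated element-wise multiplies of two n x n float32 arrays.

    Returns one GFLOPS figure (n*n multiplies / seconds) per repeat.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    c = np.empty_like(a)  # preallocated output: no new array per call
    results = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(a, b, out=c)  # writes into c instead of allocating
        dt = time.perf_counter() - t0
        results.append(n * n / dt * 1e-9)
    return results
```

Calling `time_elementwise()` with the question's N = 10**4 needs about 1.2 GB for the three float32 arrays; the first repeat may still be slower than the rest because `c` is touched for the first time there.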
Best answer
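Well, in this case you are limited by memory bandwidth, not by CPU power. Assuming a typical DDR3 memory configuration, the theoretical maximum sustained performance is about 2 GFLOPS. I computed this figure as

$$\text{max sustained FLOPS} \approx \frac{\text{peak DDR3 transfer rate} \times \text{number of RAM channels}}{\text{bytes transferred per FLOP}}$$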
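Plugging numbers into this formula shows how a figure of about 2 GFLOPS can arise. The memory figures below (DDR3-1600 at 12.8 GB/s peak per channel, two channels, 12 bytes per float32 FLOP) are assumptions for illustration, chosen because they reproduce the estimate; the answer does not state them. The sketch also converts the question's measured GFLOPS back into implied bandwidth.

```python
def bandwidth_bound_gflops(transfer_gb_s: float, channels: int, bytes_per_flop: float) -> float:
    """Upper bound on sustained GFLOPS when every FLOP must stream its operands from RAM."""
    return transfer_gb_s * channels / bytes_per_flop


# Assumptions (not stated in the answer): DDR3-1600, i.e. 12.8 GB/s peak per channel,
# two channels, and float32 c = a * b moving 12 bytes per FLOP (read a, read b, write c).
BYTES_PER_FLOP = 3 * 4
bound = bandwidth_bound_gflops(12.8, 2, BYTES_PER_FLOP)
print(f"bound: {bound:.2f} GFLOPS")  # 2.13

# Effective bandwidth implied by figures measured in the question.
for label, gflops in [("MATLAB m1.*m2", 1.3682), ("NumPy a*b", 0.6134)]:
    print(f"{label}: {gflops * BYTES_PER_FLOP:.1f} GB/s")
```

Under these assumptions the best MATLAB run already moves about 16 GB/s out of an assumed 25.6 GB/s peak, which is consistent with a memory-bound workload.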
By the way, in NumPy element-wise operations are not accelerated by BLAS. I'm not sure about MATLAB.
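Whether or not a library routes element-wise multiplication through BLAS, the bandwidth limit can be seen by comparing arithmetic intensity (FLOPs per byte of memory traffic). A rough sketch, assuming float32 data and best-case traffic in which each matrix crosses the memory bus exactly once; real BLAS kernels approach this through cache blocking but do not reach it exactly. The function names are hypothetical.

```python
def intensity_elementwise(itemsize: int = 4) -> float:
    """FLOPs per byte for c = a * b: 1 FLOP per 3 elements moved (two reads, one write)."""
    return 1 / (3 * itemsize)


def intensity_matmul(n: int, itemsize: int = 4) -> float:
    """Best-case FLOPs per byte for an n x n matmul: ~2n^3 FLOPs over 3 n^2 elements moved once."""
    return (2 * n**3) / (3 * n * n * itemsize)


print(f"element-wise:  {intensity_elementwise():.3f} FLOP/byte")
print(f"matmul n=10^4: {intensity_matmul(10**4):.0f} FLOP/byte")
```

Element-wise multiplication does well under one FLOP per byte, so it stalls on memory no matter how it is dispatched, while a large matrix product does thousands of FLOPs per byte and can therefore approach the CPU's compute peak.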
Regarding python - Why is MATLAB/NumPy/SciPy performance slow and far from the CPU's capability (FLOPS)?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39483409/