
python - numpy performance differences between Linux and Windows

Reposted. Author: 太空狗. Updated: 2023-10-29 21:46:17

I'm trying to run sklearn.decomposition.TruncatedSVD() on 2 different computers and understand the performance differences.

Computer 1 (Windows 7, physical computer)

OS Name Microsoft Windows 7 Professional
System Type x64-based PC
Processor Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 8.00 GB
Total Physical Memory 7.89 GB

Computer 2 (Debian, on the Amazon cloud)

Architecture:          x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8

width: 64 bits
capabilities: ldt16 vsyscall32
*-core
description: Motherboard
physical id: 0
*-memory
description: System memory
physical id: 0
size: 29GiB
*-cpu
product: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
width: 64 bits

Computer 3 (Windows 2008R2, on the Amazon cloud)

OS Name Microsoft Windows Server 2008 R2 Datacenter
Version 6.1.7601 Service Pack 1 Build 7601
System Type x64-based PC
Processor Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 2500 Mhz,
4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 30.0 GB

Both computers run Python 3.2 and identical versions of sklearn, numpy, and scipy.

I ran cProfile as follows:

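import cProfile
from sklearn.decomposition import TruncatedSVD

print(vectors.shape)  # (7500, 2042)

_decomp = TruncatedSVD(n_components=680, random_state=1)
global _o
_o = _decomp
# sort=1 orders the report by internal time (tottime)
cProfile.runctx('_o.fit_transform(vectors)', globals(), locals(), sort=1)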

Computer 1 output

>>>    833 function calls in 1.710 seconds
Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.767 0.767 0.782 0.782 decomp_svd.py:15(svd)
1 0.249 0.249 0.249 0.249 {method 'enable' of '_lsprof.Profiler' objects}
1 0.183 0.183 0.183 0.183 {method 'normal' of 'mtrand.RandomState' objects}
6 0.174 0.029 0.174 0.029 {built-in method csr_matvecs}
6 0.123 0.021 0.123 0.021 {built-in method csc_matvecs}
2 0.110 0.055 0.110 0.055 decomp_qr.py:14(safecall)
1 0.035 0.035 0.035 0.035 {built-in method dot}
1 0.020 0.020 0.589 0.589 extmath.py:185(randomized_range_finder)
2 0.018 0.009 0.019 0.010 function_base.py:532(asarray_chkfinite)
24 0.014 0.001 0.014 0.001 {method 'ravel' of 'numpy.ndarray' objects}
1 0.007 0.007 0.009 0.009 twodim_base.py:427(triu)
1 0.004 0.004 1.710 1.710 extmath.py:232(randomized_svd)

Computer 2 output

>>>    858 function calls in 40.145 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2 32.116 16.058 32.116 16.058 {built-in method dot}
1 6.148 6.148 6.156 6.156 decomp_svd.py:15(svd)
2 0.561 0.281 0.561 0.281 decomp_qr.py:14(safecall)
6 0.561 0.093 0.561 0.093 {built-in method csr_matvecs}
1 0.337 0.337 0.337 0.337 {method 'normal' of 'mtrand.RandomState' objects}
6 0.202 0.034 0.202 0.034 {built-in method csc_matvecs}
1 0.052 0.052 1.633 1.633 extmath.py:183(randomized_range_finder)
1 0.045 0.045 0.054 0.054 _methods.py:73(_var)
1 0.023 0.023 0.023 0.023 {method 'argmax' of 'numpy.ndarray' objects}
1 0.023 0.023 0.046 0.046 extmath.py:531(svd_flip)
1 0.016 0.016 40.145 40.145 <string>:1(<module>)
24 0.011 0.000 0.011 0.000 {method 'ravel' of 'numpy.ndarray' objects}
6 0.009 0.002 0.009 0.002 {method 'reduce' of 'numpy.ufunc' objects}
2 0.008 0.004 0.009 0.004 function_base.py:532(asarray_chkfinite)

Computer 3 output

>>>         858 function calls in 2.223 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.956 0.956 0.972 0.972 decomp_svd.py:15(svd)
2 0.306 0.153 0.306 0.153 {built-in method dot}
1 0.274 0.274 0.274 0.274 {method 'normal' of 'mtrand.RandomState' objects}
6 0.205 0.034 0.205 0.034 {built-in method csr_matvecs}
6 0.151 0.025 0.151 0.025 {built-in method csc_matvecs}
2 0.133 0.067 0.133 0.067 decomp_qr.py:14(safecall)
1 0.032 0.032 0.043 0.043 _methods.py:73(_var)
1 0.030 0.030 0.030 0.030 {method 'argmax' of 'numpy.ndarray' objects}
24 0.026 0.001 0.026 0.001 {method 'ravel' of 'numpy.ndarray' objects}
2 0.019 0.010 0.020 0.010 function_base.py:532(asarray_chkfinite)
1 0.019 0.019 0.773 0.773 extmath.py:183(randomized_range_finder)
1 0.019 0.019 0.049 0.049 extmath.py:531(svd_flip)

Note the difference in {built-in method dot}: from 0.035 s/call to 16.058 s/call, about 450 times slower!!

------+---------+---------+---------+---------+---------------------------+-----------
ncalls| tottime | percall | cumtime | percall | filename:lineno(function) | HARDWARE
------+---------+---------+---------+---------+---------------------------+-----------
1     | 0.035   | 0.035   | 0.035   | 0.035   | {built-in method dot}     | Computer 1
2     | 32.116  | 16.058  | 32.116  | 16.058  | {built-in method dot}     | Computer 2
2     | 0.306   | 0.153   | 0.306   | 0.153   | {built-in method dot}     | Computer 3
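The ratios quoted in this question can be checked with a few lines of Python. This is only a sketch of the arithmetic: the per-call times are copied from the profiles above, and the helper name `slowdown` is made up for illustration. Note that Computer 1 reports one call to dot while the other two report two, so the per-call figures may not cover exactly the same work.

```python
# Per-call times (seconds) for {built-in method dot}, copied from the profiles.
percall = {"Computer 1": 0.035, "Computer 2": 16.058, "Computer 3": 0.153}

def slowdown(slow, fast):
    """Return how many times slower `slow` is than `fast`, per call."""
    return percall[slow] / percall[fast]

ratio_vs_1 = slowdown("Computer 2", "Computer 1")
ratio_vs_3 = slowdown("Computer 2", "Computer 3")

print("Computer 2 vs Computer 1: %.0fx slower" % ratio_vs_1)
print("Computer 2 vs Computer 3: %.0fx slower" % ratio_vs_3)
```

The two ratios come out at roughly 459 and 105, which matches the "450 times" and "100 times" figures quoted in the text.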

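I understand there should be a performance difference, but should it be that large?

Is there a way to debug this performance issue further?

EDIT

I tested a new computer, Computer 3, whose hardware is similar to Computer 2 but which runs a different operating system.

The result for {built-in method dot} is 0.153 s/call, still 100 times faster than Linux!

EDIT 2

Computer 1 numpy config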

>>> np.__config__.show()
lapack_opt_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_info:
NOT AVAILABLE
lapack_mkl_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_mkl_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
mkl_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']

Computer 2 numpy config

>>> np.__config__.show()
lapack_info:
NOT AVAILABLE
lapack_opt_info:
NOT AVAILABLE
blas_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
atlas_threads_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
lapack_src_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
blas_opt_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
mkl_info:
NOT AVAILABLE

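Best answer

{built-in method dot} is the np.dot function, which is a NumPy wrapper around the CBLAS routines for matrix-matrix, matrix-vector, and vector-vector multiplication. Your Windows machines use the heavily optimized Intel MKL version of CBLAS. The Linux machine is using the slow old reference implementation.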
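One way to confirm that np.dot itself is the bottleneck, independent of scikit-learn, is to time a dense matrix product directly on each machine. The following is a minimal sketch, not the method used in the question: the matrix sizes, the repeat count, and the helper name `time_dot` are all assumptions chosen so the script finishes quickly even with a slow BLAS.

```python
import time
import numpy as np

def time_dot(n_rows=500, n_inner=400, n_cols=300, repeats=3):
    """Return (best wall-clock seconds, result shape) for a dense np.dot call."""
    rng = np.random.RandomState(1)
    a = rng.normal(size=(n_rows, n_inner))
    b = rng.normal(size=(n_inner, n_cols))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = np.dot(a, b)  # dispatched to whatever BLAS NumPy was built against
        best = min(best, time.perf_counter() - start)
    return best, c.shape

elapsed, shape = time_dot()
print("np.dot result shape:", shape)
print("best time: %.4f s" % elapsed)
```

Running the same script on each machine gives a like-for-like number. If the gap is of the same order as in the profiles, the BLAS backend is the likely cause rather than anything in the TruncatedSVD code path.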

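If you install ATLAS or OpenBLAS (both are available through the Linux package managers) or, in fact, Intel MKL, you are likely to see a massive speedup. Try sudo apt-get install libatlas-dev, check the NumPy config again to see whether it picked up ATLAS, and measure again.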
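Checking the config by eye is error-prone, because the output of a machine without ATLAS still contains strings such as atlas_info and NO_ATLAS_INFO. A rough sketch of a more careful check is to look only at the `libraries = [...]` lines. The helper names and the list of hint substrings below are assumptions for illustration, and the parsing targets the old-style output format shown in this question; newer NumPy releases print the config in a different layout.

```python
import re

# Substrings that suggest an optimized BLAS. This list is an assumption,
# not an exhaustive catalogue.
OPTIMIZED_HINTS = ("mkl", "openblas", "atlas")

def linked_libraries(config_text):
    """Collect library names from the `libraries = [...]` lines only."""
    names = set()
    for line in config_text.splitlines():
        if line.strip().startswith("libraries"):
            names.update(re.findall(r"'([^']+)'", line))
    return sorted(names)

def optimized_blas(config_text):
    """Return the linked libraries whose names hint at an optimized BLAS."""
    return [name for name in linked_libraries(config_text)
            if any(hint in name.lower() for hint in OPTIMIZED_HINTS)]

# Excerpts of the two configs posted in the question.
computer_1 = """blas_opt_info:
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
"""
computer_2 = """blas_opt_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
NOT AVAILABLE
"""

print(optimized_blas(computer_1))  # the MKL libraries
print(optimized_blas(computer_2))  # empty list: only the reference 'blas'
```

On a live system the text would come from the output of np.__config__.show() after installing the new library. An empty result suggests NumPy is still linked against the reference BLAS and may need to be rebuilt or reinstalled.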

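Once you have decided on the right CBLAS library, you may want to recompile scikit-learn. Most of it just uses NumPy for its linear algebra needs, but some algorithms (notably k-means) use CBLAS directly.

The operating system has nothing to do with this.

Regarding python - numpy performance differences between Linux and Windows, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26609475/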
