
python - RandomForestClassifier.fit uses different amounts of RAM on different machines

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 11:22:23

For some reason, RandomForestClassifier.fit from sklearn.ensemble uses only about 2.5 GB of RAM on my local machine, but almost 7 GB on my server, with exactly the same training set.

Stripped of imports, the code is essentially this:

y_train = data_train['train_column']
x_train = data_train.drop('train_column', axis=1)

# Difference in memory consuming starts here
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf = clf.fit(x_train, y_train)
preds = clf.predict(data_test)

My local machine is a MacBook Pro with 16 GB of RAM and a 4-core CPU; my server is an Ubuntu server on the DigitalOcean cloud with 8 GB of RAM and a 4-core CPU.

The sklearn version is 0.18 and the Python version is 3.5.2.

I can't even imagine a possible cause; any help would be greatly appreciated.
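One thing worth ruling out before comparing the two machines is how the numbers themselves were measured. A small stdlib sketch for reading the process's peak resident set size after `fit` returns; note the unit pitfall: `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, which can skew exactly this kind of cross-machine comparison:

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process, in MiB.

    ru_maxrss units differ by platform: kilobytes on Linux,
    bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024

# Call this right after clf.fit(...) on each machine to get
# comparable figures.
print("peak RSS so far: %.1f MiB" % peak_rss_mib())
```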

Update

The memory error occurs in this piece of code inside the fit method:

# Parallel loop: we use the threading backend as the Cython code
# for fitting the trees is internally releasing the Python GIL
# making threading always more efficient than multiprocessing in
# that case.
trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
                 backend="threading")(
    delayed(_parallel_build_trees)(
        t, self, X, y, sample_weight, i, len(trees),
        verbose=self.verbose, class_weight=self.class_weight)
    for i, t in enumerate(trees))
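For readers unfamiliar with joblib: the snippet above fans each tree's fit out to a thread, which is cheap because the Cython tree-building code releases the GIL. A stdlib analogue of the same fan-out pattern (illustrative only; `build_tree` is a placeholder, not scikit-learn's `_parallel_build_trees`):

```python
from concurrent.futures import ThreadPoolExecutor

def build_tree(seed):
    # Placeholder for per-tree work; any GIL-releasing, CPU-bound
    # task benefits from this threading layout the same way.
    return seed * seed

# Fan out 8 "trees" over 4 worker threads; map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    trees = list(pool.map(build_tree, range(8)))
print(trees)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the threads share one address space, this backend does not copy `X` and `y` per worker, so by itself it should not explain a 2.5 GB vs. 7 GB gap.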

Update 2

Information about my systems:

# local
Darwin-16.1.0-x86_64-i386-64bit
Python 3.5.2 (default, Oct 11 2016, 05:05:28)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18

# server
Linux-3.13.0-57-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.1 (default, Dec 18 2015, 00:00:00)
[GCC 4.8.4]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18
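(The block above can be reproduced with a short stdlib snippet; the version lines are guarded with try/except so it still runs if a package is missing:)

```python
import platform
import sys

# Print the same diagnostics as the blocks above.
print(platform.platform())
print("Python %s" % sys.version.replace("\n", " "))
for name in ("numpy", "scipy", "sklearn"):
    try:
        mod = __import__(name)
        print("%s %s" % (name, mod.__version__))
    except ImportError:
        print("%s not installed" % name)
```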

And my numpy configuration:

# server
>>> np.__config__.show()
blas_opt_info:
    libraries = ['openblas', 'openblas']
    define_macros = [('HAVE_CBLAS', None)]
    library_dirs = ['/usr/local/lib']
    language = c
openblas_info:
    libraries = ['openblas', 'openblas']
    define_macros = [('HAVE_CBLAS', None)]
    library_dirs = ['/usr/local/lib']
    language = c
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    define_macros = [('HAVE_CBLAS', None)]
    library_dirs = ['/usr/local/lib']
    language = c
blas_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    define_macros = [('HAVE_CBLAS', None)]
    library_dirs = ['/usr/local/lib']
    language = c


# local
>>> np.__config__.show()
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
blas_mkl_info:
    NOT AVAILABLE
atlas_threads_info:
    NOT AVAILABLE
lapack_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    NOT AVAILABLE
atlas_info:
    NOT AVAILABLE
atlas_3_10_blas_info:
    NOT AVAILABLE
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3']
openblas_info:
    NOT AVAILABLE
atlas_3_10_blas_threads_info:
    NOT AVAILABLE
atlas_3_10_threads_info:
    NOT AVAILABLE
atlas_3_10_info:
    NOT AVAILABLE
atlas_blas_threads_info:
    NOT AVAILABLE
atlas_blas_info:
    NOT AVAILABLE

The repr of the clf object is identical on both machines:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=100, n_jobs=1, oob_score=False, random_state=42,
            verbose=0, warm_start=False)

Best answer

One possible explanation is that your server is running an older scikit-learn. Not long ago it was a known issue that sklearn random forests were very memory-hungry; if I remember correctly, this was fixed in 0.17.
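Independently of the version question, a general lever for random-forest memory is the dtype of the training matrix: scikit-learn's tree code works on float32 internally, so passing float64 data means `fit` holds an extra converted copy alongside the original. A minimal sketch of the difference, using a random stand-in array in place of the real `x_train`:

```python
import numpy as np

# Stand-in for x_train; pandas/numpy default to float64.
x64 = np.random.rand(1000, 20)

# Casting up front halves the footprint of the data that fit() sees,
# and avoids an internal float64 -> float32 conversion copy.
x32 = x64.astype(np.float32)

print("float64: %d bytes, float32: %d bytes" % (x64.nbytes, x32.nbytes))
```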

Regarding "python - RandomForestClassifier.fit uses different amounts of RAM on different machines", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40293169/
