
macos - Serializing large data with scikit-learn on Python 3

Reposted. Author: 行者123. Updated: 2023-12-04 08:52:10

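I have a MacBook with 16 GB of RAM (Mac OS X 10.9).
Two Pythons are installed through Anaconda: 2.7.8 and 3.4.1. Both come with the latest scikit-learn, 0.15.1.
When I run this simple code (just to test whether a large matrix can be serialized):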

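import numpy as np
# 10000 x 60000 float64 values, 8 bytes each
test_data = np.random.rand(10000, 60000)
# Size in GiB (about 4.47)
print(test_data.nbytes / 2**30)
# joblib as bundled with scikit-learn 0.15.1
from sklearn.externals import joblib
joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')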

Python 2.7.8 handles it fine, but Python 3.4.1 fails with the following error:
Failed to save <class 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 240, in save
    obj, filename = self._write_array(obj, filename)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 203, in _write_array
    self.np.save(filename, array)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/npyio.py", line 453, in save
    format.write_array(fid, arr)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/format.py", line 410, in write_array
    fp.write(array.tostring('C'))
OSError: [Errno 22] Invalid argument

Traceback (most recent call last):

File "<ipython-input-3-90ed09e5c6d4>", line 1, in <module>
joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')

File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 368, in dump
pickler.dump(value)

File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 412, in dump
self.framer.end_framing()

File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 196, in end_framing
self.commit_frame(force=True)

File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 208, in commit_frame
write(data)

OSError: [Errno 22] Invalid argument

The problem appears to be the amount of data being stored. For example, Python 3 has no trouble with np.random.rand(10000, 20000), which is about 1.5 GB.
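
To make the size comparison concrete, here is a small sketch that computes how much memory the two arrays occupy, assuming float64 values (8 bytes each, which is what np.random.rand returns). The 2 GiB threshold in the code is an assumption, not something stated in the question: it reflects a commonly reported limit on the size of a single write call on OS X, and the two arrays happen to fall on either side of it.

```python
BYTES_PER_FLOAT64 = 8      # np.random.rand returns float64 values
GIB = 2**30

# Assumption (not from the question): a single write call on OS X
# may fail once the buffer reaches roughly 2 GiB.
ASSUMED_WRITE_LIMIT_GIB = 2


def array_gib(rows, cols, itemsize=BYTES_PER_FLOAT64):
    """Return the in-memory size of a rows x cols array in GiB."""
    return rows * cols * itemsize / GIB


failing = array_gib(10000, 60000)   # the array that raises OSError
working = array_gib(10000, 20000)   # the array that serializes fine

print(round(failing, 2), "GiB (fails)")
print(round(working, 2), "GiB (works)")
print(working < ASSUMED_WRITE_LIMIT_GIB < failing)
```

If that assumption holds, any array above the threshold would be expected to fail in the same way, regardless of its shape.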

Just in case, I also tried plain pickle, and it does not work either:
import pickle
with open('/Users/va/Desktop/test_data.pkl', 'wb') as f:
    pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)

This results in:
Traceback (most recent call last):

File "<ipython-input-6-3f73f3011539>", line 3, in <module>
pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)

OSError: [Errno 22] Invalid argument

On Windows 7, both joblib and pickle work fine with Python 3.4.

Any suggestions on how to solve this problem with Python 3 on a Mac?

Best answer

I ran into the same thing with pickle on OS X 10.10 and Python 3.4.3.

Instead, I started using https://github.com/zopefoundation/zodbpickle. It is 2-3 times slower, but it definitely works with sklearn classifiers.
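
A different workaround, not part of the answer above, is to keep the standard pickle module and write the serialized bytes in several smaller pieces. The sketch below assumes the failure comes from handing one very large buffer to a single write call; that cause is not confirmed in this thread. The helper names dump_in_chunks and load_in_chunks and the 1 GiB chunk size are hypothetical choices for illustration.

```python
import os
import pickle
import tempfile

# Assumption: pieces of 1 GiB stay below whatever per-call limit
# triggers "OSError: [Errno 22] Invalid argument".
CHUNK_SIZE = 2**30


def dump_in_chunks(obj, path, chunk_size=CHUNK_SIZE):
    """Pickle obj in memory, then write the bytes to path piece by piece."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(data)  # slicing a memoryview does not copy
    with open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])


def load_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Read path piece by piece into one buffer, then unpickle it."""
    size = os.path.getsize(path)
    buf = bytearray(size)
    view = memoryview(buf)
    with open(path, 'rb') as f:
        pos = 0
        while pos < size:
            n = f.readinto(view[pos:pos + chunk_size])
            if not n:
                raise EOFError('file ended before all bytes were read')
            pos += n
    return pickle.loads(buf)


# Small round trip with a tiny chunk size so several writes happen.
sample = {'weights': list(range(1000)), 'label': 'demo'}
demo_path = os.path.join(tempfile.mkdtemp(), 'sample.pkl')
dump_in_chunks(sample, demo_path, chunk_size=64)
restored = load_in_chunks(demo_path, chunk_size=64)
print(restored == sample)
```

The trade-off is memory: the whole pickle is held in RAM alongside the original object before anything reaches disk, so for a 4.5 GB array this needs roughly twice that much free memory.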

Regarding "macos - Serializing large data with scikit-learn on Python 3", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25301958/
