
python - sklearn kneighbours memory error python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:23:48

I'm using Windows 7 with 8 GB of RAM.

This is the vectorizer I use to vectorize the free-text column in a 52 MB training dataset:

vec = CountVectorizer(analyzer='word',stop_words='english',decode_error='ignore',binary=True)
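For reference, a vectorizer configured this way returns a SciPy sparse matrix of 0/1 values with one row per document, and that sparsity matters for the memory problem below. The toy corpus here is made up purely for illustration; the real `data['clean_sum']` column is not shown in the question.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the real free-text column.
docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]

# Same settings as in the question.
vec = CountVectorizer(analyzer='word', stop_words='english',
                      decode_error='ignore', binary=True)
X = vec.fit_transform(docs)  # sparse document-term matrix

print(type(X).__name__, X.shape)  # a sparse matrix, one row per document
print(sorted(vec.vocabulary_))    # English stop words such as "the" are dropped
```

Because `binary=True`, every stored value is 1, so each row only records which words occur in that document.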

Using this dataset, I want to compute the 5 nearest neighbors for each row of an 18 MB test set.

nbrs = NearestNeighbors(n_neighbors=5).fit(vec.transform(data['clean_sum']))
vectors = vec.transform(data_test['clean_sum'])
distances,indices = nbrs.kneighbors(vectors)

Here is the stack trace:

Traceback (most recent call last):
  File "cr_nearness.py", line 224, in <module>
    distances,indices = nbrs.kneighbors(vectors)
  File "C:\Anaconda2\lib\site-packages\sklearn\neighbors\base.py", line 371, in kneighbors
    n_jobs=n_jobs, squared=True)
  File "C:\Anaconda2\lib\site-packages\sklearn\metrics\pairwise.py", line 12, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "C:\Anaconda2\lib\site-packages\sklearn\metrics\pairwise.py", line 10, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "C:\Anaconda2\lib\site-packages\sklearn\metrics\pairwise.py", line 23, in euclidean_distances
    distances = safe_sparse_dot(X, Y.T, dense_output=True)
  File "C:\Anaconda2\lib\site-packages\sklearn\utils\extmath.py", line 181, in safe_sparse_dot
    ret = ret.toarray()
  File "C:\Anaconda2\lib\site-packages\scipy\sparse\compressed.py", line 940, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "C:\Anaconda2\lib\site-packages\scipy\sparse\coo.py", line 250, in toarray
    B = self._process_toarray_args(order, out)
  File "C:\Anaconda2\lib\site-packages\scipy\sparse\base.py", line 817, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
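The last frames show where the memory goes: `euclidean_distances` calls `safe_sparse_dot(X, Y.T, dense_output=True)`, which converts the product into a dense NumPy array with one entry per (test row, training row) pair, and `np.zeros(self.shape, ...)` is what fails. A rough lower bound on that single allocation, assuming 8-byte elements; the row counts are hypothetical, since the question gives file sizes (52 MB / 18 MB) rather than row counts:

```python
def dense_block_bytes(n_query_rows, n_train_rows, itemsize=8):
    """Bytes for one dense (n_query_rows x n_train_rows) array of 8-byte values.

    This covers only the final X . Y.T product; euclidean_distances may
    allocate further temporaries, so treat the result as a lower bound.
    """
    return n_query_rows * n_train_rows * itemsize


# Hypothetical sizes: 100,000 training rows and 40,000 test rows.
full = dense_block_bytes(40_000, 100_000)
# Querying 1,000 test rows at a time bounds each block instead.
batch = dense_block_bytes(1_000, 100_000)

print(f"all test rows at once: {full / 1e9:.1f} GB")       # 32.0 GB
print(f"1,000 test rows at a time: {batch / 1e9:.1f} GB")  # 0.8 GB
```

The cost grows with the product of the two row counts, so even moderately sized text datasets can exceed 8 GB of RAM when all test rows are queried at once.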

Any ideas?

Best answer

Use KNN with a KD tree:

model = KNeighborsClassifier(n_neighbors=5,algorithm='kd_tree').fit(X_train, Y_train)

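By default the model uses algorithm='auto', which for sparse input such as this falls back to brute force ('brute'), and brute force takes too much memory. I think your model should look like this: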

nbrs = NearestNeighbors(n_neighbors=5,algorithm='kd_tree').fit(vec.transform(data['clean_sum']))
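A caveat worth checking: the scikit-learn documentation for the `algorithm` parameter notes that fitting on sparse input overrides this setting and uses brute force, and `CountVectorizer` output is sparse, so `algorithm='kd_tree'` may not by itself reduce memory here. Newer scikit-learn releases also chunk brute-force queries internally. An alternative that keeps brute force but bounds memory is to query the test set in batches. The helper `kneighbors_in_batches` and its `batch_size` parameter below are hypothetical, a minimal sketch rather than a scikit-learn API, and the random sparse matrices only stand in for the vectorized text:

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors


def kneighbors_in_batches(nbrs, X_query, batch_size=1000):
    """Hypothetical helper: call nbrs.kneighbors on batch_size rows at a time.

    Each call only needs a dense (batch_size x n_train) distance block,
    so peak memory is bounded by batch_size rather than X_query.shape[0].
    X_query must support row slicing (e.g. a CSR matrix, which is what
    CountVectorizer.transform returns).
    """
    dist_parts, idx_parts = [], []
    for start in range(0, X_query.shape[0], batch_size):
        d, i = nbrs.kneighbors(X_query[start:start + batch_size])
        dist_parts.append(d)
        idx_parts.append(i)
    return np.vstack(dist_parts), np.vstack(idx_parts)


# Toy sparse data standing in for the vectorized training and test text.
rng = np.random.default_rng(0)
train_dense = rng.random((200, 50))
train_dense[train_dense < 0.9] = 0.0  # keep roughly 10% non-zeros
test_dense = rng.random((35, 50))
test_dense[test_dense < 0.9] = 0.0
X_train = sparse.csr_matrix(train_dense)
X_test = sparse.csr_matrix(test_dense)

nbrs = NearestNeighbors(n_neighbors=5).fit(X_train)
distances, indices = kneighbors_in_batches(nbrs, X_test, batch_size=10)
print(distances.shape, indices.shape)  # (35, 5) (35, 5)
```

With the question's variables this would be `distances, indices = kneighbors_in_batches(nbrs, vectors, batch_size=1000)`; a smaller batch size lowers peak memory at the cost of more calls.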

Regarding python - sklearn kneighbours memory error python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37782049/
