Python - Running into a fit error with x_test and y_test

Reposted by 行者123, updated 2023-12-05 08:20:27

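I have built a neural network that works well on a small dataset of about 300,000 rows with 2 categorical variables and 1 dependent variable, but when I increased it to 6.5 million rows I ran into a memory error. So I decided to modify the code and I'm getting closer, but now I'm running into a fit error. I have 2 categorical variables and one column for the dependent variable of 1s and 0s (suspicious or not suspicious). The starting dataset looks like this: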

DBF2
   ParentProcess                   ChildProcess               Suspicious
0  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
1  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
2  C:\Windows\System32\svchost.exe                      ...            1
3  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
4  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
5  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0

My code follows, along with the errors:

import pandas as pd
import numpy as np
import hashlib
import matplotlib.pyplot as plt
import timeit

X = DBF2.iloc[:, 0:2].values
y = DBF2.iloc[:, 2].values#.ravel()

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])

onehotencoder = OneHotEncoder(categorical_features = [0,1])
X = onehotencoder.fit_transform(X)

index_to_drop = [0, 2039]
to_keep = list(set(xrange(X.shape[1]))-set(index_to_drop))
X = X[:,to_keep]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

#ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 517, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 590, in fit
    return self.partial_fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 621, in partial_fit
    "Cannot center sparse matrices: pass `with_mean=False` "
ValueError: Cannot center sparse matrices: pass `with_mean=False` instead. See docstring for motivation and alternatives.

X_test = sc.transform(X_test)

#ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 677, in transform
    check_is_fitted(self, 'scale_')
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 768, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

In case it helps, here is what X_train and y_train look like when printed:

X_train
<5621203x7043 sparse matrix of type '<type 'numpy.float64'>'
with 11242334 stored elements in Compressed Sparse Row format>

y_train
array([0, 0, 0, ..., 0, 0, 0])
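A side note for newer environments: the `categorical_features` argument of `OneHotEncoder` used above was deprecated in scikit-learn 0.20 and removed in 0.22, and `xrange` only exists in Python 2. The sketch below shows roughly the same encoding step on a recent scikit-learn (assuming 0.21 or later, where `drop` is available). It uses a small hypothetical stand-in for `DBF2`, since the real process paths are elided in the question. `OneHotEncoder` accepts the string columns directly, so the `LabelEncoder` step is unnecessary, and `drop="first"` removes one dummy column per feature, which is assumed here to be the purpose of `index_to_drop`. The result is a scipy.sparse matrix, like the `X_train` printed above.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for DBF2; the real process paths are elided in the question
DBF2 = pd.DataFrame({
    "ParentProcess": ["a.exe", "a.exe", "svchost.exe", "b.exe"],
    "ChildProcess": ["x.exe", "y.exe", "x.exe", "z.exe"],
    "Suspicious": [0, 0, 1, 0],
})

X_raw = DBF2[["ParentProcess", "ChildProcess"]]
y = DBF2["Suspicious"].to_numpy()

# Encode the string columns directly and drop the first (alphabetical)
# category of each column, similar in spirit to index_to_drop
encoder = OneHotEncoder(drop="first")
X = encoder.fit_transform(X_raw)  # sparse output by default

print(sparse.issparse(X), X.shape)  # prints: True (4, 4)
```

Keeping the output sparse is what makes millions of rows feasible: each row stores at most two nonzero entries instead of thousands of explicit zeros.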

Best answer

X_train is a sparse matrix, which is very useful when you work with a large dataset like yours. The problem is, as the documentation explains:

with_mean : boolean, True by default

If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
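The effect described in that docstring can be seen on a tiny example. This is a sketch on a hypothetical 4x3 one-hot style matrix: the default scaler refuses the sparse input, while `with_mean=False` only divides each column by its standard deviation, so zeros stay zeros and the matrix stays sparse. For comparison, centering the same data as a dense array leaves no zero entries at all, which is why centering a 5,621,203 x 7,043 matrix would not fit in memory.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical one-hot style data: mostly zeros
X_dense = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0]])
X_sparse = sparse.csr_matrix(X_dense)

# Default with_mean=True: scikit-learn refuses to center sparse input
try:
    StandardScaler().fit_transform(X_sparse)
    centering_failed = False
except ValueError:
    centering_failed = True

# with_mean=False only rescales the stored values, so sparsity is preserved
X_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)

# Centering densely turns every zero into a nonzero value
dense_centered_nonzeros = np.count_nonzero(X_dense - X_dense.mean(axis=0))

print(centering_failed, X_sparse.nnz, X_scaled.nnz, dense_centered_nonzeros)
# prints: True 5 5 12
```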

You can try passing with_mean=False:

sc = StandardScaler(with_mean=False)
X_train = sc.fit_transform(X_train)

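The following line fails because the fit_transform call above raised an exception before the scaler was fitted, so sc is still an untouched, unfitted StandardScaler object: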

X_test = sc.transform(X_test)

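To be able to use the transform method, you first have to fit the StandardScaler to a dataset. If your intention is to fit the StandardScaler on your training set and use it to transform both the training and test sets into the same space, you can do it as follows: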

sc = StandardScaler(with_mean=False)
X_train_sc = sc.fit(X_train)
X_train = X_train_sc.transform(X_train)
X_test = X_train_sc.transform(X_test)
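The fit-on-train, transform-both pattern can be sketched end to end on hypothetical random 0/1 data standing in for the one-hot encoded X. Because `fit` returns the fitted estimator itself, `X_train_sc` in the answer is just another name for `sc`; the sketch makes that explicit. Both splits are scaled with the training set's `scale_`, so the test set never contributes statistics of its own.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical sparse 0/1 features and labels (not the question's data)
rng = np.random.default_rng(0)
X = sparse.csr_matrix(rng.integers(0, 2, size=(100, 5)).astype(np.float64))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit once, on the training split only; fit() returns the scaler itself
sc = StandardScaler(with_mean=False)
fitted = sc.fit(X_train)

# Apply the same training-set scaling to both splits; outputs stay sparse
X_train_scaled = sc.transform(X_train)
X_test_scaled = sc.transform(X_test)
```

Fitting only on the training split keeps information from the test set out of the preprocessing, so the test score remains an honest estimate.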

The original question, "Python - Running into a fit error with x_test and y_test", is on Stack Overflow: https://stackoverflow.com/questions/52008548/
