python-3.x - Multiple linear regression machine learning in Python

Reposted. Author: 行者123. Updated: 2023-11-30 09:14:21

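I am trying to use multiple linear regression to estimate an output from some inputs. I have trained the model on my data and get the expected values when I run the following code: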

import pandas as pd

dataset = pd.read_excel('TEST.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[:, 0] = labelencoder.fit_transform(X[:, 0]) # 1ST COLUMN

labelencoder1 = LabelEncoder()
X[:, 1] = labelencoder1.fit_transform(X[:, 1]) # 2ND COLUMN

labelencoder2 = LabelEncoder()
X[:, 2] = labelencoder2.fit_transform(X[:, 2]) # 3RD COLUMN

onehotencoder = OneHotEncoder(categorical_features = "all")
X = onehotencoder.fit_transform(X).toarray()

# Avoiding the Dummy Variable Trap
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test) # TILL HERE ITS WORKING AS EXPECTED

Now I am trying to use the same model to evaluate another set of input data, as follows:

dataset1 = pd.read_excel('TEST1.xlsx')  # NEW SET OF INPUT RECORDS TO BE EVALUATED
X1 = dataset1.iloc[:, :-1].values
# Encoding categorical data
labelencoder3 = LabelEncoder()
X1[:, 0] = labelencoder3.fit_transform(X1[:, 0])

labelencoder4 = LabelEncoder()
X1[:, 1] = labelencoder4.fit_transform(X1[:, 1])

labelencoder5 = LabelEncoder()
X1[:, 2] = labelencoder5.fit_transform(X1[:, 2])

onehotencoder2 = OneHotEncoder(categorical_features = "all")
X1 = onehotencoder2.fit_transform(X1).toarray()
X1 = X1[:, 1:]
output = regressor.predict(X1)

But when I run this code I get the following error:

ValueError: shapes (6,13) and (390,) not aligned: 13 (dim 1) != 390 (dim 0)

It would be great if someone could help me solve this problem.

Best answer

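Do X and X1 have the same number of features?
For example, if X contains five distinct words, X transformed by OneHotEncoder has shape (n, 5), and regressor.fit(X_train, y_train) returns a regressor object for y = b + a1x1 + a2x2 + … + a5x5.
Similarly, if X_1 contains 10 distinct words, X_1 transformed by OneHotEncoder has shape (n, 10), so computing predictions for X_1 needs a regressor object of the form y = b + a1x1 + a2x2 + … + a10x10, that is, one trained on data containing exactly those 10 words. Therefore X_1 with shape (n, 10) cannot be evaluated with y = b + a1x1 + a2x2 + … + a5x5.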
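In the traceback from the question, 13 is the number of columns in the encoded X1 and 390 is the number of coefficients the regressor learned. One possible way around the mismatch, as a minimal sketch rather than a definitive fix, is to fit the encoder once on the training data and then call transform() (not fit_transform()) on the new records. The toy data and variable names below are made up for illustration, and handle_unknown="ignore" is an assumption about how unseen categories should be treated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Made-up toy data: three categorical columns and a numeric target.
X_train = np.array([
    ["red", "small", "A"],
    ["blue", "large", "B"],
    ["green", "small", "A"],
    ["red", "large", "C"],
    ["blue", "small", "C"],
], dtype=object)
y_train = np.array([10.0, 20.0, 15.0, 30.0, 25.0])

# New records that contain fewer distinct categories than the training data.
X_new = np.array([
    ["red", "small", "A"],
    ["blue", "small", "A"],
], dtype=object)

# Fit the encoder ONCE, on the training data only.
# handle_unknown="ignore" encodes categories never seen in training as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_enc = encoder.fit_transform(X_train).toarray()

regressor = LinearRegression()
regressor.fit(X_train_enc, y_train)

# Reuse the fitted encoder, so the new data gets the same columns as the training data.
X_new_enc = encoder.transform(X_new).toarray()
predictions = regressor.predict(X_new_enc)

# For contrast: a second encoder fitted on the new data alone yields a different width.
refit_width = OneHotEncoder().fit_transform(X_new).shape[1]

print(X_train_enc.shape, X_new_enc.shape, refit_width)
```

Applied to the question's code, this would mean reusing the already fitted encoders on X1 instead of creating labelencoder3, labelencoder4, labelencoder5 and onehotencoder2 and fitting them on the new file.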
What's more, I don't think toarray() is needed after onehotencoder.fit_transform().
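A small sketch of that point, using made-up data: by default OneHotEncoder returns a SciPy sparse matrix, and LinearRegression can be fitted on it directly.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical inputs and target, for illustration only.
X = np.array([["a", "x"], ["b", "y"], ["c", "x"], ["a", "y"]], dtype=object)
y = np.array([1.0, 2.0, 3.0, 4.0])

# By default fit_transform() returns a SciPy sparse matrix.
X_sparse = OneHotEncoder().fit_transform(X)
is_sparse = sparse.issparse(X_sparse)

# LinearRegression accepts the sparse matrix directly, with no toarray() call.
model = LinearRegression().fit(X_sparse, y)
preds = model.predict(X_sparse)

print(is_sparse, X_sparse.shape, preds.shape)
```

Column slicing such as X[:, 1:] also works on a SciPy CSR matrix, so the rest of the question's pipeline should not need the dense conversion either.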
I'm not sure whether my answer helps solve your problem, but I hope so.

Regarding "python-3.x - Multiple linear regression machine learning in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59152570/
