
python - How do I know if I'm overfitting/underfitting my data?

Reposted | Author: 行者123 | Updated: 2023-11-28 18:56:27

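I have to build a regression model that predicts wine quality from 11 input features. So far I'm comparing several algorithms by their mean squared error, mean absolute error, and R² score. I want to decide which algorithm to use, but before that I want to make sure my models are not overfitting or underfitting the data. Below is a link to the dataset I'm using (my file is formatted a little differently, but the data is exactly the same), followed by my full code.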

Any help is greatly appreciated!

Data: https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

Also, here is the Kaggle notebook I copied most of my code from: https://www.kaggle.com/jhansia/regression-models-analysis-on-the-wine-quality

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

wine = pd.read_csv('wineQualityReds.csv', usecols=lambda x: 'Unnamed' not in x,)

wine.head()

y = wine.quality
X = wine.drop('quality',axis = 1)

from sklearn.model_selection import train_test_split
train_x,test_x,train_y,test_y = train_test_split(X,y,random_state = 0, stratify = y)

from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(train_x)
train_x_scaled = scaler.transform(train_x)

test_x_scaled = scaler.transform(test_x)

from sklearn import model_selection
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

models = []
models.append(('DecisionTree', DecisionTreeRegressor()))
models.append(('RandomForest', RandomForestRegressor()))
models.append(('GradientBoost', GradientBoostingRegressor()))
models.append(('SVR', SVR()))
names = []

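for name, model in models:
    # shuffle=True is needed when random_state is set; recent scikit-learn versions raise a ValueError otherwise
    kfold = model_selection.KFold(n_splits=5, shuffle=True, random_state=2)
    cv_results = model_selection.cross_val_score(model, train_x_scaled, train_y, cv=kfold, scoring='neg_mean_absolute_error')
    names.append(name)
    # The scores are negated MAE, so flip the sign back before printing
    msg = "%s: %f" % (name, -1 * cv_results.mean())
    print(msg)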


# Refit the chosen model on the whole training set, then score it once on the held-out test set
model = RandomForestRegressor()
model.fit(train_x_scaled, train_y)
pred_y = model.predict(test_x_scaled)

from sklearn import metrics

print('Mean Squared Error:', metrics.mean_squared_error(test_y, pred_y))
print('Mean Absolute Error:', metrics.mean_absolute_error(test_y, pred_y))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(test_y, pred_y)))
print('R2:', metrics.r2_score(test_y, pred_y))
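The four numbers printed above follow standard definitions, and writing them out makes it clearer what a train/test comparison is measuring. Below is a minimal stdlib-only sketch; the helper name regression_metrics is hypothetical (not part of the question's code), and it assumes the same formulas that sklearn.metrics uses for MSE, MAE, RMSE and R².

```python
import math


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MSE, MAE, RMSE and R^2 computed by hand."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # sum of squared residuals
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


# Toy example with made-up numbers (not wine data):
# MSE and MAE are both 0.5 here; R2 is 1 - 2/2.75, about 0.27
print(regression_metrics([5, 6, 7, 5], [5, 6, 6, 6]))
```

With the question's variables, calling it once on the training predictions (train_y against model.predict(train_x_scaled)) and once on the test predictions (test_y against pred_y) puts the two sets of errors side by side, which is the simplest first check for overfitting.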

Best answer

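You can use cross-validation on your dataset to determine whether your model is overfitting or underfitting, by comparing its error on the training folds with its error on the validation folds.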
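One way to make this concrete is scikit-learn's cross_validate, which with return_train_score=True reports the error on each training fold next to the error on the matching validation fold, together with validation_curve, which does the same while sweeping one hyperparameter. This is a sketch under stated assumptions, not the answerer's code: synthetic data from make_regression (1599 rows, 11 features, roughly the size of the red-wine table) stands in for wineQualityReds.csv so that it runs on its own, the helper train_vs_validation_mae is hypothetical, and the StandardScaler is placed in a pipeline so it is re-fit on each training fold instead of once on the whole training set.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic stand-in for wineQualityReds.csv (1599 rows x 11 features),
# used only so this sketch runs without the data file.
X, y = make_regression(n_samples=1599, n_features=11, noise=20.0, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=2)


def train_vs_validation_mae(model):
    """Hypothetical helper: mean MAE on the training folds and on the validation folds."""
    # The scaler lives inside the pipeline, so each fold fits it on its own training part only.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_validate(pipe, X, y, cv=kfold,
                            scoring="neg_mean_absolute_error",
                            return_train_score=True)
    # Scores are negated MAE; flip the sign back.
    return -scores["train_score"].mean(), -scores["test_score"].mean()


results = {}
for name, model in [("DecisionTree", DecisionTreeRegressor(random_state=0)),
                    ("RandomForest", RandomForestRegressor(n_estimators=50, random_state=0))]:
    results[name] = train_vs_validation_mae(model)
    print("%s: train MAE=%.2f  validation MAE=%.2f" % (name, *results[name]))

# Sweep tree depth: shallow trees tend to underfit, very deep trees tend to overfit.
depths = [1, 2, 4, 8, 16]
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=kfold, scoring="neg_mean_absolute_error")
for depth, tr, va in zip(depths, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print("max_depth=%2d: train MAE=%.2f  validation MAE=%.2f" % (depth, tr, va))
```

How to read the output: a training MAE far below the validation MAE (the unconstrained tree, or the largest max_depth values) means the model is fitting noise in the training folds, i.e. overfitting; training and validation MAE that are both high and close together (max_depth=1) mean the model is too simple, i.e. underfitting. Replacing the make_regression line with the question's X and y should apply the same check to the wine data.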

Regarding "python - How do I know if I'm overfitting/underfitting my data?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57853990/
