gpt4 book ai didi

python - 随机森林准确率太低

转载 作者:行者123 更新时间:2023-11-30 09:31:31 25 4
gpt4 key购买 nike

我想使用 randomforest 来预测电力消耗。数据调整后,最新情况如下

X=df[['Temp(⁰C)','Araç Sayısı (adet)','Montaj V362_WH','Montaj V363_WH','Montaj_Temp','avg_humidity']]

X.head(15)

输出:

Temp(⁰C)    Araç Sayısı (adet)  Montaj V362_WH  Montaj V363_WH  Montaj_Temp avg_humidity
0 3.250000 0.0 0.0 0.0 17.500000 88.250000
1 3.500000 868.0 16.0 18.0 20.466667 82.316667
2 3.958333 774.0 18.0 18.0 21.166667 87.533333
3 6.541667 0.0 0.0 0.0 18.900000 83.916667
4 4.666667 785.0 16.0 18.0 20.416667 72.650000
5 2.458333 813.0 18.0 18.0 21.166667 73.983333
6 -0.458333 804.0 16.0 18.0 20.500000 72.150000
7 -1.041667 850.0 16.0 16.0 19.850000 76.433333
8 -0.375000 763.0 16.0 18.0 20.500000 76.583333
9 4.375000 1149.0 16.0 16.0 21.416667 84.300000
10 8.541667 0.0 0.0 0.0 21.916667 71.650000
11 6.625000 763.0 16.0 18.0 22.833333 73.733333
12 5.333333 783.0 16.0 16.0 22.166667 69.250000
13 4.708333 764.0 16.0 18.0 21.583333 66.800000
14 4.208333 813.0 16.0 16.0 20.750000 68.150000

y.head(15)

输出:

    Montaj_ET_kWh/day
0 11951.0
1 41821.0
2 42534.0
3 14537.0
4 41305.0
5 42295.0
6 44923.0
7 44279.0
8 45752.0
9 44432.0
10 25786.0
11 42203.0
12 40676.0
13 39980.0
14 39404.0

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30, random_state=None)

clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature in zip(feature_list, clf.feature_importances_):
print(feature)

输出

  ('Temp(⁰C)', 0.11598075020423881)
('Araç Sayısı (adet)', 0.7047301384616493)
('Montaj V362_WH', 0.04065706901940535)
('Montaj V363_WH', 0.023077554218712878)
('Montaj_Temp', 0.08082006262985514)
('avg_humidity', 0.03473442546613837)


sfm = SelectFromModel(clf, threshold=0.10)
sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])

for feature_list_index in sfm.get_support(indices=True):
print(feature_list[feature_list_index])

输出:

  Temp(⁰C)
Araç Sayısı (adet)

X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)

clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)
y_test=y_test.values
y_pred = clf.predict(X_test)
y_test=y_test.reshape(-1,1)
y_pred=y_pred.reshape(-1,1)
y_test=y_test.ravel()
y_pred=y_pred.ravel()
label_encoder = LabelEncoder()
y_pred = label_encoder.fit_transform(y_pred)
y_test = label_encoder.fit_transform(y_test)

accuracy_score(y_test, y_pred)

输出:

 0.010964912280701754

我不知道为什么准确度太低,不知道我在哪里犯了错误

最佳答案

您的错误是您在回归设置中要求准确性(分类指标),这是毫无意义的。

来自accuracy_score documentation (强调):

sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)

Accuracy classification score.

检查list of metrics可在 scikit-learn 中获取合适的回归指标(您还可以在其中确认准确性仅用于分类);更多详情请参阅我的回答Accuracy Score ValueError: Can't Handle mix of binary and continuous target

关于python - 随机森林准确率太低,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55442721/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com