python - Why doesn't the error of my AdaBoost implementation go down?


I am trying to implement AdaBoost.M1 in Python from the following pseudocode: [pseudocode image from the original post]
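The pseudocode image itself is not reproduced here. Judging from the step references in the accepted answer below (lines 2(b), 2(c), 2(d) and 3), it is presumably the standard AdaBoost.M1 algorithm as given in Hastie, Tibshirani and Friedman, roughly:

1. Initialize the observation weights w_i = 1/N, i = 1, ..., N.
2. For t = 1 to T:
   (a) Fit a classifier G_t(x) to the training data using weights w_i.
   (b) Compute the weighted error err_t = sum_i w_i * I(y_i != G_t(x_i)) / sum_i w_i.
   (c) Compute alpha_t = log((1 - err_t) / err_t).
   (d) Update w_i <- w_i * exp(alpha_t * I(y_i != G_t(x_i))).
3. Output G(x) = sign(sum_t alpha_t * G_t(x)).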

I have made some progress, but the number of "incorrect predictions" is not decreasing.

I have checked my weight-update function, and it seems to update the weights correctly.

The bug may be in the classifier, because the number of "incorrect predictions" is the same integer on every other iteration - I have tried 100 iterations. I have no idea why the error does not decrease with each iteration.

Any tips would be greatly appreciated. Thanks :)

from sklearn import tree
import pandas as pd
import numpy as np
import math

df = pd.read_csv("./dataset(3)/adaboost_train.csv")
X_train = df.loc[:,'x1':'x10']
Y_train = df[['y']]


def adaBoost(X_train,Y_train):
    classifiers = []
    # initializing the weights:
    N = len(Y_train)
    w_i = [1 / N] * N

    T = 20
    x_train = (X_train.apply(lambda x: x.tolist(), axis=1))
    clf_errors = []

    for t in range(T):
        print("Iteration:", t)
        # clf = clf2.fit(X_train,Y_train, sample_weight = w_i)

        clf = tree.DecisionTreeClassifier(max_depth=1)
        clf.fit(X_train, Y_train, sample_weight = w_i)

        # Predict all the values:
        y_pred = []
        for sample in x_train:
            p = clf.predict([sample])
            p = p[0]
            y_pred.append(p)
        num_of_incorrect = calculate_error_clf(y_pred, Y_train)

        clf_errors.append(num_of_incorrect)

        error_internal = calc_error(w_i,Y_train,y_pred)

        alpha = np.log((1-error_internal)/ error_internal)
        print(alpha)

        # Add the predictions, error and alpha for later use for every iteration
        classifiers.append((y_pred, error_internal, alpha))

        if t == 2 and y_pred == classifiers[0][0]:
            print("TRUE")

        w_i = update_weights(w_i,y_pred,Y_train,alpha,clf)


def calc_error(weights,Y_train,y_pred):
    err = 0
    for i in range(len(weights)):
        if y_pred[i] != Y_train['y'].iloc[i]:
            err = err + weights[i]
    # Normalizing the error:
    err = err/np.sum(weights)
    return err

# If the prediction is true, return 0. If it is not true, return 1.
def check_pred(y_p, y_t):
    if y_p == y_t:
        return 0
    else:
        return 1

def update_weights(w,y_pred,Y_train,alpha,clf):
    for j in range(len(w)):
        if y_pred[j] != Y_train['y'].iloc[j]:
            w[j] = w[j] * (np.exp(alpha * 1))
    return w

def calculate_error_clf(y_pred, y):
    sum_error = 0
    for i in range(len(y)):
        if y_pred[i] != y.iloc[i]['y']:
            sum_error += 1
        e = (y_pred[i] - y.iloc[i]['y'])**2
        #sum_error += e
    sum_error = sum_error
    return sum_error


I expected the error to decrease, but it does not. For example:

iteration 1: num_of_incorrect 4444
iteration 2: num_of_incorrect 4762
iteration 3: num_of_incorrect 4353
iteration 4: num_of_incorrect 4762
iteration 5: num_of_incorrect 4450
iteration 6: num_of_incorrect 4762
...
does not converge



Best Answer

The number of misclassifications will not decrease with each iteration (because each classifier is a weak classifier). AdaBoost is an ensemble model: it gives more weight to the samples that the previous classifier misclassified. So in the next iteration some previously misclassified samples will be classified correctly, but this can also push previously correctly classified samples into error (which is why the per-iteration error does not improve). Even though each individual classifier is weak, since the final output is a weighted sum of all the classifiers, the combined classification converges to a strong learner (see line 3 of the algorithm).

My implementation using numpy:

from sklearn import tree
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X_train = data.data
Y_train = np.where(data.target == 0, 1, -1)

def adaBoost(X_train,Y_train):
    classifiers = []
    # initializing the weights:
    N = len(Y_train)
    w_i = np.array([1 / N] * N)

    T = 20
    clf_errors = []

    for t in range(T):
        clf = tree.DecisionTreeClassifier(max_depth=1)
        clf.fit(X_train, Y_train, sample_weight = w_i)

        # Predict all the values:
        y_pred = clf.predict(X_train)
        #print (confusion_matrix(Y_train, y_pred))

        # Line 2(b) of algorithm
        error = np.sum(np.where(Y_train != y_pred, w_i, 0))/np.sum(w_i)
        print("Iteration: {0}, Missed: {1}".format(t, np.sum(np.where(Y_train != y_pred, 1, 0))))

        # Line 2(c) of algorithm
        alpha = np.log((1-error)/ error)
        classifiers.append((alpha, clf))
        # Line 2(d) of algorithm
        w_i = np.where(Y_train != y_pred, w_i*np.exp(alpha), w_i)
    return classifiers

clfs = adaBoost(X_train, Y_train)

# Line 3 of algorithm
def predict(clfs, x):
    s = np.zeros(len(x))
    for (alpha, clf) in clfs:
        s += alpha*clf.predict(x)
    return np.sign(s)

print (confusion_matrix(Y_train, predict(clfs,X_train)))

Output:


Iteration: 0, Missed: 44
Iteration: 1, Missed: 48
Iteration: 2, Missed: 182
Iteration: 3, Missed: 73
Iteration: 4, Missed: 102
Iteration: 5, Missed: 160
Iteration: 6, Missed: 185
Iteration: 7, Missed: 69
Iteration: 8, Missed: 357
Iteration: 9, Missed: 127
Iteration: 10, Missed: 256
Iteration: 11, Missed: 160
Iteration: 12, Missed: 298
Iteration: 13, Missed: 64
Iteration: 14, Missed: 221
Iteration: 15, Missed: 113
Iteration: 16, Missed: 261
Iteration: 17, Missed: 368
Iteration: 18, Missed: 49
Iteration: 19, Missed: 171
[[354   3]
 [  3 209]]


             precision    recall  f1-score   support

          -1       0.99      0.99      0.99       357
           1       0.99      0.99      0.99       212

 avg / total       0.99      0.99      0.99       569

As you can see, the number of misses does not improve from iteration to iteration, but if you check the confusion matrix (uncomment the print in the code) you will see that some previously misclassified samples get classified correctly. Finally, for the prediction, since we weight each classifier by its error, the weighted sum converges to a strong classifier (as seen in the final predictions).
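To see this convergence directly, one can score the growing ensemble after each boosting round instead of each stump in isolation. A minimal sketch under the same setup (it reuses clfs, X_train and Y_train from the code above; the staged_misses name is just for illustration):

# Track the error of the combined classifier after each round.
s = np.zeros(len(X_train))
for t, (alpha, clf) in enumerate(clfs):
    s += alpha * clf.predict(X_train)  # running weighted vote (line 3 of the algorithm)
    staged_misses = np.sum(np.sign(s) != Y_train)
    print("Round {0}: ensemble missed {1}".format(t, staged_misses))

Unlike the per-stump miss counts printed above, this count should trend down toward the handful of errors shown in the confusion matrix.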

Regarding "python - Why doesn't the error of my AdaBoost implementation go down?", see the original question on Stack Overflow: https://stackoverflow.com/questions/55318330/
