python-3.6 - F1-based custom evaluation function for XGBoost (Python API)

Repost. Author: 行者123. Updated: 2023-12-03 21:04:22

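To optimize for F1, I wrote the following custom evaluation function for use with xgboost. Unfortunately, it raises an exception when run with xgboost.

The evaluation function is as follows: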

def F1_eval(preds, labels):

    t = np.arange(0, 1, 0.005)
    f = np.repeat(0, 200)
    Results = np.vstack([t, f]).T

    P = sum(labels == 1)

    for i in range(200):
        m = (preds >= Results[i, 0])
        TP = sum(labels[m] == 1)
        FP = sum(labels[m] == 0)

        if (FP + TP) > 0:
            Precision = TP/(FP + TP)

        Recall = TP/P

        if (Precision + Recall > 0):
            F1 = 2 * Precision * Recall / (Precision + Recall)
        else:
            F1 = 0

        Results[i, 1] = F1

    return(max(Results[:, 1]))


Below is a reproducible example together with the error message:

import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import train_test_split

Wine = datasets.load_wine()

X_wine = Wine.data
y_wine = Wine.target

# Merge class 2 into class 1 to get a binary problem
y_wine[y_wine == 2] = 1

X_wine_train, X_wine_test, y_wine_train, y_wine_test = train_test_split(X_wine, y_wine, test_size = 0.2)

clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1, silent=False, objective='binary:logistic',
                             booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
                             subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)

clf_wine.fit(X_wine_train, y_wine_train,
             eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)],
             eval_metric=F1_eval, early_stopping_rounds=10, verbose=True)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-453-452852658dd8> in <module>()
12 clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1,silent=False, objective='binary:logistic', booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)
13
---> 14 clf_wine.fit(X_wine_train, y_wine_train,eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)], eval_metric=F1_eval, early_stopping_rounds=10, verbose=True)
15

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set)
519 early_stopping_rounds=early_stopping_rounds,
520 evals_result=evals_result, obj=obj, feval=feval,
--> 521 verbose_eval=verbose, xgb_model=None)
522
523 self.objective = xgb_options["objective"]

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, learning_rates)
202 evals=evals,
203 obj=obj, feval=feval,
--> 204 xgb_model=xgb_model, callbacks=callbacks)
205
206

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
82 # check evaluation result.
83 if len(evals) != 0:
---> 84 bst_eval_set = bst.eval_set(evals, i, feval)
85 if isinstance(bst_eval_set, STRING_TYPES):
86 msg = bst_eval_set

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\core.py in eval_set(self, evals, iteration, feval)
957 if feval is not None:
958 for dmat, evname in evals:
--> 959 feval_ret = feval(self.predict(dmat), dmat)
960 if isinstance(feval_ret, list):
961 for name, val in feval_ret:

<ipython-input-383-dfb8d5181b18> in F1_eval(preds, labels)
11
12
---> 13 P = sum(labels == 1)
14
15

TypeError: 'bool' object is not iterable


I don't understand why the function does not work. I followed the example here: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py

I would like to understand where I went wrong.

Best answer

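When sum(labels == 1) is executed, Python evaluates labels == 1 to a single Boolean, hence the TypeError: 'bool' object is not iterable. As the traceback shows, xgboost calls feval(self.predict(dmat), dmat), so the second argument is a DMatrix rather than an array of labels, and comparing it with 1 yields one bool instead of an element-wise result.

The sum function expects an iterable, such as a list. Here is an example of your error: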

In[32]: sum(True)
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-32-6eb8f80b7f2e>", line 1, in <module>
sum(True)
TypeError: 'bool' object is not iterable
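
The same failure can be reproduced without xgboost. In the sketch below, FakeDMatrix is a hypothetical stand-in (not an xgboost class) for any object that does not define an element-wise ==. Comparing it with 1 falls back to Python's default equality and yields a single bool, which sum cannot iterate over; pulling the labels out first avoids the problem.

```python
class FakeDMatrix:
    """Hypothetical stand-in for a container without element-wise ==."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


dmat = FakeDMatrix([0, 1, 1, 0])

# Default object equality: one bool, not one bool per label.
comparison = dmat == 1
print(type(comparison).__name__, comparison)

# sum() needs an iterable, so passing the bool raises TypeError.
try:
    sum(comparison)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Extracting the labels first gives something iterable to count over.
n_positive = sum(label == 1 for label in dmat.get_label())
print(n_positive)
```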


If you want to use scikit-learn's f1_score, you can implement the following wrapper:

from sklearn.metrics import f1_score
import numpy as np

def f1_eval(y_pred, dtrain):
    y_true = dtrain.get_label()
    err = 1-f1_score(y_true, np.round(y_pred))
    return 'f1_err', err


It takes as arguments a list (of predictions) and a DMatrix, and returns a string and a float.
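
To make that contract concrete without installing xgboost or scikit-learn, here is a minimal pure-Python sketch. FakeDMatrix and f1_err_sketch are hypothetical names introduced only for illustration; the F1 computation and the fixed 0.5 cutoff stand in for f1_score and np.round in the wrapper above, and are an assumption rather than the libraries' exact behavior.

```python
class FakeDMatrix:
    """Hypothetical stand-in exposing only get_label(), for illustration."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def f1_err_sketch(y_pred, dtrain):
    """Return (metric_name, value) where value = 1 - F1 at a 0.5 cutoff."""
    y_true = dtrain.get_label()
    y_hat = [1 if p >= 0.5 else 0 for p in y_pred]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return 'f1_err', 1 - f1


# One true positive, one false positive, one false negative.
name, value = f1_err_sketch([0.9, 0.2, 0.6, 0.4], FakeDMatrix([1, 0, 0, 1]))
print(name, value)
```

The function receives the predictions first and the data container second, reads the labels through get_label(), and hands back a name together with a number.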

# Setting your classifier
clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1, silent=False, objective='binary:logistic',
                             booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
                             subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)

# When you fit, add eval_metric=f1_eval
# Please don't forget to insert all the .fit arguments required
clf_wine.fit(eval_metric=f1_eval)


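The XGBoost demo custom_objective.py linked in the question shows how to implement a custom objective function and a custom evaluation metric.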

The example includes the following code:

# user defined evaluation function, return a pair metric_name, result
# NOTE: when you do customized loss function, the default prediction value is margin
# this may make builtin evaluation metric not function properly
# for example, we are doing logistic loss, the prediction is score before logistic transformation
# the builtin evaluation error assumes input is after logistic transformation
# Take this in mind when you use the customization, and maybe you need write customized evaluation function
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    # return a pair metric_name, result
    # since preds are margin(before logistic transformation, cutoff at 0)
    return 'error', float(sum(labels != (preds > 0.0))) / len(labels)


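It specifies that an evaluation function takes (predictions, dtrain) as arguments, where dtrain is of type DMatrix, and returns a string and a float: the name of the metric and the error.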
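
The comments in the demo code say that, with a custom objective, predictions arrive as margins (scores before the logistic transformation) and are cut off at 0. The short sketch below illustrates why a cutoff of 0 on the margin matches a cutoff of 0.5 on the probability. The sigmoid helper is written here for illustration and is not an xgboost function; whether your metric actually receives margins or probabilities depends on your setup.

```python
import math


def sigmoid(margin):
    """Logistic transformation mapping a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


margins = [-2.0, -0.5, 0.0, 0.5, 2.0]
probabilities = [sigmoid(m) for m in margins]

# Thresholding margins at 0 and probabilities at 0.5 selects the same items.
by_margin = [m > 0.0 for m in margins]
by_probability = [p > 0.5 for p in probabilities]
print(by_margin == by_probability)
```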



Adding a working Python code example:

import numpy as np

def _F1_eval(preds, labels):
    t = np.arange(0, 1, 0.005)
    f = np.repeat(0, 200)
    results = np.vstack([t, f]).T
    # assuming labels only containing 0's and 1's
    n_pos_examples = sum(labels)
    if n_pos_examples == 0:
        raise ValueError("labels not containing positive examples")

    for i in range(200):
        pred_indexes = (preds >= results[i, 0])
        TP = sum(labels[pred_indexes])
        FP = len(labels[pred_indexes]) - TP
        precision = 0
        recall = TP / n_pos_examples

        if (FP + TP) > 0:
            precision = TP / (FP + TP)

        if (precision + recall > 0):
            F1 = 2 * precision * recall / (precision + recall)
        else:
            F1 = 0
        results[i, 1] = F1
    return (max(results[:, 1]))

if __name__ == '__main__':
    labels = np.random.binomial(1, 0.75, 100)
    preds = np.random.random_sample(100)
    print(_F1_eval(preds, labels))


If you want to use _F1_eval specifically as an xgboost evaluation function, add the following:

def F1_eval(preds, dtrain):
    res = _F1_eval(preds, dtrain.get_label())
    return 'f1_err', 1-res
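
To check the threshold sweep by hand, the following pure-Python sketch repeats the same idea on four examples. The name best_f1 and the sample data are made up for illustration, and the function is a simplified rewrite rather than the answer's exact code. For any threshold above 0.1 and up to 0.35, three examples are selected, two of them positive: precision is 2/3, recall is 1, and F1 is 0.8, which is the best value over the sweep. A wrapper like F1_eval above would then report 1 - 0.8 = 0.2.

```python
def best_f1(preds, labels, step=0.005, n_steps=200):
    """Largest F1 over thresholds 0, step, 2*step, ... (pure-Python sketch)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("labels contain no positive examples")
    best = 0.0
    for i in range(n_steps):
        threshold = i * step
        # Labels of the examples predicted positive at this threshold.
        picked = [y for p, y in zip(preds, labels) if p >= threshold]
        tp = sum(picked)
        if tp == 0:
            continue  # no true positives, so F1 is 0 at this threshold
        precision = tp / len(picked)
        recall = tp / n_pos
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


preds = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
score = best_f1(preds, labels)
print(round(score, 3), round(1 - score, 3))
```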

Regarding "python-3.6 - F1-based custom evaluation function for XGBoost (Python API)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51587535/
