
python - How do I create a custom eval metric for catboost?

Reposted. Author: 行者123. Updated: 2023-12-04 01:08:52

Similar questions:

  • Python Catboost: Multiclass F1 score custom metric

  • Catboost tutorial
  • https://catboost.ai/docs/concepts/python-usages-examples.html#user-defined-loss-function

  • Question
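    I have a binary classification problem. After modeling, I have the model's test predictions y_pred, and I already have the true test labels y_true.
    I want a custom evaluation metric defined by the following equation:
    profit = 400 * truePositive - 200 * falseNegative - 100 * falsePositive
    Since higher profit is better, I want to maximize this function rather than minimize it.
    How can I use this as the eval_metric in catboost?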
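To make the formula concrete, here is a tiny worked example. The function name `profit` and the counts are hypothetical, chosen only for illustration. Note that true negatives do not appear in the formula, so they contribute nothing to the score.

```python
def profit(tp: int, fn: int, fp: int) -> int:
    """Profit = 400*TP - 200*FN - 100*FP, as defined in the question."""
    return 400 * tp - 200 * fn - 100 * fp


# Hypothetical counts: 10 true positives, 5 false negatives, 3 false positives.
print(profit(tp=10, fn=5, fp=3))  # 4000 - 1000 - 300
```

Because the score grows with TP and shrinks with FN and FP, larger values are better, which is why the metric has to be maximized.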
    Using sklearn
    import sklearn.metrics

    def get_profit(y_true, y_pred):
        # ravel() on a binary confusion matrix yields tn, fp, fn, tp in that order
        tn, fp, fn, tp = sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()
        profit = 400*tp - 200*fn - 100*fp
        return profit

    scoring = sklearn.metrics.make_scorer(get_profit, greater_is_better=True)
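For comparison with the catboost version, this is a sketch of how such a scorer is typically consumed on the sklearn side. The synthetic data and the `LogisticRegression` model are assumptions for illustration only, and `labels=[0, 1]` is added (not in the original) so the matrix is always 2x2 even if one class never gets predicted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer


def get_profit(y_true, y_pred):
    # labels=[0, 1] forces a 2x2 matrix even if a class is absent from y_pred
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return 400 * tp - 200 * fn - 100 * fp


scoring = make_scorer(get_profit, greater_is_better=True)

# Synthetic binary data, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# A scorer is called as scorer(estimator, X, y): it runs estimator.predict(X)
# and passes the resulting hard 0/1 labels to get_profit.
score = scoring(model, X, y)
print(score)
```

The same `scoring` object can be passed as `scoring=` to `cross_val_score` or `GridSearchCV`. The key point is that sklearn hands the metric hard class labels, which is not what catboost does.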
    Using catboost
    class ProfitMetric(object):
        def get_final_error(self, error, weight):
            return error / (weight + 1e-38)

        def is_max_optimal(self):
            return True

        def evaluate(self, approxes, target, weight):
            assert len(approxes) == 1
            assert len(target) == len(approxes[0])

            approx = approxes[0]

            error_sum = 0.0
            weight_sum = 0.0

            # ** I don't know what goes here **

            return error_sum, weight_sum
    Question
    How do I complete this custom eval metric in catboost?
    Update
    Here is what I have so far:
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import sklearn

    from catboost import CatBoostClassifier
    from sklearn.model_selection import train_test_split

    def get_profit(y_true, y_pred):
        tn, fp, fn, tp = sklearn.metrics.confusion_matrix(y_true,y_pred).ravel()
        profit = 400*tp - 200*fn - 100*fp
        return profit


    class ProfitMetric:
        def is_max_optimal(self):
            return True # greater is better

        def evaluate(self, approxes, target, weight):
            assert len(approxes) == 1
            assert len(target) == len(approxes[0])

            approx = approxes[0]

            y_pred = np.rint(approx)
            y_true = np.array(target).astype(int)

            output_weight = 1 # weight is not used

            score = get_profit(y_true, y_pred)

            return score, output_weight

        def get_final_error(self, error, weight):
            return error


    df = sns.load_dataset('titanic')
    X = df[['survived','pclass','age','sibsp','fare']]
    y = X.pop('survived')

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)


    model = CatBoostClassifier(metric_period=50,
                               n_estimators=200,
                               eval_metric=ProfitMetric()
                               )

    model.fit(X, y, eval_set=(X_test, y_test)) # this fails
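One likely reason this version fails, assuming catboost passes raw log-odds to `evaluate()` as the answer explains: `np.rint` only rounds to the nearest integer, so it can produce labels such as -2 or 3. The sketch below uses hypothetical scores to show that `confusion_matrix` then builds a matrix over every label it sees, and unpacking its `ravel()` into four names raises `ValueError`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical raw log-odds, similar in kind to what catboost hands to evaluate().
approx = np.array([-2.3, -0.4, 0.2, 1.7, 3.1])
y_true = np.array([0, 0, 1, 1, 1])

# np.rint rounds to the nearest integer; it does not map log-odds onto {0, 1}.
y_pred = np.rint(approx)
print(y_pred)

# y_pred contains -2, 2 and 3, so the matrix covers 5 labels instead of 2.
cm = confusion_matrix(y_true, y_pred)
print(cm.shape)

try:
    tn, fp, fn, tp = cm.ravel()  # expects exactly four entries
    unpack_failed = False
except ValueError as exc:
    unpack_failed = True
    print("unpacking failed:", exc)
```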

    Best answer

    The main difference from your code is:

    @staticmethod
    def get_profit(y_true, y_pred):
        # y_pred holds raw log-odds: map to probabilities, then threshold at 0.5
        y_pred = (expit(y_pred) > 0.5).astype(int)
        y_true = y_true.astype(int)
        #print("ACCURACY:",(y_pred==y_true).mean())
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        loss = 400*tp - 200*fn - 100*fp
        return loss
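    It isn't obvious from the example you linked, but on inspection it turns out that catboost internally treats the predictions as raw log-odds (hat tip @Ben). So, to use confusion_matrix correctly, you need to make sure that both y_true and y_pred are integer class labels. You can do this by mapping the log-odds to probabilities with the sigmoid (expit) and thresholding at 0.5:
    y_pred = (scipy.special.expit(y_pred) > 0.5).astype(int)
    y_true = y_true.astype(int)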
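Why the threshold after `expit` matters: casting probabilities straight to `int` truncates toward zero, so every value in (0, 1) becomes 0. The sketch below uses hypothetical log-odds and a NumPy-only `sigmoid` helper (an assumption standing in for `scipy.special.expit`, which computes the same function). It also shows that thresholding the probability at 0.5 is equivalent to thresholding the raw log-odds at 0, since sigmoid(0) = 0.5.

```python
import numpy as np


def sigmoid(x):
    # Same function as scipy.special.expit, written with NumPy only.
    return 1.0 / (1.0 + np.exp(-x))


approx = np.array([-2.3, -0.4, 0.2, 1.7, 3.1])  # hypothetical raw log-odds

# Truncating probabilities to int turns every value in (0, 1) into 0.
truncated = sigmoid(approx).astype(int)

# Thresholding the probability at 0.5 yields proper 0/1 class labels ...
labels_from_prob = (sigmoid(approx) > 0.5).astype(int)
# ... and matches thresholding the log-odds at 0, because sigmoid(0) == 0.5.
labels_from_logit = (approx > 0).astype(int)

print(truncated, labels_from_prob, labels_from_logit)
```

Comparing `approx > 0` directly skips the `exp` call entirely, which is a small simplification if the probabilities themselves are not needed.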
    So the complete working code is:
    import numpy as np
    import seaborn as sns
    from catboost import CatBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix
    from scipy.special import expit

    df = sns.load_dataset('titanic')
    X = df[['survived','pclass','age','sibsp','fare']]
    y = X.pop('survived')

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

    class ProfitMetric:

        @staticmethod
        def get_profit(y_true, y_pred):
            # y_pred holds raw log-odds: map to probabilities, then threshold at 0.5
            y_pred = (expit(np.asarray(y_pred)) > 0.5).astype(int)
            y_true = y_true.astype(int)
            #print("ACCURACY:",(y_pred==y_true).mean())
            # labels=[0, 1] guarantees a 2x2 matrix even if one class is missing
            tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
            loss = 400*tp - 200*fn - 100*fp
            return loss

        def is_max_optimal(self):
            return True # greater is better

        def evaluate(self, approxes, target, weight):
            assert len(approxes) == 1
            assert len(target) == len(approxes[0])
            y_true = np.array(target).astype(int)
            approx = approxes[0]
            score = self.get_profit(y_true, approx)
            return score, 1

        def get_final_error(self, error, weight):
            return error

    model = CatBoostClassifier(metric_period=50,
                               n_estimators=200,
                               eval_metric=ProfitMetric()
                               )

    # Train on the training split only, so the eval set stays held out
    model.fit(X_train, y_train, eval_set=(X_test, y_test))
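A possible variant, sketched under assumptions: the hypothetical class `WeightedProfitMetric` below keeps the same three-method interface as `ProfitMetric` (`is_max_optimal`, `evaluate`, `get_final_error`) but counts TP/FN/FP with NumPy instead of `confusion_matrix`, and honors per-sample weights. It assumes, as catboost's documented custom-metric examples do, that `weight` may be `None` when no sample weights are given. Because the interface is plain duck typing, it can be exercised without catboost installed.

```python
import numpy as np


class WeightedProfitMetric:
    """Hypothetical NumPy-only, weight-aware variant of ProfitMetric."""

    def is_max_optimal(self):
        return True  # higher profit is better

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        approx = np.asarray(approxes[0], dtype=float)
        y_true = np.asarray(target).astype(int)
        # Raw log-odds > 0 is the same as predicted probability > 0.5.
        y_pred = (approx > 0).astype(int)
        # Assumption: weight is None when no sample weights were supplied.
        if weight is None:
            w = np.ones(len(y_true))
        else:
            w = np.asarray(weight, dtype=float)

        # Weighted counts of each outcome in the profit formula.
        tp = np.sum(w * ((y_pred == 1) & (y_true == 1)))
        fn = np.sum(w * ((y_pred == 0) & (y_true == 1)))
        fp = np.sum(w * ((y_pred == 1) & (y_true == 0)))
        profit = 400 * tp - 200 * fn - 100 * fp
        # (error, weight) pair; get_final_error receives them back.
        return float(profit), 1.0

    def get_final_error(self, error, weight):
        return error


metric = WeightedProfitMetric()
score, _ = metric.evaluate([[-1.0, 2.0, 0.5, -3.0]], [1, 1, 0, 0], None)
print(score)
```

It would be passed the same way as the original, via `eval_metric=WeightedProfitMetric()`. Avoiding `confusion_matrix` also sidesteps the four-value unpacking issue entirely.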

    Regarding "python - How do I create a custom eval metric for catboost?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65462220/
