machine-learning - How to set 'gain' as the feature importance measure in feature_importances_ for the LightGBM classifier in sklearn::LGBMClassifier()

Reposted by: 行者123 | Updated: 2023-11-30 09:47:08


I am building a binary classifier model with LGBMClassifier in LightGBM, as follows:

    # LightGBM model
    from lightgbm import LGBMClassifier

    clf = LGBMClassifier(
        nthread=4,
        n_estimators=10000,
        learning_rate=0.005,
        num_leaves=45,
        colsample_bytree=0.8,
        subsample=0.4,
        subsample_freq=1,
        max_depth=20,
        reg_alpha=0.5,
        reg_lambda=0.5,
        min_split_gain=0.04,
        min_child_weight=0.05,  # the original was missing this trailing comma
        random_state=0,
        silent=-1,
        verbose=-1)

Next, I fit the model on the training data:

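    import pandas as pd

    # Fit with early stopping, evaluating AUC on the training and validation sets
    clf.fit(train_x, train_y, eval_set=[(train_x, train_y), (valid_x, valid_y)],
            eval_metric='auc', verbose=100, early_stopping_rounds=200)

    # Collect the per-feature importances (feats is the list of feature names)
    fold_importance_df = pd.DataFrame()
    fold_importance_df["feature"] = feats
    fold_importance_df["importance"] = clf.feature_importances_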

Output:

feature                      importance
feature13                     1108
feature21                     1104
feature11                     774

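Everything works up to this point. I am now looking at the feature importance of this model, so I read the feature_importances_ attribute to get the values (by default it reports importance based on split).
While split tells me how many times each feature was used in a split, I think gain would give me a better understanding of how important each feature is.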

The Python API of the LightGBM Booster class (https://lightgbm.readthedocs.io/en/latest/Python-API.html?highlight=importance) mentions:

 feature_importance(importance_type='split', iteration=-1)

 Parameters:    importance_type (string, optional (default="split")) –
                If "split", result contains numbers of times the feature is used in a model.
                If "gain", result contains total gains of splits which use the feature.
 Returns:       result – Array with feature importances.
 Return type:   numpy array
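
To make the difference between the two measures concrete, here is a minimal pure-Python sketch. It does not use LightGBM at all: the `splits` list and the `importance` helper are hypothetical, invented only to mirror the definitions quoted above (a count of splits per feature versus the summed gain of those splits).

```python
from collections import defaultdict

# Hypothetical split records, one per tree node: (feature name, gain of the split).
# The values are made up for illustration; they are not from a real model.
splits = [
    ("feature13", 12.0),
    ("feature21", 3.5),
    ("feature13", 0.5),
    ("feature11", 40.0),
    ("feature21", 1.0),
]


def importance(splits, importance_type="split"):
    """Aggregate split records into a per-feature importance dict."""
    if importance_type not in ("split", "gain"):
        raise ValueError("importance_type must be 'split' or 'gain'")
    totals = defaultdict(float)
    for feature, gain in splits:
        # 'split' counts how often the feature is used; 'gain' sums the gains.
        totals[feature] += 1.0 if importance_type == "split" else gain
    return dict(totals)


split_imp = importance(splits, "split")
gain_imp = importance(splits, "gain")
print(split_imp)
print(gain_imp)
```

In this toy data feature11 is used only once, so it ranks last by split, yet its single split carries the largest gain, so it ranks first by gain. That is the kind of disagreement that motivates asking for gain.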

However, the Sklearn API for LightGBM's LGBMClassifier() does not mention anything of the kind; it documents only this attribute:

feature_importances_
array of shape = [n_features] – The feature importances (the higher, the more important the feature).

My question is: how do I get gain-based feature importance from the sklearn version, i.e. LGBMClassifier()?

Best answer

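feature_importance() is a method of the Booster object in the original LGBM.

The sklearn API exposes the underlying Booster fitted on the training data through the attribute booster_, as given in the API docs.

So you can first access this Booster object and then call feature_importance() in the same way as you would with the original LGBM.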

clf.booster_.feature_importance(importance_type='gain')

Regarding "machine-learning - How to set 'gain' as the feature importance measure in feature_importances_ for the LightGBM classifier in sklearn::LGBMClassifier()", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51118772/
