python - XgBoost : The least populated class in y has only 1 members, which is too few

Reposted. Author: 太空狗. Updated: 2023-10-30 00:05:59

I'm using XGBoost's scikit-learn wrapper for a Kaggle competition. However, I get this "warning" message:

$ python Script1.py
/home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=3.
  % (min_labels, self.n_folds)), Warning)

According to another question on Stack Overflow:

"Check that you have at least 3 samples per class to be able to do StratifiedKFold cross validation with k == 3 (I think this is the default CV used by GridSearchCV for classification)."

Well, I don't have at least 3 samples for every class.
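To see which classes fall below the fold count, a quick check with the standard library is enough. This is a minimal sketch, not code from the original post: the helper name `underpopulated_classes` and the toy labels are hypothetical, and `n_folds=3` simply mirrors the value reported in the warning.

```python
from collections import Counter

def underpopulated_classes(y, n_folds=3):
    """Return {label: count} for every class with fewer than n_folds samples.

    scikit-learn's stratified k-fold warns when some class has fewer members
    than there are folds, because that class cannot appear in every test fold.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_folds}

# Hypothetical toy labels: class "c" has a single member.
labels = ["a", "a", "a", "b", "b", "b", "b", "c"]
print(underpopulated_classes(labels))  # {'c': 1}
```

Running this on the real `place_id` labels would list every class that triggers the warning, together with its sample count.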

So my questions are:

  1. What alternatives are there?

  2. Why can't I use cross-validation?

  3. What can I use instead?

...

param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 6, 2)
}

grid_search = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,
        n_estimators=3000,
        max_depth=15,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='multi:softmax',
        nthread=42,
        scale_pos_weight=1,
        seed=27),
    param_grid=param_test1, scoring='roc_auc', n_jobs=42, iid=False, cv=None, verbose=1)
...

grid_search.fit(train_x, place_id)
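Because `cv=None` is passed, GridSearchCV falls back to its default splitter, which for a classifier in this scikit-learn version is 3-fold stratified CV (hence `n_folds=3` in the warning). The toy sketch below is a deliberate simplification: it deals each class's sample indices round-robin into folds, which is not scikit-learn's actual algorithm. It only illustrates why a class with one member can appear in at most one of the three test folds. The function name `stratified_folds` is hypothetical.

```python
from collections import defaultdict

def stratified_folds(y, n_folds=3):
    """Toy stratified split: deal each class's indices round-robin into folds.

    Returns a list of n_folds lists of sample indices (the test folds).
    This is an illustration only, not scikit-learn's implementation.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % n_folds].append(idx)
    return folds

y = ["a", "a", "a", "b", "b", "b", "c"]  # class "c" has a single member
for k, fold in enumerate(stratified_folds(y)):
    print(k, sorted(y[i] for i in fold))
# 0 ['a', 'b', 'c']
# 1 ['a', 'b']
# 2 ['a', 'b']
```

Class "c" lands only in test fold 0. In that split the model never sees "c" during training, and in the other two splits it is never evaluated on "c", so the per-class evaluation that stratification is meant to guarantee breaks down.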

References:

One-shot learning with scikit-learn

Using a support vector classifier with polynomial kernel in scikit-learn

Best answer

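If a target/class has only one sample, that is too few for any model. What you can do is get another dataset, preferably one that is as balanced as possible, since most models perform better on balanced data.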

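If you can't get another dataset, you will have to work with the one you have. I suggest removing the examples whose target occurs only once. You will then have a model that does not cover that target. If that doesn't meet your requirements, you need a new dataset.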
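The suggestion above, dropping samples whose class is too rare, can be sketched as a small filter. This is a hedged sketch assuming `X` and `y` are equal-length Python sequences; the helper name `drop_rare_classes` and the toy data are hypothetical. Dropping only singletons (`min_count=2`) is what the answer literally suggests, but with 3-fold stratified CV a class with 2 members still triggers the warning, so `min_count=3` is the threshold that actually satisfies the fold count.

```python
from collections import Counter

def drop_rare_classes(X, y, min_count=3):
    """Keep only samples whose label occurs at least min_count times in y.

    Returns filtered copies (X_kept, y_kept); the inputs are not modified.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    counts = Counter(y)
    keep = [i for i, label in enumerate(y) if counts[label] >= min_count]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: class "c" is a singleton, class "b" has two members.
X = [[0.1], [0.2], [0.3], [1.1], [1.2], [2.0]]
y = ["a", "a", "a", "b", "b", "c"]

X_kept, y_kept = drop_rare_classes(X, y, min_count=2)  # drop singletons only
print(y_kept)  # ['a', 'a', 'a', 'b', 'b']

X_kept, y_kept = drop_rare_classes(X, y, min_count=3)  # enough for 3 folds
print(y_kept)  # ['a', 'a', 'a']
```

As the answer points out, a model trained on the filtered data cannot predict the dropped classes; that is the trade-off of this approach.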

The original question, "python - XgBoost : The least populated class in y has only 1 members, which is too few", is on Stack Overflow: https://stackoverflow.com/questions/37240195/
