
python - Scikit-learn: "The least populated class in y has only 1 member"

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:40:35

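I am trying to use Scikit-learn for random forest regression. After loading the data with Pandas, the first step is to split it into training and test sets. However, I get this error: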

The least populated class in y has only 1 member

I searched Google and found various instances of this error, but I still cannot work out what it means.

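import joblib
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

# Load the tab-separated training data
training_file = "training_data.txt"
data = pd.read_csv(training_file, sep='\t')

# Separate the target column from the features
y = data.Result
X = data.drop('Result', axis=1)
# This is the call that raises the error
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)

# Standardize the features, then fit a random forest
pipeline = make_pipeline(preprocessing.StandardScaler(), RandomForestRegressor(n_estimators=100))

# Grid searched with 10-fold cross-validation
hyperparameters = {
    'randomforestregressor__max_features': ['auto', 'sqrt', 'log2'],
    'randomforestregressor__max_depth': [None, 5, 3, 1],
}

model = GridSearchCV(pipeline, hyperparameters, cv=10)

model.fit(X_train, y_train)

prediction = model.predict(X_test)

# Persist the fitted model
joblib.dump(model, 'ms5000.pkl')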

The train_test_split call produces this stack trace:

Traceback (most recent call last):
  File "/Users/justin.shapiro/Desktop/IPML_Model/model_definition.py", line 18, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Here is a sample of my dataset:

var1  var2    var3  var4  var5  var6  var7  var8  Result
high  5000.0  0     60    1000  75    0.23  0.75  17912.0
mid   5000.0  0     60    1000  50    0.23  0.75  18707.0
low   5000.0  0     60    1000  25    0.23  0.75  17912.0
high  5000.0  5     60    1000  75    0.23  0.75  18577.0
mid   5000.0  5     60    1000  50    0.23  0.75  19407.0
low   5000.0  5     60    1000  25    0.23  0.75  18577.0

What is this error, and how do I get rid of it?

Best answer

The error occurs on this line:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)

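Try removing stratify=y. The stratify argument treats every distinct value of y as a class and needs at least two samples in each class. Result is a continuous regression target, so some of its values occur only once, and each of those becomes a class with a single member.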
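As a rough illustration of the fix, here is a minimal sketch on synthetic data. The DataFrame below is made up for the example and only borrows the column names var2, var6 and Result from the question. The last step, stratifying on quantile bins of the target with pd.qcut, is not part of the original answer. It is one possible option if a roughly balanced target distribution across the two sets is wanted, and the choice of 5 bins is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption): a continuous
# target with noise, so every value of y is effectively unique.
rng = np.random.default_rng(123)
data = pd.DataFrame({
    "var2": rng.uniform(1000.0, 5000.0, size=100),
    "var6": rng.choice([25, 50, 75], size=100),
})
data["Result"] = (
    3.5 * data["var2"] + 10.0 * data["var6"] + rng.normal(0.0, 50.0, size=100)
)

y = data["Result"]
X = data.drop("Result", axis=1)

# Diagnosis: how many samples does the least populated "class" in y have?
min_class_size = y.value_counts().min()

# With stratify=y the split fails, because that count is below 2.
try:
    train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)
    stratify_error = None
except ValueError as exc:
    stratify_error = str(exc)

# Fix from the answer: a plain random split without stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Optional alternative (not in the original answer): stratify on
# quantile bins of the target instead of on the raw values.
y_bins = pd.qcut(y, q=5, labels=False)  # 5 bins is an arbitrary choice
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y_bins
)
```

Running y.value_counts().min() on the real data before splitting is a quick way to confirm whether stratify=y can work at all. For a regression target it usually cannot.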

Regarding python - Scikit-learn: "The least populated class in y has only 1 member", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45242891/
