
python - Preventing RandomizedSearchCV from predicting a single class for every sample with a KNN classifier

Reposted. Author: 行者123. Updated: 2023-11-30 09:41:02

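I am using RandomizedSearchCV with KNeighborsClassifier to try to predict loan defaults.

RandomizedSearchCV seems great in theory, but when I test it, the best_estimator_ it finds is one that predicts the same label for every sample.

(The data is split 75% PAIDOFF and 25% COLLECTION, i.e. defaulted), so I get 75% accuracy, but the model is simply predicting PAIDOFF for everything.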

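import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# X_train and y_train are the training split (not shown in the question)
# Candidate neighbor counts: 5 values from 1 up to a third of the training set
n_neighbors = [int(x) for x in np.linspace(start=1, stop=len(X_train) / 3, num=5)]
weights = ['uniform', 'distance']
algorithm = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_size = [int(x) for x in np.linspace(10, 100, num=5)]
p = [1, 2]  # 1 = Manhattan distance, 2 = Euclidean distance

random_grid = {'n_neighbors': n_neighbors,
               'weights': weights,
               'algorithm': algorithm,
               'leaf_size': leaf_size,
               'p': p}

knn_clf = KNeighborsClassifier()
# No scoring argument is given, so the search ranks candidates by plain accuracy
knn_random = RandomizedSearchCV(estimator=knn_clf, param_distributions=random_grid, n_iter=25, cv=3, verbose=1)
knn_random.fit(X_train, y_train)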

Is there anything I can do to fix this? Is there a parameter I can pass to stop this from happening, or something I can do to my data?

y_test:

38 PAIDOFF
189 PAIDOFF
140 PAIDOFF
286 COLLECTION
142 PAIDOFF
101 PAIDOFF
187 PAIDOFF
139 PAIDOFF
149 PAIDOFF
11 PAIDOFF
269 COLLECTION
231 PAIDOFF
258 PAIDOFF
84 PAIDOFF
242 PAIDOFF
344 COLLECTION
104 PAIDOFF
214 PAIDOFF
109 PAIDOFF
76 PAIDOFF
41 PAIDOFF
262 COLLECTION
125 PAIDOFF
107 PAIDOFF
27 PAIDOFF
14 PAIDOFF
92 PAIDOFF
194 PAIDOFF
113 PAIDOFF
333 COLLECTION
...
320 COLLECTION
15 PAIDOFF
72 PAIDOFF
122 PAIDOFF
243 PAIDOFF
184 PAIDOFF
294 COLLECTION
280 COLLECTION
218 PAIDOFF
197 PAIDOFF
133 PAIDOFF
143 PAIDOFF
179 PAIDOFF
249 PAIDOFF
80 PAIDOFF
331 COLLECTION
137 PAIDOFF
103 PAIDOFF
120 PAIDOFF
248 PAIDOFF
5 PAIDOFF
236 PAIDOFF
219 PAIDOFF
322 COLLECTION
283 COLLECTION
135 PAIDOFF
124 PAIDOFF
293 COLLECTION
166 PAIDOFF
85 PAIDOFF

Predictions:

array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF'], dtype=object)

Best answer

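This is a typical imbalanced-data problem. Some simple things you can try are upsampling the minority class or downsampling the majority class, then retrying. A better approach is to change the algorithm and use SVC or a neural network, which can weight the loss on minority cases heavily.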
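The upsampling suggestion above can be sketched with sklearn.utils.resample. This is a minimal illustration, not the asker's pipeline: the feature matrix is synthetic and the 75/25 label split is only an assumption that mirrors the proportions described in the question.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic stand-in for the loan data (assumption): 75 PAIDOFF, 25 COLLECTION.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = np.array(["PAIDOFF"] * 75 + ["COLLECTION"] * 25)

# Split the training rows by class.
minority = y_train == "COLLECTION"
X_min, y_min = X_train[minority], y_train[minority]
X_maj, y_maj = X_train[~minority], y_train[~minority]

# Upsample: draw minority rows with replacement until both classes match in size.
X_min_up, y_min_up = resample(
    X_min, y_min, replace=True, n_samples=len(y_maj), random_state=0
)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.concatenate([y_maj, y_min_up])

labels, counts = np.unique(y_balanced, return_counts=True)
class_counts = dict(zip(labels.tolist(), counts.tolist()))
print(class_counts)
```

Resample only the training split and leave the test split untouched. Note also that duplicating rows before cross-validation can place copies of the same sample in both the training and validation folds, which tends to inflate the validation scores.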

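For example, the class_weight = 'balanced' parameter of sklearn.svm.SVC will help with this. It weights the cost of each class inversely to its share of the input data, so errors on the minority class cost more.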
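As a rough sketch of how that parameter could be dropped into the same kind of search, the example below swaps KNeighborsClassifier for SVC(class_weight="balanced"). The two-blob data set and the small parameter grid are assumptions made for illustration. The scoring="balanced_accuracy" argument is an addition that the answer does not mention: it makes the search rank candidates by the mean of per-class recall instead of plain accuracy, so a model that predicts only PAIDOFF no longer scores 75%.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic imbalanced data (assumption): 150 PAIDOFF rows, 50 COLLECTION rows.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, size=(150, 2)),
    rng.normal(loc=4.0, size=(50, 2)),
])
y_train = np.array(["PAIDOFF"] * 150 + ["COLLECTION"] * 50)

# Small illustrative grid; these values are not tuned for any real data set.
param_distributions = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

svc_random = RandomizedSearchCV(
    estimator=SVC(class_weight="balanced"),  # reweight classes by inverse frequency
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    scoring="balanced_accuracy",  # rank candidates by mean per-class recall
    random_state=0,
)
svc_random.fit(X_train, y_train)

predicted_labels = set(svc_random.predict(X_train).tolist())
print(svc_random.best_params_, predicted_labels)
```

KNeighborsClassifier has no class_weight parameter, which is why the answer suggests changing the estimator.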

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
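To make the quoted rule concrete, the scikit-learn documentation gives the balanced weight of a class as n_samples / (n_classes * count of that class). The sketch below applies it to a hypothetical 75/25 label vector matching the split described in the question and compares the hand-computed values with sklearn.utils.class_weight.compute_class_weight.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels with the 75% / 25% split described in the question.
y = np.array(["PAIDOFF"] * 75 + ["COLLECTION"] * 25)

classes, counts = np.unique(y, return_counts=True)  # classes come back sorted
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same rule by hand: n_samples / (n_classes * per-class count).
manual = len(y) / (len(classes) * counts)

balanced_weights = dict(zip(classes.tolist(), weights.tolist()))
print(balanced_weights)
```

With this split, a COLLECTION sample is weighted 2.0 and a PAIDOFF sample about 0.667, so a misclassified default costs three times as much as a misclassified paid-off loan.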

Regarding "python - Preventing RandomizedSearchCV from predicting a single class for every sample with a KNN classifier", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58682095/
