python - I am trying to balance my data: my target variable is multi-class and I want to oversample it

Reposted. Author: 行者123. Updated: 2023-11-30 09:03:32

x contains the feature variables: print(x)

       Restaurant  Cuisines  Average_Cost    Rating     Votes   Reviews      Area
0        3.526361  0.693147      5.303305  1.504077  2.564949  1.609438  7.214504
1        1.386294  4.127134      4.615121  1.504077  2.484907  1.609438  5.905362
2        2.772589  1.386294      5.017280  1.526056  4.605170  3.433987  6.131226
3        3.912023  2.833213      5.525453  1.547563  5.176150  4.564348  7.643483
4        3.526361  2.708050      5.303305  1.435085  5.948035  5.046646  6.126869
...           ...       ...           ...       ...       ...       ...       ...
11089    3.912023  0.693147      5.525453  1.648659  5.789960  5.046646  3.135494
11090    1.386294  6.028279      4.615121  1.526056  3.610918  2.833213  7.643483
11091    1.386294  2.397895      4.615121  1.504077  3.828641  2.944439  5.814131
11092    1.386294  6.028279      4.615121  1.410987  3.218876  2.302585  5.905362
11093    1.386294  6.028279      4.615121  1.029619  0.000000  0.000000  5.564520
11094 rows × 7 columns

Let y be the multi-class target variable: print(y.value_counts())

30 minutes     7406
45 minutes     2665
65 minutes      923
120 minutes      62
20 minutes       20
80 minutes       14
10 minutes        4
Name: Delivery_Time, dtype: int64

Looking at y, we can see that the 30 minutes class has a far higher count than any other class.

To balance the classes, I tried to oversample the data with SMOTETomek, but I got an error:

from imblearn.combine import SMOTETomek
smk = SMOTETomek(ratio = 1)
x_res, y_res = smk.fit_sample(x,y)

Error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-426e8b86623d> in <module>()
1 from imblearn.combine import SMOTETomek
2 smk = SMOTETomek(ratio = 1)
----> 3 x_res, y_res = smk.fit_sample(x,y)

2 frames
/usr/local/lib/python3.6/dist-packages/imblearn/utils/_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
311 if type_y != 'binary':
312 raise ValueError(
--> 313 '"sampling_strategy" can be a float only when the type '
314 'of target is binary. For multi-class, use a dict.')
315 target_stats = _count_class_sample(y)

ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.

Best answer

You can see the actual implementation in imbalanced-learn here: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/utils/_validation.py#L355

You only need to pass a dict, as the error message says. However, the SMOTE algorithm handles the multi-class setting internally.
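As a minimal sketch of the dict form the error message asks for, the target counts can be derived from the class frequencies so that every class is raised to the size of the largest one. The helper name `build_sampling_strategy` is hypothetical, and the class counts are copied from the `y.value_counts()` output in the question.

```python
from collections import Counter


def build_sampling_strategy(labels):
    """Map every class to the size of the largest class (hypothetical helper)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target for cls in counts}


# Class counts copied from the question's y.value_counts() output.
question_counts = {
    "30 minutes": 7406,
    "45 minutes": 2665,
    "65 minutes": 923,
    "120 minutes": 62,
    "20 minutes": 20,
    "80 minutes": 14,
    "10 minutes": 4,
}

# Rebuild a label list with the same class frequencies as y.
labels = [cls for cls, n in question_counts.items() for _ in range(n)]

strategy = build_sampling_strategy(labels)
print(strategy)
```

Assuming an imbalanced-learn version that accepts `sampling_strategy` (as the traceback suggests), this dict could be passed as `SMOTE(sampling_strategy=strategy, k_neighbors=3)`. Lowering `k_neighbors` from its default of 5 is an assumption made here because the smallest class has only 4 samples.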

Do:

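from imblearn.over_sampling import SMOTE
# The module is `over_sampling`; `fit_resample` replaces the deprecated `fit_sample`.
smote = SMOTE(sampling_strategy="minority")
X, Y = smote.fit_resample(x_train, y_train)

From the imbalanced-learn documentation for sampling_strategy:

When dict, the keys correspond to the targeted classes. The
values correspond to the desired number of samples for each targeted
class.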
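To make the effect of the string options concrete, the sketch below mimics (it does not call) how imbalanced-learn's documented string strategies choose classes for over-sampling: "minority" targets only the smallest class, while "not majority" targets every class except the largest. The function `targeted_classes` is a hypothetical illustration, not library code, and the counts are those from the question.

```python
def targeted_classes(counts, strategy):
    """Return the classes an over-sampler would resample (illustration only)."""
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    if strategy == "minority":
        return [minority]
    if strategy == "not majority":
        return [cls for cls in counts if cls != majority]
    raise ValueError(f"strategy not covered by this sketch: {strategy!r}")


# Class counts copied from the question's y.value_counts() output.
question_counts = {
    "30 minutes": 7406,
    "45 minutes": 2665,
    "65 minutes": 923,
    "120 minutes": 62,
    "20 minutes": 20,
    "80 minutes": 14,
    "10 minutes": 4,
}

print(targeted_classes(question_counts, "minority"))
print(targeted_classes(question_counts, "not majority"))
```

With the question's data, "minority" would therefore raise only the 10 minutes class and leave the other five small classes unchanged, so "not majority" or an explicit dict is probably closer to what the asker wants.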

Regarding "python - I am trying to balance my data: my target variable is multi-class and I want to oversample it", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58872043/
