
Python - Pandas: resampling a dataset to have balanced classes

Reposted. Author: 太空狗. Last updated: 2023-10-29 21:54:03

Given the following data frame, which has only 2 possible labels:

   name  f1  f2  label
0     A   8   9      1
1     A   5   3      1
2     B   8   9      0
3     C   9   2      0
4     C   8   1      0
5     C   9   1      0
6     D   2   1      0
7     D   9   7      0
8     D   3   1      0
9     E   5   1      1
10    E   3   6      1
11    E   7   1      1

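I wrote some code that groups the data by the "name" column and converts the result into a numpy array, so that each row holds all the samples of one group (zero-padded to the size of the largest group), with the labels in a separate numpy array: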

Data:

[[8 9] [5 3] [0 0]] # A label = 1
[[8 9] [0 0] [0 0]] # B label = 0
[[9 2] [8 1] [9 1]] # C label = 0
[[2 1] [9 7] [3 1]] # D label = 0
[[5 1] [3 6] [7 1]] # E label = 1

Labels:

[[1]
[0]
[0]
[0]
[1]]

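Code:

import pandas as pd
import numpy as np


def prepare_data(group_name):
    df = pd.read_csv("../data/tmp.csv")

    # Number the rows within each group (0, 1, 2, ...) so they can be pivoted.
    group_index = df.groupby(group_name).cumcount()
    # Pad every group with zero rows up to the size of the largest group.
    data = (df.set_index([group_name, group_index])
              .unstack(fill_value=0).stack())

    # One label per group, taken from the group's first row.
    target = np.array(data['label'].groupby(level=0).apply(lambda x: [x.values[0]]).tolist())
    # Drop the label column and collect each group's feature rows.
    data = data.loc[:, data.columns != 'label']
    data = np.array(data.groupby(level=0).apply(lambda x: x.values.tolist()).tolist())
    print(data)
    print(target)


prepare_data('name')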

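I want to resample by removing instances from the over-represented class.

[[8 9] [5 3] [0 0]] # A label = 1
[[8 9] [0 0] [0 0]] # B label = 0
[[9 2] [8 1] [9 1]] # C label = 0
# group D was deleted randomly from the '0' labels
[[5 1] [3 6] [7 1]] # E label = 1

would be an acceptable solution, because deleting D (labeled "0") leaves a balanced dataset with 2 groups labeled "1" and 2 groups labeled "0".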
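
To make the imbalance concrete, the number of groups per label can be counted directly. This is a minimal sketch, assuming the example frame is rebuilt inline rather than read from ../data/tmp.csv and that every row of a group carries the same label (as it does in the data shown above):

```python
import pandas as pd

# Rebuild the example frame inline (assumption: same contents as ../data/tmp.csv).
df = pd.DataFrame({
    "name":  list("AABCCCDDDEEE"),
    "f1":    [8, 5, 8, 9, 8, 9, 2, 9, 3, 5, 3, 7],
    "f2":    [9, 3, 9, 2, 1, 1, 1, 7, 1, 1, 6, 1],
    "label": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
})

# One label per group, taken from the group's first row.
group_labels = df.groupby("name")["label"].first()

# Number of groups carrying each label: 3 groups for label 0, 2 for label 1.
group_counts = group_labels.value_counts()
print(group_counts)
```

With 3 groups labeled "0" and 2 labeled "1", dropping any one of B, C or D would balance the classes at the group level.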

Best answer

A very simple approach, taken from the sklearn documentation and Kaggle.

import pandas as pd
from sklearn.utils import resample

# df is the data frame from the question
df_majority = df[df.label == 0]
df_minority = df[df.label == 1]

# Upsample minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,                 # sample with replacement
                                 n_samples=len(df_majority),   # to match majority class
                                 random_state=42)              # reproducible results

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
print(df_upsampled.label.value_counts())
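
The snippet above upsamples individual rows of the minority class, whereas the question asked for whole groups to be removed from the over-represented class. A minimal sketch of that alternative follows; it is not part of the original answer. The helper name downsample_groups and its parameters are hypothetical, the frame is rebuilt inline instead of being read from ../data/tmp.csv, and every row of a group is assumed to share one label.

```python
import pandas as pd


def downsample_groups(df, group_col="name", label_col="label", random_state=42):
    """Drop randomly chosen groups so each label keeps the same number of groups.

    Hypothetical helper for illustration; assumes one label per group.
    """
    # One label per group, taken from the group's first row.
    group_labels = df.groupby(group_col)[label_col].first()
    # Size of the rarest class, counted in groups rather than rows.
    n_keep = group_labels.value_counts().min()

    kept = []
    for _, names in group_labels.groupby(group_labels):
        # Randomly keep n_keep group names for this label, without replacement.
        kept.extend(names.sample(n=n_keep, random_state=random_state).index)

    # Keep only the rows that belong to the surviving groups.
    return df[df[group_col].isin(kept)]


# Example frame from the question (assumption: same contents as ../data/tmp.csv).
df = pd.DataFrame({
    "name":  list("AABCCCDDDEEE"),
    "f1":    [8, 5, 8, 9, 8, 9, 2, 9, 3, 5, 3, 7],
    "f2":    [9, 3, 9, 2, 1, 1, 1, 7, 1, 1, 6, 1],
    "label": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
})

balanced = downsample_groups(df)
# Groups per label after downsampling.
balanced_counts = balanced.groupby("name")["label"].first().value_counts()
print(balanced_counts)
```

Both groups labeled "1" (A and E) are kept, and one of B, C or D is dropped at random, so each label ends up with 2 groups. Which group is dropped depends on random_state.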

Regarding "Python - Pandas: resampling a dataset to have balanced classes", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52735334/
