
python - Usage of get_dummies in pandas

Reposted. Author: 行者123. Updated: 2023-11-30 22:29:18

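I am reading a book on an introduction to machine learning with Python. The author describes the following: suppose that for the workclass feature we have the possible values "Government Employee", "Private Employee", "Self Employed", and "Self Employed Incorporated".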

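import pandas as pd

# data: the book's DataFrame with an 'age' column and a categorical 'workclass' column
print("Original features:\n", list(data.columns), "\n")
data_dummies = pd.get_dummies(data)
print("Features after get_dummies:\n", list(data_dummies.columns))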

Original features:
['age', 'workclass']

Features after get_dummies:
['age', 'workclass_ ?', 'workclass_ Government Employee', 'workclass_Private Employee', 'workclass_Self Employed', 'workclass_Self Employed Incorporated']
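In the output above, age is carried over unchanged while workclass is replaced by one indicator column per distinct value. By default pd.get_dummies only encodes columns with object, string or category dtype and passes numeric columns through. The sketch below uses a small hypothetical frame (not the book's dataset) to show this. It also shows one way to handle the ? placeholder, assuming that ? stands for an unknown value in your data: once it is mapped to NaN, get_dummies creates no column for it unless dummy_na=True is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; '?' is assumed to mark an unknown workclass
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["Government Employee", "Private Employee", "?"],
})

# Only the string column is expanded; the numeric 'age' column passes through
dummies = pd.get_dummies(df, dtype=int)
print(list(dummies.columns))
# ['age', 'workclass_?', 'workclass_Government Employee', 'workclass_Private Employee']

# Treat '?' as missing: NaN gets no indicator column by default ...
df_nan = df.replace({"workclass": {"?": np.nan}})
print(list(pd.get_dummies(df_nan, dtype=int).columns))
# ['age', 'workclass_Government Employee', 'workclass_Private Employee']

# ... unless dummy_na=True adds an explicit 'workclass_nan' column
print(list(pd.get_dummies(df_nan, dummy_na=True, dtype=int).columns))
```

Whether ? should become its own column or be treated as missing depends on how the dataset uses it; both are valid encodings.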

My question is: what is the new column workclass_ ?

Best answer

Each new column is created from one of the string values of the workclass column, prefixed with the column name:

import pandas as pd

data = pd.DataFrame({'age': [1, 1, 1, 2, 1, 1],
                     'workclass': ['Government Employee', 'Private Employee', 'Self Employed',
                                   'Self Employed Incorporated', 'Self Employed Incorporated', '?']})

print(data)
   age                   workclass
0    1         Government Employee
1    1            Private Employee
2    1               Self Employed
3    2  Self Employed Incorporated
4    1  Self Employed Incorporated
5    1                           ?

# dtype=int keeps the 0/1 output shown here; since pandas 2.0 the default dtype is bool (True/False)
data_dummies = pd.get_dummies(data, dtype=int)
print(data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    1            0                              0
3    2            0                              0
4    1            0                              0
5    1            1                              0

   workclass_Private Employee  workclass_Self Employed  \
0                           0                        0
1                           1                        0
2                           0                        1
3                           0                        0
4                           0                        0
5                           0                        0

   workclass_Self Employed Incorporated
0                                     0
1                                     0
2                                     0
3                                     1
4                                     1
5                                     0

This prefix is very useful when several columns contain the same values:

data = pd.DataFrame({'age': [1, 1, 3],
                     'workclass': ['Government Employee', 'Private Employee', '?'],
                     'workclass1': ['Government Employee', 'Private Employee', 'Self Employed']})

print(data)
   age            workclass           workclass1
0    1  Government Employee  Government Employee
1    1     Private Employee     Private Employee
2    3                    ?        Self Employed

data_dummies = pd.get_dummies(data, dtype=int)
print(data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    3            1                              0

   workclass_Private Employee  workclass1_Government Employee  \
0                           0                               1
1                           1                               0
2                           0                               0

   workclass1_Private Employee  workclass1_Self Employed
0                            0                         0
1                            1                         0
2                            0                         1
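If the default prefixes are too long, prefix also accepts a list or a dict keyed by column name, and columns= restricts which columns are encoded at all. A sketch on the same two-column frame as above; the short prefixes wc and wc1 are arbitrary choices for illustration:

```python
import pandas as pd

data = pd.DataFrame({
    "age": [1, 1, 3],
    "workclass": ["Government Employee", "Private Employee", "?"],
    "workclass1": ["Government Employee", "Private Employee", "Self Employed"],
})

# One prefix per encoded column; the dict keys must be the encoded column names
short = pd.get_dummies(data, prefix={"workclass": "wc", "workclass1": "wc1"}, dtype=int)
print(list(short.columns))
# ['age', 'wc_?', 'wc_Government Employee', 'wc_Private Employee',
#  'wc1_Government Employee', 'wc1_Private Employee', 'wc1_Self Employed']

# Encode only 'workclass'; 'workclass1' is kept as an ordinary column
partial = pd.get_dummies(data, columns=["workclass"], dtype=int)
print(list(partial.columns))
```

Distinct prefixes keep the two sets of indicators apart, which is what you want when the same value means different things in different columns.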

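If you don't need the prefix, you can pass parameters that replace both the prefix and the separator with empty strings: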

data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)
print(data_dummies)
   age  ?  Government Employee  Private Employee  Government Employee  \
0    1  0                    1                 0                    1
1    1  0                    0                 1                    0
2    3  1                    0                 0                    0

   Private Employee  Self Employed
0                 0              0
1                 1              0
2                 0              1

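Then you can group by the column names and aggregate with max, which merges the duplicate dummy columns into one column per unique value: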

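# groupby(..., axis=1) is deprecated since pandas 2.1: transpose, group the rows by label, transpose back
print(data_dummies.T.groupby(level=0).max().T)
   ?  Government Employee  Private Employee  Self Employed  age
0  0                    1                 0              0    1
1  0                    0                 1              0    1
2  1                    0                 0              1    3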
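Another way to get one column per unique value across several columns, sketched under the assumption that those columns share one vocabulary: stack them into a single Series, encode that Series once, then collapse back to one row per original index with max. This swaps in a different technique from the answer's prefix-free groupby, but it should produce the same indicator values. The names cat_cols, stacked, shared and result are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "age": [1, 1, 3],
    "workclass": ["Government Employee", "Private Employee", "?"],
    "workclass1": ["Government Employee", "Private Employee", "Self Employed"],
})

cat_cols = ["workclass", "workclass1"]

# Stack the categorical columns into one Series indexed by (row, column)
stacked = data[cat_cols].stack()

# Encode once, then take max per original row: 1 if the value occurs in any column
shared = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()

# Put the untouched numeric column back
result = pd.concat([data[["age"]], shared], axis=1)
print(result)
```

Unlike the transpose-based version, this keeps age in its original first position and never creates duplicate column names along the way.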

A similar question about python - Usage of get_dummies in pandas can be found on Stack Overflow: https://stackoverflow.com/questions/46418219/
