
machine-learning - How to define OneHotEncoder for the Titanic dataset

Reposted. Author: 行者123. Updated: 2023-11-30 08:57:00

I'm trying to work with the Titanic dataset. The data has categorical values, so I used LabelEncoder to convert the text values to numbers. Before:

   PassengerId  Survived  Pclass     Sex    Age  SibSp  Parch     Fare  Embarked
0            1         0       3    male  22.00      1      0   7.2500         S
1            2         1       1  female  38.00      1      0  71.2833         C
2            3         1       3  female  26.00      0      0   7.9250         S

After:

   PassengerId  Survived  Pclass  Sex    Age  SibSp  Parch     Fare  Embarked
0            1         0       3    1  22.00      1      0   7.2500         2
1            2         1       1    0  38.00      1      0  71.2833         0
2            3         1       3    0  26.00      0      0   7.9250         2

Here is the code:

from sklearn.preprocessing import LabelEncoder

labelencoder_X = LabelEncoder()
data['Embarked'] = labelencoder_X.fit_transform(data['Embarked'])
data['Sex'] = labelencoder_X.fit_transform(data['Sex'])
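The encoded numbers come from LabelEncoder sorting each column's distinct values alphabetically. The sketch below reproduces that on a small toy DataFrame (the rows, including a hypothetical extra `Q` passenger so all three ports appear, are assumptions, not the real dataset) and uses one encoder per column so each column's mapping stays available in `classes_`. Reusing a single encoder, as in the code above, also works because `fit_transform` refits it on every call, but only the last column's mapping is kept.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed toy rows: the three sample passengers plus a hypothetical Q passenger
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})

# One encoder per column, so each column's mapping is kept afterwards
encoders = {}
for col in ["Sex", "Embarked"]:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])

print(data)
print(encoders["Sex"].classes_.tolist())       # ['female', 'male'] -> 0, 1
print(encoders["Embarked"].classes_.tolist())  # ['C', 'Q', 'S'] -> 0, 1, 2
```

This is why `male` becomes 1 and `S` becomes 2 in the "After" table: the integers are just sorted positions, which is also why they imply an ordering that the categories do not have.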

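Now, since the two values of a passenger's sex should carry equal weight (neither should rank above the other, as the integers 0 and 1 imply), I want to use OneHotEncoder. As I understand it, the data should look like this: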

   PassengerId  Survived  Pclass  Male  Female    Age  SibSp  Parch     Fare  Embarked
0            1         0       3     1       0  22.00      1      0   7.2500         2
1            2         1       1     0       1  38.00      1      0  71.2833         0
2            3         1       3     0       1  26.00      0      0   7.9250         2
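If the goal is just this layout, pandas can build the indicator columns directly from the original text column with `pd.get_dummies`, with no LabelEncoder step. This is a sketch on assumed toy rows mirroring the "Before" sample, and it swaps in pandas instead of scikit-learn's OneHotEncoder; `dtype=int` is passed because pandas 2.x returns boolean dummy columns by default.

```python
import pandas as pd

# Assumed toy rows mirroring the "Before" sample (not the full dataset)
data = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

# One 0/1 column per category; categories come out sorted (female, male)
dummies = pd.get_dummies(data["Sex"], dtype=int)
dummies = dummies.rename(columns={"female": "Female", "male": "Male"})

# Replace 'Sex' with Male and Female columns at the same position
pos = data.columns.get_loc("Sex")
data = data.drop(columns="Sex")
data.insert(pos, "Male", dummies["Male"])
data.insert(pos + 1, "Female", dummies["Female"])
print(data)
```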

How do I write the code for this? I tried a similar approach with OneHotEncoder:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X = LabelEncoder()
data['Embarked'] = labelencoder_X.fit_transform(data['Embarked'])
data['Sex'] = labelencoder_X.fit_transform(data['Sex'])

onehotencoder = OneHotEncoder()
data['Embarked'] = onehotencoder.fit_transform(data['Embarked'].values.reshape(-1,1))

But it just returns the same result. How do I fix it? I'm new to scikit-learn and machine learning, so I hope I'm going about this the right way.
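Part of the problem is the shape of what OneHotEncoder returns: `fit_transform` produces a 2-D SciPy sparse matrix with one column per category, so it has to be spread over several new DataFrame columns rather than assigned back to the single `data['Embarked']` column. A minimal check, assuming the label-encoded values 0 = C, 1 = Q, 2 = S and a toy array with one hypothetical Q passenger:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed label-encoded 'Embarked' values: S, C, S and a hypothetical Q
embarked = np.array([2, 0, 2, 1]).reshape(-1, 1)

encoded = OneHotEncoder().fit_transform(embarked)

# One row per passenger, one column per category (0, 1, 2)
print(encoded.shape)      # (4, 3)
print(encoded.toarray())  # dense view of the sparse result
```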

Best answer

Here's how you can do it:

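import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data: 'Sex' already label-encoded (0 = female, 1 = male)
df = pd.DataFrame({'Sex': [1, 0, 0, 1]})

# OneHotEncoder expects 2-D input, so pass a one-column DataFrame
# (Series.reshape no longer exists in current pandas)
result = OneHotEncoder().fit_transform(df[['Sex']]).toarray()

# Appending columns: categories are sorted, so column 0 is Female and column 1 is Male
df = df.join(pd.DataFrame(result, columns=['Female', 'Male'], index=df.index))

# Resulting dataframe
print(df)
#    Sex  Female  Male
# 0    1     0.0   1.0
# 1    0     1.0   0.0
# 2    0     1.0   0.0
# 3    1     0.0   1.0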

For more on machine-learning - How to define OneHotEncoder for the Titanic dataset, see the similar question on Stack Overflow: https://stackoverflow.com/questions/56378173/
