python - How to use ColumnTransformer to handle categorical data?

Reposted. Author: 行者123. Updated: 2023-12-01 07:49:28

I am trying to preprocess some data.

import pandas as pd

data = {'Country':['Germany', 'Turkey', 'England', 'Turkey', 'Germany', 'Turkey'],
        'Age':['44', '32', '27', '29', '31', '25'],
        'Salary':['5400', '8500', '7200', '4800', '6200', '10850'],
        'Purchased':['yes', 'yes', 'no', 'yes', 'no', 'yes']}
df = pd.DataFrame(data)
X = df.iloc[:,0].values

The expected result looks like this:

|---|---|---|----|-------|---|
| 1 | 0 | 0 | 44 | 5400 | 1 |
| 0 | 1 | 0 | 32 | 8500 | 1 |
| 0 | 0 | 1 | 27 | 7200 | 0 |
| 0 | 1 | 0 | 29 | 4800 | 1 |
| 1 | 0 | 0 | 31 | 6200 | 0 |
| 0 | 1 | 0 | 25 | 10850 | 1 |

This is the code that fails.

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("city_category", OneHotEncoder(dtype='int'), [0])], remainder="passthrough")
X = ct.fit_transform(X)

Output:

IndexError: tuple index out of range

How should ColumnTransformer be used in this case?

Best answer

You don't need sklearn for this; pandas alone can do it:

import pandas as pd

data = {
    "Country": ["Germany", "Turkey", "England", "Turkey", "Germany", "Turkey"],
    "Age": ["44", "32", "27", "29", "31", "25"],
    "Salary": ["5400", "8500", "7200", "4800", "6200", "10850"],
    "Purchased": ["yes", "yes", "no", "yes", "no", "yes"],
}

df = pd.DataFrame(data)
# One-hot encode Country; dtype=int gives 0/1 rather than booleans on newer pandas
df = pd.concat([pd.get_dummies(df["Country"], dtype=int), df.drop("Country", axis=1)], axis=1)
# Age and Salary are stored as strings, so convert them to integers
df[["Age", "Salary"]] = df[["Age", "Salary"]].astype(int)
# Map "yes"/"no" to 1/0
df["Purchased"] = df["Purchased"].eq("yes").astype(int)

print(df.head())

The output is:

   England  Germany  Turkey  Age  Salary  Purchased
0        0        1       0   44    5400          1
1        0        0       1   32    8500          1
2        1        0       0   27    7200          0
3        0        0       1   29    4800          1
4        0        1       0   31    6200          0
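If you still want to use ColumnTransformer, as the question asks, the likely cause of the IndexError is that `df.iloc[:,0].values` is a one-dimensional array, while ColumnTransformer expects two-dimensional input with one column per feature. The following is a minimal sketch under that assumption, not a definitive fix: it passes the feature columns as a DataFrame and selects `Country` by name. The names `X_encoded` and `y`, and the choice of `sparse_threshold=0` to force a dense result, are illustrative and not part of the original post.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

data = {
    "Country": ["Germany", "Turkey", "England", "Turkey", "Germany", "Turkey"],
    "Age": ["44", "32", "27", "29", "31", "25"],
    "Salary": ["5400", "8500", "7200", "4800", "6200", "10850"],
    "Purchased": ["yes", "yes", "no", "yes", "no", "yes"],
}
df = pd.DataFrame(data)

# Keep the features as a 2-D DataFrame (not a 1-D array) and make the
# numeric columns real integers before passing them through.
X = df[["Country", "Age", "Salary"]].astype({"Age": int, "Salary": int})
# Encode the target separately: "yes" -> 1, "no" -> 0.
y = df["Purchased"].eq("yes").astype(int)

ct = ColumnTransformer(
    [("country", OneHotEncoder(dtype=int), ["Country"])],
    remainder="passthrough",  # Age and Salary are appended unchanged
    sparse_threshold=0,       # assumption: always return a dense array
)
X_encoded = ct.fit_transform(X)

print(X_encoded)
print(y.to_numpy())
```

The one-hot columns come out in sorted category order (England, Germany, Turkey), followed by Age and Salary, which matches the column order of the pandas answer rather than the order in the question's expected table.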

For "python - How to use ColumnTransformer to handle categorical data?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56299142/
