
python - How to preserve data types in a DataFrame after an sklearn transform (Imputer)

Reposted. Author: 行者123. Updated: 2023-12-01 08:12:51

I have the following data.

+----+-------------+----------+--------+------+-------+-------+---------+
| ID | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
+----+-------------+----------+--------+------+-------+-------+---------+
| 0 | 1 | 0 | 3 | 22.0 | 1 | 0 | 7.2500 |
| 1 | 2 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 |
| 2 | 3 | 1 | 3 | 26.0 | 0 | 0 | 7.9250 |
| 3 | 4 | 1 | 1 | 35.0 | 1 | 0 | 53.1000 |
| 4 | 5 | 0 | 3 | 35.0 | 0 | 0 | 8.0500 |
| 5 | 6 | 0 | 3 | NaN | 0 | 0 | 8.4583 |
+----+-------------+----------+--------+------+-------+-------+---------+

After the transformation (imputation), the data types of the int/bool columns have changed to float.

+----+-------------+----------+--------+-----------+-------+-------+---------+
| ID | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
+----+-------------+----------+--------+-----------+-------+-------+---------+
| 0 | 1.0 | 0.0 | 3.0 | 22.000000 | 1.0 | 0.0 | 7.2500 |
| 1 | 2.0 | 1.0 | 1.0 | 38.000000 | 1.0 | 0.0 | 71.2833 |
| 2 | 3.0 | 1.0 | 3.0 | 26.000000 | 0.0 | 0.0 | 7.9250 |
| 3 | 4.0 | 1.0 | 1.0 | 35.000000 | 1.0 | 0.0 | 53.1000 |
| 4 | 5.0 | 0.0 | 3.0 | 35.000000 | 0.0 | 0.0 | 8.0500 |
| 5 | 6.0 | 0.0 | 3.0 | 28.000000 | 0.0 | 0.0 | 8.4583 |
+----+-------------+----------+--------+-----------+-------+-------+---------+

My code is as follows:

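import numpy as np
import pandas as pd
# Note: Imputer was removed in scikit-learn 0.22; newer versions provide
# sklearn.impute.SimpleImputer (which has no axis argument) instead.
from sklearn.preprocessing import Imputer

# https://www.kaggle.com/shivamp629/traincsv/downloads/traincsv.zip/1
data = pd.read_csv("train.csv")

# Keep only the numeric columns of interest
data2 = data[['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].copy()

# Replace missing values with the per-column median
fill_NaN = Imputer(missing_values=np.nan, strategy='median', axis=0)
data2_im = pd.DataFrame(fill_NaN.fit_transform(data2), columns=data2.columns)

data2_im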

Is there a way to preserve the data types? Thanks for your help.

Best answer

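The data types cannot be preserved during the transform itself: for performance reasons, sklearn extracts the underlying data from data2 before transforming it, and that data is homogenized to a single float dtype.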
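The homogenization can be seen without sklearn at all. The following is a minimal sketch with a made-up two-column frame (the values are illustrative, and this is not sklearn's internal code): converting a DataFrame with mixed int and float columns to a NumPy array yields one float64 array, so the integer information is already gone before any imputation happens.

```python
import numpy as np
import pandas as pd

# Illustrative data: one integer column, one float column with a missing value
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Age": [22.0, np.nan, 26.0],
})

# A NumPy array holds a single dtype, so int and float columns are upcast together
arr = df.to_numpy()
print(arr.dtype)

# Rebuilding a DataFrame from the array therefore gives float columns everywhere
rebuilt = pd.DataFrame(arr, columns=df.columns)
print(rebuilt.dtypes)
```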

You can always restore the original data types afterwards with astype:

v = fill_NaN.fit_transform(data2)
df = pd.DataFrame(v, columns=data2.columns).astype(data2.dtypes.to_dict())
df

   PassengerId  Survived  Pclass   Age  SibSp  Parch     Fare
0            1         0       3  22.0      1      0   7.2500
1            2         1       1  38.0      1      0  71.2833
2            3         1       3  26.0      0      0   7.9250
3            4         1       1  35.0      1      0  53.1000
4            5         0       3  35.0      0      0   8.0500
5            6         0       3  35.0      0      0   8.4583
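On scikit-learn versions where Imputer is no longer available, the same idea can be sketched with SimpleImputer. The helper name impute_keep_dtypes and the inline six-row frame (copied from the sample rows in the question) are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_keep_dtypes(df, strategy="median"):
    """Impute missing values, then cast each column back to its original dtype.

    Hypothetical helper: assumes every column is numeric and no column is
    entirely NaN (SimpleImputer drops such columns by default).
    """
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    values = imputer.fit_transform(df)  # float data, original dtypes lost
    out = pd.DataFrame(values, columns=df.columns, index=df.index)
    return out.astype(df.dtypes.to_dict())


# The six sample rows shown in the question
data2 = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
})

result = impute_keep_dtypes(data2)
print(result.dtypes)
print(result)
```

Casting back is lossless here because a NumPy integer column cannot contain NaN in the first place, so the imputer leaves its values unchanged; only columns that were already float (such as Age) receive imputed values.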

For "python - How to preserve data types in a DataFrame after an sklearn transform (Imputer)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55131799/
