
python - Imputing missing values in the test set

Reposted. Author: 行者123. Updated: 2023-11-30 09:24:44

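I am using the adult data set and imputing the missing values in the training data. I want to apply the same fill values obtained from the training data to the test data. I must be missing something, because I cannot get it to work. My code is as follows: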

import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

features = ['age','workclass','fnlwgt','education','educationNum','maritalStatus','occupation','relationship','race','sex','capitalGain','capitalLoss','hoursPerWeek','nativeCountry']

x_train = train[list(features)]
y_train = train['class']
x_test = test[list(features)]
y_test = test['class']

class DataFrameImputer(TransformerMixin):
    def __init__(self):
        """Impute missing values.
        Columns of dtype object are imputed with the most frequent value in column.
        Columns of other types are imputed with mean of column."""

    def fit(self, X, y=None):
        self.fill = pd.Series([X[c].value_counts().index[0]
                               if X[c].dtype == np.dtype('O') else X[c].mean() for c in X],
                              index=X.columns)
        return self

    def transform(self, X, y=None):
        return X.fillna(self.fill)


# 2 step transformation, fit and transform
# -------Impute missing values-------------

x_train = pd.DataFrame(x_train) # x_train is class
x_test = pd.DataFrame(x_test)
x_train_new = DataFrameImputer().fit_transform(x_train)
x_train_new = pd.DataFrame(x_train_new)
# reuse the values fitted on the training data to fill the test data

for c in x_test:
    if x_test[c].dtype==np.dtype('O'):
        x_test.fillna(x_train[c].value_counts().index[0])
    else:
        x_test.fillna(x_train[c].mean(),inplace=True)

Best answer

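We want to take the values obtained from the training data and apply them to the test data. In the code above, the loop does not work. The first column is numeric, so on the first iteration x_test.fillna(..., inplace=True) is called on the whole frame with a single number, and every NaN in the test data, in every column, is filled with the mean of the first training column. Nothing is left for the later iterations to fill. If fillna is instead given a dictionary as its value, each column of the test data is filled with the value computed from the matching column of the training data.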
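A minimal sketch of that difference, using a made-up two-column frame (the toy_* names and the values are illustrative and are not taken from the adult data set). A scalar passed to fillna is written into every column, while a dict is applied column by column.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the training and test frames (hypothetical data).
toy_train = pd.DataFrame({
    "age": [20.0, 40.0],
    "workclass": pd.Series(["Private", "Private"], dtype=object),
})
toy_test = pd.DataFrame({
    "age": [np.nan, 30.0],
    "workclass": pd.Series([np.nan, "State-gov"], dtype=object),
})

# Scalar fill: the mean of 'age' is written into every column with a NaN,
# including the categorical 'workclass' column.
scalar_filled = toy_test.fillna(toy_train["age"].mean())

# Dict fill: each column receives its own value.
per_column = {"age": toy_train["age"].mean(), "workclass": "Private"}
dict_filled = toy_test.fillna(value=per_column)

print(scalar_filled)
print(dict_filled)
```

In the scalar case the missing workclass entry ends up holding a number, which is the symptom described above.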

values = {}  # maps each column name to its fill value, computed from the training data
for c in x_train:
    if x_train[c].dtype == np.dtype('O'):
        # categorical column: most frequent value
        values[c] = x_train[c].value_counts().index[0]
    else:
        # numeric column: mean
        values[c] = x_train[c].mean()

x_test_new = x_test.fillna(value=values)
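Another route that should give the same result is to fit the imputer once on the training data and then call transform on the test data, since fit already stores the per-column fill values. The sketch below assumes a class shaped like the question's DataFrameImputer. SimpleFrameImputer is a hypothetical stand-in without the sklearn mixin so that the example depends only on pandas and NumPy, and the toy frames are made up.

```python
import numpy as np
import pandas as pd


class SimpleFrameImputer:
    """Hypothetical stand-in for the question's DataFrameImputer (no sklearn mixin)."""

    def fit(self, X, y=None):
        # Most frequent value for object columns, mean for all other columns.
        self.fill = pd.Series(
            [X[c].value_counts().index[0] if X[c].dtype == np.dtype("O") else X[c].mean()
             for c in X],
            index=X.columns,
        )
        return self

    def transform(self, X, y=None):
        # A Series passed to fillna is matched to the frame's columns by label.
        return X.fillna(self.fill)


# Made-up frames standing in for x_train and x_test.
toy_train = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "workclass": pd.Series(["Private", "Private", np.nan], dtype=object),
})
toy_test = pd.DataFrame({
    "age": [np.nan, 50.0],
    "workclass": pd.Series([np.nan, "State-gov"], dtype=object),
})

imputer = SimpleFrameImputer().fit(toy_train)  # statistics come from the training data only
toy_train_new = imputer.transform(toy_train)
toy_test_new = imputer.transform(toy_test)     # same fill values reused on the test data
```

Keeping fit and transform as separate calls means the test data never contributes to the fill values.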

Regarding python - imputing missing values in the test set, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48495372/
