
python-3.x - scikit-learn ColumnTransformer error: Column ordering must be equal for fit and for transform when using the remainder keyword

Reposted. Author: 行者123. Updated: 2023-12-04 21:28:55

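I have a simple model built as a pipeline that uses a ColumnTransformer.

I am able to train the model and save it as a pickle.

When I load the pickle and predict on real-time data, I get the following error from ColumnTransformer:

Column ordering must be equal for fit and for transform when using the remainder keyword

The training data and the data used for prediction have exactly the same number of columns, e.g. 50. I am not sure how the "ordering" of the columns could have changed.

Why does column ordering matter for ColumnTransformer?
How can I fix this? Is there a way to ensure the "ordering" after running the ColumnTransformer?

Thanks.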

pipeline = Pipeline([
    ('RepalceInf', ReplaceInf()),
    ('impute_30_100', ColumnTransformer(
        [
            ('oneStdNorm', OneStdImputer(), self.cont_feature_strategy_dict['FEATS_30_100']),
        ],
        remainder='passthrough'
    )),
    ('regress_impute', IterativeImputer(random_state=0, estimator=self.cont_estimator)),
    ('replace_outlier', OutlierReplacer(quantile_range=(1, 99))),
    ('scaler', StandardScaler(with_mean=True))
])



class OneStdImputer(TransformerMixin, BaseEstimator):
    def __init__(self):
        """
        Impute the missing data with random value in the range of mean +/- one standard deviation
        This is a simplified implementation without sparse/dense fit and check.
        """
        self.mean = None
        self.std = None

    def fit(self, X, y=None):
        self.mean = X.mean()
        self.std = X.std()
        return self

    def transform(self, X):
        # X_imp = X.fillna(np.random.randint()*2*self.std+self.mean-self.std)
        for col in X:
            self._fill_randnorm(X[col], col)
        return X

    def _fill_randnorm(self, df, col):
        val = df.values
        mask = np.isnan(df)
        mu, sigma = self.mean[col], self.std[col]
        val[mask] = np.random.normal(mu, sigma, size=mask.sum())
        return df

Best answer

You can use df_new = pd.DataFrame(df_origin, columns=df_train.columns) to make sure the data you predict on has the same columns, in the same order, as the training data.
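As a rough sketch of that idea: the helper below reorders the prediction data to match the training columns and fails loudly if a column is missing. The name align_columns and the toy frames df_train and df_live are made up for illustration and are not part of the question's code. As far as I know, pd.DataFrame(df, columns=...) fills any missing column with NaN rather than raising, which is why the explicit check may be worth having.

```python
import pandas as pd


def align_columns(df_new, train_columns):
    """Return df_new restricted to the training columns, in training order.

    Hypothetical helper: raises if a training column is absent instead of
    silently creating a NaN column. Extra columns in df_new are dropped.
    """
    missing = [c for c in train_columns if c not in df_new.columns]
    if missing:
        raise ValueError(f"columns missing from prediction data: {missing}")
    return df_new.loc[:, list(train_columns)]


# Toy data: same columns as training, but arriving in a different order.
df_train = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
df_live = pd.DataFrame({"c": [7.0], "a": [8.0], "b": [9.0]})

df_aligned = align_columns(df_live, df_train.columns)
print(list(df_aligned.columns))
```

One option is to pickle list(df_train.columns) together with the pipeline, so the same ordering can be applied to incoming data right before calling predict.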

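Judging from the documentation example below, ColumnTransformer evidently identifies the selected columns by their positional index. (You can also select columns by their exact names, but I think the names get converted to positions as well.)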

>>> import numpy as np
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.preprocessing import Normalizer
>>> ct = ColumnTransformer(
...     [("norm1", Normalizer(norm='l1'), [0, 1]),
...      ("norm2", Normalizer(norm='l1'), slice(2, 4))])
>>> X = np.array([[0., 1., 2., 2.],
...               [1., 1., 0., 1.]])
>>> # Normalizer scales each row of X to unit norm. A separate scaling
>>> # is applied for the two first and two last elements of each
>>> # row independently.
>>> ct.fit_transform(X)
array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])
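
To make the point about positions concrete, here is a small sketch (my own toy data, not from the question) using a plain NumPy array and remainder='passthrough'. With no column names available, the transformer can only go by position, so feeding the same values with the columns rearranged silently normalizes the wrong pair. That is presumably the situation the error message is guarding against when a DataFrame is used.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

# Normalize positions 0 and 1; pass the remaining columns through unchanged.
ct = ColumnTransformer(
    [("norm", Normalizer(norm="l1"), [0, 1])],
    remainder="passthrough",
)

X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
ct.fit(X)
out = ct.transform(X)

# Same values, but the last two columns now come first.
X_swapped = X[:, [2, 3, 0, 1]]
out_swapped = ct.transform(X_swapped)

# The transformer still acts on positions 0 and 1, so the results differ.
print(np.allclose(out, out_swapped))
```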

Regarding "python-3.x - scikit-learn ColumnTransformer error: Column ordering must be equal for fit and for transform when using the remainder keyword", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58341289/
