
python - How do I get_feature_names using a column transformer

Reposted. Author: 行者123. Updated: 2023-12-04 11:26:55

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'brand': ['aaaa', 'asdfasdf', 'sadfds', 'NaN'],
                   'category': ['asdf', 'asfa', 'asdfas', 'as'],
                   'num1': [1, 1, 0, 0],
                   'target': [0.2, 0.11, 1.34, 1.123]})

train_continuous_cols = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
train_categorical_cols = df.select_dtypes(include=["object"]).columns.tolist()

preprocess = make_column_transformer(
    (StandardScaler(), train_continuous_cols),
    (OneHotEncoder(), train_categorical_cols)
)
df = preprocess.fit_transform(df)

I just want to get all the feature names:
preprocess.get_feature_names()

I get this error:
Transformer standardscaler (type StandardScaler) does not provide get_feature_names

How do I fix this? The examples I found online use pipelines, which I am trying to avoid.

Best answer

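The following reimplementation of ColumnTransformer returns a pandas DataFrame. Note that it should only be used when you feed a pandas DataFrame into your pipeline, because transform() reads X.index.
All credit goes to Johannes Haupt, who provided a get_feature_names() function that is resilient to transformers lacking this method (see the blog post "Extracting Column Names from the ColumnTransformer"). I commented out the warnings because I don't want them, as well as the prepending of the transformer step name to the column names; both can easily be uncommented if needed.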
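The difference between the original behaviour (raise as soon as one transformer has no get_feature_names) and the fallback credited above can be sketched in plain Python. This is a minimal sketch, assuming transformers are given as (name, transformer, columns) triples; NoNamesTransformer, NamedTransformer and resolve_feature_names are hypothetical stand-ins, not scikit-learn API.

```python
class NoNamesTransformer:
    """Hypothetical stand-in for a transformer without get_feature_names
    (like StandardScaler in scikit-learn releases before 1.0)."""


class NamedTransformer:
    """Hypothetical stand-in for a transformer that names its outputs."""

    def __init__(self, names):
        self._names = list(names)

    def get_feature_names(self):
        return list(self._names)


def resolve_feature_names(transformers, strict=False):
    """Collect output names from hypothetical (name, transformer, columns) triples.

    strict=True mimics the old behaviour that raised for a transformer
    without get_feature_names; strict=False falls back to its input columns.
    """
    names = []
    for name, trans, columns in transformers:
        # Dropped steps and steps with no selected columns produce nothing
        if (isinstance(trans, str) and trans == "drop") or not columns:
            continue
        if isinstance(trans, str) and trans == "passthrough":
            # Passthrough steps keep their input column names
            names.extend(columns)
        elif hasattr(trans, "get_feature_names"):
            names.extend(trans.get_feature_names())
        elif strict:
            raise AttributeError(
                f"Transformer {name} (type {type(trans).__name__}) "
                "does not provide get_feature_names"
            )
        else:
            # Fallback used in the answer: reuse the input column names
            names.extend(columns)
    return names


steps = [
    ("standardscaler", NoNamesTransformer(), ["num1", "target"]),
    ("onehotencoder", NamedTransformer(["x0_NaN", "x0_aaaa"]), ["brand"]),
]
print(resolve_feature_names(steps))  # ['num1', 'target', 'x0_NaN', 'x0_aaaa']
```

With strict=True the same call raises an AttributeError of the same shape as the one in the question.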

# import warnings
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


class ColumnTransformerWithNames(ColumnTransformer):

    # NOTE: relies on private ColumnTransformer/Pipeline internals
    # (_iter, _df_columns, _n_features) from older scikit-learn releases
    # that still had get_feature_names(); these may differ in other versions.

    def get_feature_names(column_transformer):
        """Get feature names from all transformers.

        Also called unbound on nested Pipelines, hence the parameter
        name instead of `self`.

        Returns
        -------
        feature_names : list of strings
            Names of the features produced by transform.
        """
        # Remove the internal helper function
        # check_is_fitted(column_transformer)

        # Turn lookup into a function for better handling with pipeline later
        def get_names(trans, column):
            # >> Original get_feature_names() method
            if trans == 'drop' or (
                    hasattr(column, '__len__') and not len(column)):
                return []
            if trans == 'passthrough':
                if hasattr(column_transformer, '_df_columns'):
                    if ((not isinstance(column, slice))
                            and all(isinstance(col, str) for col in column)):
                        return column
                    else:
                        return column_transformer._df_columns[column]
                else:
                    indices = np.arange(column_transformer._n_features)
                    return ['x%d' % i for i in indices[column]]
            if not hasattr(trans, 'get_feature_names'):
                # >>> Change: return input column names if no method is available
                # Turn error into a warning
                # warnings.warn("Transformer %s (type %s) does not "
                #               "provide get_feature_names. "
                #               "Will return input column names if available"
                #               % (str(name), type(trans).__name__))
                # For transformers without a get_feature_names method, use the
                # input names to the column transformer
                if column is None:
                    return []
                else:
                    return [  # name + "__" +
                        f for f in column]

            return [  # name + "__" +
                f for f in trans.get_feature_names()]

        ### Start of processing
        feature_names = []

        # Allow transformers to be pipelines. Pipeline steps are named
        # differently, so preprocessing is needed
        if isinstance(column_transformer, Pipeline):
            l_transformers = [(name, trans, None, None)
                              for step, name, trans in column_transformer._iter()]
        else:
            # For column transformers, follow the original method
            l_transformers = list(column_transformer._iter(fitted=True))

        for name, trans, column, _ in l_transformers:
            if isinstance(trans, Pipeline):
                # Recursive call on the pipeline (unbound, pipeline as argument)
                _names = ColumnTransformerWithNames.get_feature_names(trans)
                # if pipeline has no transformer that returns names
                if len(_names) == 0:
                    _names = [  # name + "__" +
                        f for f in column]
                feature_names.extend(_names)
            else:
                feature_names.extend(get_names(trans, column))

        return feature_names

    def transform(self, X):
        X_mat = super().transform(X)
        # ColumnTransformer returns a sparse matrix only when the output is
        # sparse enough (sparse_threshold); densify only in that case
        if sparse.issparse(X_mat):
            X_mat = X_mat.toarray()
        new_cols = self.get_feature_names()
        return pd.DataFrame(X_mat, index=X.index, columns=new_cols)

    def fit_transform(self, X, y=None):
        super().fit_transform(X, y)
        return self.transform(X)

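Then you can replace your call to ColumnTransformer with ColumnTransformerWithNames. The output is a DataFrame, and this step now has a working get_feature_names(). Note that make_column_transformer always builds a plain ColumnTransformer, so instantiate ColumnTransformerWithNames directly with (name, transformer, columns) tuples, e.g. ColumnTransformerWithNames([("num", StandardScaler(), train_continuous_cols), ("cat", OneHotEncoder(), train_categorical_cols)]).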

This question, "python - How do I get_feature_names using a column transformer", comes from Stack Overflow: https://stackoverflow.com/questions/61079602/
