
python - Using numpy.ndarray vs. Pandas DataFrame in sklearn's .fit() method

Reposted. Author: 行者123. Updated: 2023-12-01 01:16:35

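I am fitting a logistic regression model to my data. As I understand it (for example, from here: Pandas vs. Numpy Dataframes), it is better to use numpy.ndarray with sklearn than Pandas DataFrames. This can be done with the .values attribute of the DataFrame. I did that, but got ValueError: Specifying the columns using strings is only supported for pandas DataFrames. Clearly I am doing something wrong in my code. Any insight is much appreciated.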

Interestingly, when I do not use .values and simply pass X as a DataFrame and y as a Pandas Series, my code works.

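import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# `data` is assumed to be a pandas DataFrame that has already been loaded.

# We will train our classifier with the following features:
# Numeric features to be scaled: LIMIT_BAL, AGE, PAY_X, BILL_AMTX, and PAY_AMTX
# Categorical features: SEX, EDUCATION, MARRIAGE

# We create the preprocessing pipelines for both numeric and categorical data
numeric_features = ['LIMIT_BAL', 'AGE', 'PAY_1', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6',
                    'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6',
                    'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']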

data['PAY_1'] = data.PAY_1.astype('float64')
data['PAY_2'] = data.PAY_2.astype('float64')
data['PAY_3'] = data.PAY_3.astype('float64')
data['PAY_4'] = data.PAY_4.astype('float64')
data['PAY_5'] = data.PAY_5.astype('float64')
data['PAY_6'] = data.PAY_6.astype('float64')
data['AGE'] = data.AGE.astype('float64')


numeric_transformer = Pipeline(steps=[
    ('scaler', MinMaxScaler())
])

categorical_features = ['SEX', 'EDUCATION', 'MARRIAGE']
categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(categories='auto'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

y = data['default'].values
X = data.drop('default', axis=1).values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=10, stratify=y)

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
lr = Pipeline(steps=[('preprocessor', preprocessor),
                     ('classifier', LogisticRegression(solver='liblinear'))])

param_grid_lr = {
    'classifier__C': np.logspace(-5, 8, 15)
}

# Note: the `iid` parameter was removed in scikit-learn 0.24; drop it on newer versions.
lr_cv = GridSearchCV(lr, param_grid_lr, cv=10, iid=False)

lr_cv.fit(X_train, y_train)

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

Best answer

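You are using ColumnTransformer as if you had a DataFrame, but you do not: after calling .values, X is a plain NumPy array with no column names. From the ColumnTransformer documentation: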

column(s) : string or int, array-like of string or int, slice, boolean mask array or callable

Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above.
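To make the quoted rule concrete, here is a minimal sketch of the two column-selection modes. The toy DataFrame and its values are made up for illustration and stand in for the asker's dataset; `sparse_threshold=0` is set only to force a dense result that is easy to compare.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; not the asker's real dataset.
df = pd.DataFrame({
    "LIMIT_BAL": [20000.0, 120000.0, 90000.0, 50000.0],
    "AGE": [24.0, 26.0, 34.0, 37.0],
    "SEX": [2, 2, 1, 1],
})


def make_transformer(numeric_cols, categorical_cols):
    """Build a ColumnTransformer for the given column specifiers."""
    return ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), numeric_cols),
            ("cat", OneHotEncoder(categories="auto"), categorical_cols),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Strings reference DataFrame columns by name.
out_df = make_transformer(["LIMIT_BAL", "AGE"], ["SEX"]).fit_transform(df)

# Integers are positional, so they also work on a bare NumPy array.
X = df.values
out_np = make_transformer([0, 1], [2]).fit_transform(X)

# Strings on a NumPy array raise ValueError, as in the question.
raised_value_error = False
try:
    make_transformer(["LIMIT_BAL", "AGE"], ["SEX"]).fit_transform(X)
except ValueError:
    raised_value_error = True
```

Both successful calls produce the same transformed matrix; only the way columns are addressed differs.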

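If you specify the columns as strings, you need to pass a DataFrame. If you want to use a NumPy array, then first of all the conversion may not be necessary, and in that case you need to specify the columns as integer positions rather than strings.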
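If the NumPy route is still preferred, one possible approach is to translate the column names into integer positions before calling .values. The sketch below assumes a small made-up DataFrame with a `default` target column; `Index.get_indexer` is a pandas method, while the data and variable names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real dataset.
data = pd.DataFrame({
    "LIMIT_BAL": [20000.0, 120000.0, 90000.0, 50000.0,
                  50000.0, 500000.0, 100000.0, 140000.0],
    "AGE": [24.0, 26.0, 34.0, 37.0, 57.0, 29.0, 23.0, 28.0],
    "SEX": [2, 2, 1, 1, 1, 2, 2, 1],
    "default": [1, 1, 0, 0, 0, 0, 1, 0],
})

features = data.drop("default", axis=1)

# Map column names to positions in the feature matrix.
numeric_idx = features.columns.get_indexer(["LIMIT_BAL", "AGE"]).tolist()
categorical_idx = features.columns.get_indexer(["SEX"]).tolist()

preprocessor = ColumnTransformer(transformers=[
    ("num", MinMaxScaler(), numeric_idx),
    ("cat", OneHotEncoder(categories="auto"), categorical_idx),
])

model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(solver="liblinear")),
])

# Plain NumPy arrays now work because columns are addressed by position.
X = features.values
y = data["default"].values
model.fit(X, y)
predictions = model.predict(X)
```

The positions must be computed on the feature frame after dropping the target, so that they line up with the columns of X. Keeping the DataFrame and the string names, as the asker's working variant does, avoids this bookkeeping entirely.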

Original question on Stack Overflow (python - Using numpy.ndarray vs. Pandas DataFrame in sklearn's .fit() method): https://stackoverflow.com/questions/54293805/
