python - XGBoostError: Check failed: typestr.size() == 3 (2 vs. 3): `typestr' should be of format <endian><type><size of type in bytes>

Reposted. Author: 行者123. Updated: 2023-12-05 02:43:02

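I ran into a strange problem with a fresh install of xgboost. Under normal circumstances it works fine. However, when I use the model inside the following function, it raises the error in the title.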

The dataset I am using comes from Kaggle and can be found here: https://www.kaggle.com/kemical/kickstarter-projects

The function I use to fit the model is as follows:

import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split

def get_val_scores(model, X, y, return_test_score=False, return_importances=False, random_state=42, randomize=True, cv=5, test_size=0.2, val_size=0.2, use_kfold=False, return_folds=False, stratify=True):
    # First split: hold out a test set
    print("Splitting data into training and test sets")
    if randomize:
        if stratify:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, stratify=y, shuffle=True, random_state=random_state)
        else:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=True, random_state=random_state)
    else:
        if stratify:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, stratify=y, shuffle=False)
        else:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=False)
    print(f"Shape of training data, X: {X_train.shape}, y: {y_train.shape}. Test, X: {X_test.shape}, y: {y_test.shape}")
    if use_kfold:
        val_scores = cross_val_score(model, X=X_train, y=y_train, cv=cv)
    else:
        # Second split: carve a validation set out of the training data
        print("Further splitting training data into validation sets")
        if randomize:
            if stratify:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, stratify=y_train, shuffle=True)
            else:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, shuffle=True)
        else:
            if stratify:
                print("Warning! You opted to both stratify your training data and to not randomize it. These settings are incompatible with scikit-learn. Stratifying the data, but shuffle is being set to True")
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, stratify=y_train, shuffle=True)
            else:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, shuffle=False)
        print(f"Shape of training data, X: {X_train_.shape}, y: {y_train_.shape}. Val, X: {X_val.shape}, y: {y_val.shape}")
        print("Getting ready to fit model.")
        model.fit(X_train_, y_train_)
        val_score = model.score(X_val, y_val)

    if return_importances:
        if hasattr(model, 'steps'):
            # The model is a pipeline: read importances from one of its steps
            try:
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model[-2].feature_importances_
                }).sort_values(by='Importance', ascending=False)
            except:
                model.fit(X_train, y_train)
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model[-2].feature_importances_
                }).sort_values(by='Importance', ascending=False)
        else:
            try:
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model.feature_importances_
                }).sort_values(by='Importance', ascending=False)
            except:
                model.fit(X_train, y_train)
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model.feature_importances_
                }).sort_values(by='Importance', ascending=False)

    mod_scores = {}
    try:
        # val_scores exists only when use_kfold=True
        mod_scores['validation_score'] = val_scores.mean()
        if return_folds:
            mod_scores['fold_scores'] = val_scores
    except:
        mod_scores['validation_score'] = val_score

    if return_test_score:
        mod_scores['test_score'] = model.score(X_test, y_test)

    if return_importances:
        return mod_scores, feats
    else:
        return mod_scores

The strange part is that if I create a pipeline in sklearn, it works on the dataset outside the function, but not inside it. For example:

from sklearn.pipeline import make_pipeline
from category_encoders import OrdinalEncoder
from xgboost import XGBClassifier

pipe = make_pipeline(OrdinalEncoder(), XGBClassifier())

X = df.drop('state', axis=1)
y = df['state']

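In this case, pipe.fit(X, y) works fine. But get_val_scores(pipe, X, y) fails with the error message in the title. Stranger still, get_val_scores(pipe, X, y) seems to work on other datasets, such as Titanic. The error occurs when the model is fit on X_train and y_train.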

In this case, the loss function is binary:logistic, and the state column has the values successful and failed.
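
To make the error message easier to read, here is a minimal sketch of the condition it quotes: a typestr of the form <endian><type><size of type in bytes> that is exactly 3 characters long. The function name parse_typestr is hypothetical and this is not XGBoost's actual implementation. NumPy reports the 2-character string |O for object-dtype arrays, which is one way a "2 vs. 3" mismatch can arise. Whether that is the cause in this question is an assumption.

```python
# Sketch only: mirrors the check quoted in the error message.
# parse_typestr is a hypothetical helper, not part of xgboost or numpy.

def parse_typestr(typestr):
    """Split a 3-character typestr into (endian, type, size in bytes).

    Raises ValueError for any other length, as the error message implies.
    """
    if len(typestr) != 3:
        raise ValueError(
            f"typestr {typestr!r} has length {len(typestr)}, expected 3: "
            "<endian><type><size of type in bytes>"
        )
    endian, kind, size = typestr
    if endian not in "<>|":
        raise ValueError(f"unexpected byte-order character {endian!r}")
    if not size.isdigit():
        raise ValueError(f"unexpected size character {size!r}")
    return endian, kind, int(size)


# "<f8" and "<i4" are the typestrs NumPy uses for little-endian
# float64 and int32; "|O" is what it reports for object dtype.
for candidate in ("<f8", "<i4", "|O"):
    try:
        print(candidate, "->", parse_typestr(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

If NumPy is available, np.dtype(object).str and X_train.dtypes show which typestrs and column dtypes are actually being passed to the model.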

Best answer

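The xgboost library is currently being updated to fix this bug, so for now the workaround is to downgrade the library to an older version. In my case, downgrading to xgboost v0.90 resolved the problem.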

First check your xgboost version from the command line:

python 

import xgboost

print(xgboost.__version__)

exit()

If the version is not 0.90, uninstall the current version with:

pip uninstall xgboost

Then install xgboost version 0.90:

pip install xgboost==0.90

Run your code again!
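
The version check can also be done from a script rather than an interactive session. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version and the constant TARGET are hypothetical, and 0.90 is simply the version this answer reports as working.

```python
from importlib.metadata import PackageNotFoundError, version

# Version that the answer above reports as working.
TARGET = "0.90"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("xgboost")
if found is None:
    print("xgboost is not installed")
elif found != TARGET:
    print(f"xgboost {found} is installed; the answer used {TARGET}")
else:
    print(f"xgboost {TARGET} is already installed")
```

This only reports the installed version. Uninstalling and reinstalling still goes through the pip commands shown above.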

Regarding python - XGBoostError: Check failed: typestr.size() == 3 (2 vs. 3): `typestr' should be of format <endian><type><size of type in bytes>, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67095097/
