
google-cloud-platform - XGboost Google-AI-Model expecting float values instead of using categorical values and converting them

Reposted. Author: 行者123. Updated: 2023-12-05 07:15:02

I am trying to run a simple XGBoost prediction on Google Cloud using this simple example: https://cloud.google.com/ml-engine/docs/scikit/getting-predictions-xgboost#get_online_predictions

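The model builds fine, but when I try to run a prediction with the sample input JSON, it fails with the error "Could not initialize DMatrix from inputs: could not convert string to float:", as shown in the image below. I understand this happens because the test input contains strings; I expected the Google machine learning model to carry the information needed to convert categorical values to floats. I cannot expect my users to submit online prediction requests with float values.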

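According to the tutorial, it should work without converting the categorical values to floats. Please advise; I have attached a GIF with more details. Thanks.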

[Image: GIF of the failing prediction request]
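
The error quoted above comes down to a failed string-to-float conversion. As a minimal sketch (the helper name `non_numeric_fields` and the sample instance are illustrative and not taken from the tutorial), the following reports which positions in one JSON instance line would fail that conversion:

```python
import json


def non_numeric_fields(json_line):
    """Return (index, value) pairs that cannot be converted to float."""
    bad = []
    for index, value in enumerate(json.loads(json_line)):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((index, value))
    return bad


# Illustrative instance in the column order of the census example
# (the label column 'income-level' is not part of a request).
sample = (
    '[25, "Private", 226802, "11th", 7, "Never-married", '
    '"Machine-op-inspct", "Own-child", "Black", "Male", 0, 0, 40, '
    '"United-States"]'
)

for index, value in non_numeric_fields(sample):
    print(index, repr(value))
```

Every position it reports corresponds to one of the categorical columns, which is why the request is rejected unless those values are encoded first.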

import json
import numpy as np
import os
import pandas as pd
import pickle
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder

# these are the column labels from the census data files
COLUMNS = (
    'age',
    'workclass',
    'fnlwgt',
    'education',
    'education-num',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
    'native-country',
    'income-level'
)

# categorical columns contain data that need to be turned into numerical
# values before being used by XGBoost
CATEGORICAL_COLUMNS = (
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
)

# load training set
with open('./census_data/adult.data', 'r') as train_data:
    raw_training_data = pd.read_csv(train_data, header=None, names=COLUMNS)
# remove column we are trying to predict ('income-level') from features list
train_features = raw_training_data.drop('income-level', axis=1)
# create training labels list
train_labels = (raw_training_data['income-level'] == ' >50K')


# load test set
with open('./census_data/adult.test', 'r') as test_data:
    raw_testing_data = pd.read_csv(test_data, names=COLUMNS, skiprows=1)
# remove column we are trying to predict ('income-level') from features list
test_features = raw_testing_data.drop('income-level', axis=1)
# create test labels list
test_labels = (raw_testing_data['income-level'] == ' >50K.')

# convert data in categorical columns to numerical values
encoders = {col: LabelEncoder() for col in CATEGORICAL_COLUMNS}
for col in CATEGORICAL_COLUMNS:
    train_features[col] = encoders[col].fit_transform(train_features[col])
for col in CATEGORICAL_COLUMNS:
    test_features[col] = encoders[col].fit_transform(test_features[col])

# load data into DMatrix object
dtrain = xgb.DMatrix(train_features, train_labels)
dtest = xgb.DMatrix(test_features)

# train XGBoost model
bst = xgb.train({}, dtrain, 20)
bst.save_model('./model.bst')
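
Note that the script saves only the booster; the category-to-integer mapping learned by each LabelEncoder is not stored anywhere. A minimal sketch of keeping it follows. The helper `build_mapping` is a plain-Python stand-in that numbers the sorted distinct values from 0, which is how LabelEncoder assigns its codes (a fitted encoder exposes the same ordered values as `classes_`). The sample values and the idea of storing the result next to model.bst are assumptions for illustration.

```python
import json


def build_mapping(values):
    """Number the distinct categories from 0 in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Illustrative training values; in the script above they would come
# from train_features[col] before encoding.
train_columns = {
    "workclass": [" Private", " State-gov", " Private", " Local-gov"],
    "sex": [" Male", " Female", " Male", " Male"],
}

mappings = {col: build_mapping(vals) for col, vals in train_columns.items()}

# Serialize the mappings so they can be stored alongside model.bst
# and loaded again when prediction requests are prepared.
serialized = json.dumps(mappings, indent=2, sort_keys=True)
print(serialized)
```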

Best answer

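Here is a fix. Put the input shown in the Google documentation into a file named input.json, then run this script. The output is input_numerical.json; if you use it in place of input.json, the prediction will succeed.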

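This code simply preprocesses the categorical columns into numerical form, using the same procedure that was applied to the training and test data.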

import json

import pandas as pd
from sklearn.preprocessing import LabelEncoder

COLUMNS = (
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education-num",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
    "native-country",
    "income-level",
)

# categorical columns contain data that need to be turned into numerical
# values before being used by XGBoost
CATEGORICAL_COLUMNS = (
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "native-country",
)

with open("./input.json", "r") as json_lines:
    rows = [json.loads(line) for line in json_lines]

prediction_features = pd.DataFrame(rows, columns=(COLUMNS[:-1]))

encoders = {col: LabelEncoder() for col in CATEGORICAL_COLUMNS}
for col in CATEGORICAL_COLUMNS:
    prediction_features[col] = encoders[col].fit_transform(prediction_features[col])

with open("input_numerical.json", "w") as input_numerical:
    for index, row in prediction_features.iterrows():
        input_numerical.write(row.to_json(orient="values") + "\n")
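
One caveat with this approach: a LabelEncoder fitted on the prediction input alone numbers the categories by what appears in that input, so the codes need not match the ones the model saw during training. A hedged sketch of the alternative, reusing a mapping produced at training time, is shown below. The mapping contents, the helper names `build_mapping` and `encode_instance`, and the three-field instance are illustrative assumptions, not part of the tutorial.

```python
def build_mapping(values):
    """Number the distinct categories from 0 in sorted order
    (a plain-Python stand-in for how LabelEncoder assigns codes)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Illustrative training-time mapping, keyed by column position
# (index 1 is 'workclass' in the census column order).
training_mappings = {
    1: {" Local-gov": 0, " Private": 1, " State-gov": 2},
}


def encode_instance(instance, mappings):
    """Encode one instance with training-time mappings; fail on unseen values."""
    encoded = []
    for index, value in enumerate(instance):
        if index in mappings:
            if value not in mappings[index]:
                raise ValueError(f"unseen category {value!r} in column {index}")
            encoded.append(float(mappings[index][value]))
        else:
            encoded.append(float(value))
    return encoded


instance = [39, " State-gov", 77516]

# Refitting on the request alone gives " State-gov" the code 0 ...
refit_code = build_mapping([instance[1]])[" State-gov"]
# ... while the training-time mapping gives it the code 2.
encoded = encode_instance(instance, training_mappings)

print(refit_code, encoded)  # 0 [39.0, 2.0, 77516.0]
```

Raising on an unseen category is a design choice here: it surfaces bad requests early instead of silently sending a wrong code to the model.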

I created this Google Issue Tracker ticket because the Google documentation is missing this important step.

Regarding "google-cloud-platform - XGboost Google-AI-Model expecting float values instead of using categorical values and converting them", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59753888/
