
azure - auto-ml-forecasting-many-models: retrieving training results

Reposted. Author: 行者123. Updated: 2023-12-03 03:35:02

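I am following the Python SDK tutorial for training many models: here is a link to the notebook.

Everything works fine, but now I am interested in the training results. When I check Azure ML Studio, I can see the pipeline steps as follows: training pipeline.

The output directory of the mm-models train step does not contain the information I need. But if I inspect the child jobs, I can drill down by grain: child-jobs.

There I can see the trained models and their hyperparameters for each partitioned entity I want to forecast, for example a specific product in a given region: models for a partitioned entity.

What I want now is to retrieve, from the Python SDK (or programmatically in general), the information stored in those child jobs that I can access through the Azure ML Studio GUI, such as the cross-validation forecasts or the explainability metrics, for example these: Azure ML GUI.

I can see that this information is stored in the output folder of the child job: Output + logs

I cannot work out from the Python SDK documentation how to retrieve these results. I also tried fetching the results from the child jobs through the REST API, following this thread. The output I get looks like this: error response. In addition, I do not know how to get the IDs of all the child jobs from the Python SDK.

Any help would be much appreciated!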

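Best answer

The class ManyModelLocalInferencing below provides methods to download the many-models (forecasting) models trained with ParallelRunStep and to run inference on them locally.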

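Constructor

  • __init__(self, experiment, training_run_id, cv_output_path, inference_output_path): initializes the class with the following parameters:
    • experiment: the Experiment object associated with the training run.
    • training_run_id: the ID of the many-models training run.
    • cv_output_path: the path where the cross-validation results and the downloaded models are saved.
    • inference_output_path: the path where the inference results are saved.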

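Methods

  • download_best_models(self): downloads the best model for each time series, based on the cross-validation results. The best models are saved in the cv_output_path directory, together with a summary of the downloaded models.
  • local_inferencing(self, test_set: pd.DataFrame): runs inference with the downloaded models on the given test set. The resulting forecasts, quantile forecasts and rolling forecasts are saved in the inference_output_path directory.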
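
The selection rule behind download_best_models can be illustrated without any Azure dependency. The following is only a sketch under two assumptions: a lower score is better (as the comparisons in the class imply), and scores arrive as strings, which is how Azure ML run properties are typically returned. The function name and the candidate data are hypothetical.

```python
from typing import Dict, Iterable, Tuple, Union


def best_per_series(
    candidates: Iterable[Tuple[str, str, Union[str, float]]]
) -> Dict[str, Tuple[str, float]]:
    """Keep, for each series ID, the algorithm with the lowest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for series_id, algorithm, raw_score in candidates:
        # Coerce to float: comparing "10.0" < "9.5" as strings would be True.
        score = float(raw_score)
        if series_id not in best or score < best[series_id][1]:
            best[series_id] = (algorithm, score)
    return best


# Hypothetical candidates: (series ID, algorithm, score as a string)
candidates = [
    ("store1_productA", "ElasticNet", "10.0"),
    ("store1_productA", "LightGBM", "9.5"),
    ("store2_productB", "Prophet", "0.15"),
]
result = best_per_series(candidates)
print(result)
```

Converting the score before comparing matters here: as strings, "10.0" sorts before "9.5", so the worse candidate would win.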

Attributes

  • models_downloaded: a bool indicating whether the models have been downloaded.
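
Because models_downloaded only lives in memory, a freshly created instance will download everything again even if the files are already on disk. A possible complement, sketched here as an assumption and not as part of the original answer, is to check the directory layout that local_inferencing reads from (cv_output_path/<series ID>/outputs/model.pkl). The helper name series_with_models is hypothetical.

```python
import os
import tempfile
from typing import List


def series_with_models(cv_output_path: str) -> List[str]:
    """Return the series IDs that already have outputs/model.pkl on disk."""
    if not os.path.isdir(cv_output_path):
        return []
    found = []
    for entry in sorted(os.listdir(cv_output_path)):
        model_path = os.path.join(cv_output_path, entry, "outputs", "model.pkl")
        if os.path.isfile(model_path):
            found.append(entry)
    return found


# Demonstration on a throwaway directory with made-up series IDs
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "store1_productA", "outputs"))
    open(os.path.join(tmp, "store1_productA", "outputs", "model.pkl"), "wb").close()
    os.makedirs(os.path.join(tmp, "store2_productB"))  # no model downloaded yet
    available = series_with_models(tmp)

print(available)  # ['store1_productA']
```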

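External dependencies

  • os: Python built-in module for interacting with the file system.
  • json: Python built-in module for handling JSON data.
  • pandas: open-source data analysis and manipulation library.
  • azureml.pipeline.core.PipelineRun: Azure Machine Learning Python SDK class for managing pipeline runs.
  • joblib: a set of tools for lightweight pipelining in Python, efficient with large numpy arrays; used here to load the pickled models. The original import path sklearn.externals.joblib was removed in scikit-learn 0.23; the standalone joblib package replaces it.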

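Note: this documentation assumes the class is used in a larger context in which the following variables are defined:

  • time_series_id_column_names: a string with the name of the column that contains the time series ID.
  • label_column_name: a string with the name of the column that contains the label.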
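
The class derives each series ID from the forecast_table artifact of the best child run and then matches it against the time_series_id_column_names column. As a rough sketch of that step, the snippet below joins the grain values with an underscore. The sample payload is hypothetical and contains only the fields the class actually reads (data[0].grain_value_list[0]); a real forecast_table holds more.

```python
import json


def grain_name_from_forecast_table(raw: str) -> str:
    """Build the series ID by joining the grain values with underscores."""
    table = json.loads(raw)
    grain_values = table["data"][0]["grain_value_list"][0]
    # str() guards against non-string grain values such as integer store IDs
    return "_".join(str(value) for value in grain_values)


# Hypothetical, heavily trimmed forecast_table content
sample = json.dumps({"data": [{"grain_value_list": [["store1", "productA"]]}]})
series_id = grain_name_from_forecast_table(sample)
print(series_id)  # store1_productA
```

The values in the time series ID column of the test set have to follow the same underscore-joined convention, since local_inferencing uses them as directory names.
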

import os
import json
import pandas as pd
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from azureml.pipeline.core import PipelineRun


class ManyModelLocalInferencing:

    def __init__(self, experiment, training_run_id, cv_output_path, inference_output_path):
        """
        Initializes the ManyModelLocalInferencing class.

        Parameters:
            experiment (azureml.core.Experiment): The experiment object that contains the training run.
            training_run_id (str): The ID of the training run.
            cv_output_path (str): The path to the directory where the best models will be saved.
            inference_output_path (str): The path to the directory where the results of local inferencing will be saved.
        """
        self.experiment = experiment
        self.training_run_id = training_run_id
        self.cv_output_path = cv_output_path
        self.inference_output_path = inference_output_path
        self.models_downloaded = False
        self.results = dict()

    def download_best_models(self):
        """
        Downloads the best model for each time series from the Azure ML run and saves it locally.
        """
        # get the many models training pipeline run
        pipeline = PipelineRun(experiment=self.experiment, run_id=self.training_run_id)

        # find the child runs of the many models training step
        many_models_train_step = pipeline.find_step_run("many-models-train")[0]
        many_models_runs = list(many_models_train_step.get_children())

        # create the output directories if they don't exist
        for path in [self.cv_output_path, self.inference_output_path]:
            os.makedirs(path, exist_ok=True)
            print(f"directory '{path}' is ready")

        forecast_table_path = os.path.join(self.cv_output_path, "forecast_table")

        # download the best model for each time series
        summary = []
        for run in many_models_runs:
            best_model = run.get_best_child()

            try:
                # forecast_table is a JSON artifact that carries the grain (series) values
                best_model.download_file("forecast_table", output_file_path=forecast_table_path)
                with open(forecast_table_path, "r") as f:
                    data = json.load(f)
            except Exception as e:
                print(f"Error downloading for run {run.id}: {e}")
                continue

            grain_names = "_".join(data.get("data")[0].get("grain_value_list")[0])
            run_preprocessor = best_model.properties["run_preprocessor"]
            run_algorithm = best_model.properties["run_algorithm"]
            # run properties are strings; convert so that scores compare numerically
            score = float(best_model.properties["score"])

            # best score recorded so far for this time series, if any
            previous_scores = [
                row["score"] for row in summary
                if row[time_series_id_column_names] == grain_names
            ]
            previous_score = min(previous_scores) if previous_scores else None

            # download if this model beats the previous best (lower score is better)
            # or if nothing has been downloaded for this time series yet
            model_dir = f"{self.cv_output_path}/{grain_names}"
            if previous_score is None or score < previous_score or not os.path.exists(model_dir):
                try:
                    best_model.download_files(output_directory=model_dir)
                except Exception as e:
                    print(f"Error downloading model files for run {run.id}: {e}")
                    continue

            summary.append({
                time_series_id_column_names: grain_names,
                "preprocessor": run_preprocessor,
                "algorithm": run_algorithm,
                "score": score,
            })

        # save one row per time series: the entry with the lowest score
        # (a column-wise groupby().min() could mix values from different rows)
        summary_df = (
            pd.DataFrame(
                summary,
                columns=[time_series_id_column_names, "preprocessor", "algorithm", "score"],
            )
            .sort_values("score")
            .drop_duplicates(subset=time_series_id_column_names)
            .set_index(time_series_id_column_names)
        )
        summary_df.to_csv(f"{self.cv_output_path}/summary.csv")

        # remove the temporary forecast_table file if one was downloaded
        if os.path.exists(forecast_table_path):
            os.remove(forecast_table_path)

        self.models_downloaded = True

    def local_inferencing(self, test_set: pd.DataFrame):
        """
        Runs inference with the downloaded models on test_set and saves the
        forecasts, quantile forecasts and rolling forecasts as CSV files.
        """
        if not self.models_downloaded:
            print("Models have not been downloaded. Calling download_best_models first.")
            self.download_best_models()

        forecast_result_list = []
        quantile_result_list = []
        rolling_result_list = []

        for sku in test_set[time_series_id_column_names].unique():
            # rows of the test set that belong to this time series
            test_ = test_set[test_set[time_series_id_column_names] == sku]
            fitted_model = joblib.load(f"{self.cv_output_path}/{sku}/outputs/model.pkl")

            # forecast() returns a tuple; element [1] is the data frame kept here
            model_response = fitted_model.forecast(X_pred=test_)[1]
            model_response[label_column_name] = test_[label_column_name].values
            forecast_result_list.append(model_response)

            # work on a copy so that pop() does not touch the frame used above
            X_test = test_.copy()
            y_test = X_test.pop(label_column_name).values

            fitted_model.quantiles = [0.05, 0.5, 0.95]
            quantile_result_list.append(fitted_model.forecast_quantiles(X_test))

            # Make a rolling forecast, advancing the forecast origin by 1 period
            # on each iteration through the test set
            rolling_result_list.append(fitted_model.rolling_forecast(
                X_test, y_test, step=1, ignore_data_errors=True
            ))

        # DataFrame.append was removed in pandas 2.0, so collect and concatenate
        self.results["forecast_results"] = pd.concat(forecast_result_list)
        self.results["quantile_results"] = pd.concat(quantile_result_list, sort=False, ignore_index=True)
        self.results["rolling_results"] = pd.concat(rolling_result_list, sort=False, ignore_index=True)

        os.makedirs(self.inference_output_path, exist_ok=True)
        for key, value in self.results.items():
            print(f"saving {key}")
            value.to_csv(f"{self.inference_output_path}/{key}.csv")
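
On the part of the question about listing every child job ID: azureml.core.Run exposes get_children(recursive=...) and get_file_names(), so one possible approach is to walk the children of the training step and record what each one stored. The sketch below is an assumption about how that could look, not part of the original answer. It accepts any object with those two methods, and the _FakeRun class and run IDs are stand-ins so the snippet runs without an Azure workspace.

```python
from typing import Any, Dict, List


def summarize_child_runs(parent_run: Any) -> List[Dict[str, Any]]:
    """Collect the ID and artifact names of every descendant run."""
    rows = []
    for child in parent_run.get_children(recursive=True):
        rows.append({"run_id": child.id, "files": list(child.get_file_names())})
    return rows


class _FakeRun:
    """Stand-in that mimics the two Run methods used above (demo only)."""

    def __init__(self, run_id, files, children=()):
        self.id = run_id
        self._files = list(files)
        self._children = list(children)

    def get_children(self, recursive=False):
        for child in self._children:
            yield child
            if recursive:
                yield from child.get_children(recursive=True)

    def get_file_names(self):
        return list(self._files)


# Made-up hierarchy: step run -> AutoML run per series -> model iteration
leaf = _FakeRun("AutoML_x_0", ["outputs/model.pkl", "forecast_table"])
automl = _FakeRun("AutoML_x", [], [leaf])
step = _FakeRun("many-models-train-step", [], [automl])

rows = summarize_child_runs(step)
for row in rows:
    print(row["run_id"], row["files"])
```

With a real workspace, the step run returned by pipeline.find_step_run("many-models-train")[0] in the class above would take the place of step, and individual files could then be fetched with download_file or download_files as download_best_models does.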

For "azure - auto-ml-forecasting-many-models: retrieving training results", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73652143/
