gpt4 book ai didi

machine-learning - Amazon Sagemaker 对象检测预期批量错误数量

转载 作者:行者123 更新时间:2023-11-30 09:16:36 24 4
gpt4 key购买 nike

在训练模型超过 1 个周期时出现以下错误

[02/06/2019 13:37:08 WARNING 140231582721856] Expected number of batches: 15, did not match the number of batches processed: 16. This may happen when some images or annotations are invalid and cannot be parsed. Please check the dataset and ensure it follows the format in the documentation.
[02/06/2019 13:37:08 INFO 140231582721856] #quality_metric: host=algo-1, epoch=24, batch=16 train cross_entropy <loss>=(nan)
[02/06/2019 13:37:08 INFO 140231582721856] #quality_metric: host=algo-1, epoch=24, batch=16 train smooth_l1 <loss>=(nan)
[02/06/2019 13:37:08 INFO 140231582721856] Round of batches complete
[02/06/2019 13:37:08 INFO 140231582721856] Updated the metrics
[02/06/2019 13:37:08 INFO 140231582721856] #quality_metric: host=algo-1, epoch=24, validation mAP <score>=(0.0)
[02/06/2019 13:37:08 INFO 140231582721856] #progress_metric: host=algo-1, completed 83 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Batches Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Records Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Batches Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Records Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Max Records Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Reset Count": {"count": 1, "max": 25, "sum": 25.0, "min": 25}}, "EndTime": 1549460228.963195, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/Object Detection", "epoch": 24}, "StartTime": 1549460224.644808}

以下是我使用的代码

对于估算器

od_model = sagemaker.estimator.Estimator(training_image,
role,
train_instance_count=1,
train_instance_type='ml.p3.8xlarge',
train_volume_size = 500,
train_max_run = 300000,
input_mode= 'File',
output_path=s3_output_location,
sagemaker_session=sess)

对于超参数

od_model.set_hyperparameters(base_network='resnet-50',
use_pretrained_model=0,
num_classes=1,
mini_batch_size=32,
epochs=30,
learning_rate=0.001,
lr_scheduler_step='3,6',
lr_scheduler_factor=0.1,
optimizer='sgd',
momentum=0.9,
weight_decay=0.0005,
overlap_threshold=0.5,
nms_threshold=0.45,
image_shape=512,
label_width=360,
num_training_samples=500)

但是如果我将纪元保持为 1,边界框看起来很好,尽管输出模型无法正确检测并在各处创建框

使用上述代码,最终模型不会创建任何边界框

最佳答案

两个训练损失均为“nan”,验证 mAP 为 0。这意味着模型未正确训练。尝试调整“learning_rate”和“batch_size”超参数。对于小型数据集(500 张图像),您可以通过设置“use_pretrained_model=1”来使用迁移学习功能。

关于machine-learning - Amazon Sagemaker 对象检测预期批量错误数量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54556605/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com