Why does YoloV8 perform poorly when exported to .onnx and run with onnxruntime or opencv dnn? The results just don't compare to torch .pt model files

Reposted by: bug小助手 | Updated: 2023-10-24 19:46:19

I'm using transfer learning to adapt a COCO-trained YOLOv8 model to detect objects in an entirely different use case. I get really encouraging performance metrics when I reload the trained model from its model.pt file using the ultralytics library and its built-in functions.

However, I have tried to export the model to .onnx and failed to achieve the same metrics.

I trained it on rectangular images of size 1280x720 with the flags rect=True, imgsz=1280

I exported it like this: yolo task=detect mode=export model=runs/detect/last.pt imgsz=720,1280 simplify=true format=onnx opset=12

I tried exporting without specifying an opset, and with opset 11 and opset 12 (the official docs recommend opset 12)

I tried exporting with and without simplify

I tried the onnxruntime library, using this GitHub repo as an example

I tried this Python example from Ultralytics themselves

None of the approaches above have given me the same results as using predict.py and loading the model from the original .pt file. Has anyone been able to produce the same results with .onnx as they did with their .pt model and on the gpu? If yes, can you share how you did it?

More replies

This is my training command if it helps: yolo detect train data=data/custom.yaml model=yolov8n.pt epochs=100 imgsz=1280 rect=True device=0 batch=8

What is the performance drop you get @moonboi?

The ultralytics package will resize the image and convert it to a numpy array. You will need to do the same if you haven’t already.

Recommended answers

As Brian Low mentioned, you need to handle the pre-processing that is done in Python to create the model input.

e.g. from the ultralytics example you linked, the pre-processing is done in these lines: https://github.com/ultralytics/ultralytics/blob/3c88bebc9514a4d7f70b771811ddfe3a625ef14d/examples/YOLOv8-OpenCV-ONNX-Python/main.py#L23C57-L31

Note that this involves a number of steps that are not necessarily trivial and need to be done in the same order as what was done to the input when the model was trained. Resize needs to use anti-aliasing. Normalization needs to use the right range (pytorch is usually float from 0..1, tensorflow is usually float from -1..1). The ordering of the data needs to be correct (channels first vs channels last, RGB vs BGR).
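
The layout and range conversions listed above can be sketched with NumPy alone. This is a minimal sketch, not the Ultralytics implementation: it assumes the image has already been resized to the model's input size, arrives as an HxWx3 uint8 BGR array (what cv2.imread returns), and that the model expects RGB, float32 in 0..1, channels first, with a leading batch dimension. The helper name to_model_input is hypothetical.

```python
import numpy as np


def to_model_input(bgr_image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image to a 1x3xHxW float32 RGB tensor in 0..1.

    Hypothetical helper; assumes the image is already at the model's input size.
    """
    if bgr_image.ndim != 3 or bgr_image.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    rgb = bgr_image[:, :, ::-1]              # BGR -> RGB (reverse the channel axis)
    chw = rgb.transpose(2, 0, 1)             # channels last -> channels first
    scaled = chw.astype(np.float32) / 255.0  # uint8 0..255 -> float32 0..1
    # Add the batch dimension and make the memory layout contiguous.
    return np.ascontiguousarray(scaled[np.newaxis, ...])


# Usage: a 2x2 dummy image whose pixels are pure blue in BGR order.
dummy = np.zeros((2, 2, 3), dtype=np.uint8)
dummy[:, :, 0] = 255
tensor = to_model_input(dummy)
print(tensor.shape, tensor.dtype)
```

Getting any one of these steps wrong (for example feeding BGR where RGB is expected, or 0..255 where 0..1 is expected) usually does not raise an error; the model simply produces worse detections, which matches the symptom described in the question.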

However it is possible to add the pre/post processing to the ONNX model so the input is the bytes from a jpeg or png image. See https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/yolo_e2e.py, which happens to be an example for YOLO v8. You will also need the onnxruntime-extensions package for the custom operator that does the image decoding/encoding.

The pre/post processing steps provided are composable and configurable so they can be adjusted as needed and used for a wide range of models (image, text, audio).

e.g. you may want to handle drawing bounding boxes yourself so you could update the post-processing pipeline to stop after the ScaleBoundingBoxes step here so the model output is the best bounding boxes, with co-ordinates that match the original image pre-resizing: https://github.com/microsoft/onnxruntime-extensions/blob/169438999cf66bd700207441b07c5d26634e72a5/onnxruntime_extensions/tools/add_pre_post_processing_to_model.py#L255-L278
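
The idea of mapping boxes back to the original image can be sketched as follows. This is only an illustration of the arithmetic, not the onnxruntime-extensions ScaleBoundingBoxes code: it assumes the image was resized by a single uniform factor and then padded on the left and top by known amounts, and that boxes are in (x1, y1, x2, y2) pixel format. The function name box_to_original and all numbers in the usage lines are made up.

```python
def box_to_original(box, scale, pad_x, pad_y, orig_w, orig_h):
    """Map an (x1, y1, x2, y2) box from model-input pixels to original-image pixels.

    Hypothetical helper; assumes uniform scaling followed by left/top padding.
    """
    def clamp(value, upper):
        # Keep coordinates inside the original image.
        return max(0.0, min(float(upper), value))

    x1, y1, x2, y2 = box
    # Undo the padding first, then undo the uniform scale.
    return (
        clamp((x1 - pad_x) / scale, orig_w),
        clamp((y1 - pad_y) / scale, orig_h),
        clamp((x2 - pad_x) / scale, orig_w),
        clamp((y2 - pad_y) / scale, orig_h),
    )


# Usage: a 2560x1440 image scaled by 0.5 to 1280x720 and centered in a
# 1280x1280 input, which leaves 280 px of padding above and below.
restored = box_to_original((100, 380, 300, 680), scale=0.5, pad_x=0, pad_y=280,
                           orig_w=2560, orig_h=1440)
print(restored)  # (200.0, 200.0, 600.0, 800.0)
```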

I'm facing the same issue here. I exported the model with opset=12 and also without specifying an opset, and performance is poor when running the ONNX model through OpenCV.
The problem seems to lie with OpenCV, though; I don't know what happens under the hood.
If I load the exported ONNX model with Ultralytics YOLO, it works perfectly fine.

import cv2
from ultralytics import YOLO

# Load the exported ONNX model through the Ultralytics wrapper,
# which applies its own pre- and post-processing.
model = YOLO("../runs/detect/train/weights/best.onnx")

im2 = cv2.imread("1.png")  # BGR image as a NumPy array
results = model.predict(source=im2, save=True, save_txt=True, imgsz=1280)  # save predictions as labels

for result in results:
    boxes = result.boxes  # Boxes object for bbox outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    probs = result.probs  # Class probabilities for classification outputs

Even when the ONNX model is loaded this way, it performs exactly the same as the .pt model.

I had the same issue using "ort.InferenceSession" in JavaScript. I also checked whether the datatype was the cause, thinking that model.onnx might use lower numerical precision than model.pt, but in fact both use the same datatype (Float32). The problem is therefore caused by the image pre-processing.

I solved it by using proportional (aspect-ratio-preserving) scaling with padding instead of non-proportional image resizing.
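
The geometry behind proportional scaling with padding (often called letterboxing) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the answerer's actual code: the padding is split evenly between both sides, the function name letterbox_params is hypothetical, and the actual pixel work (for example with cv2.resize and cv2.copyMakeBorder) is left out. The padding fill value should match whatever the training pipeline used.

```python
def letterbox_params(orig_w, orig_h, target_w, target_h):
    """Compute the resize and padding needed to fit an image into a target
    size without changing its aspect ratio.

    Hypothetical helper. Returns (scale, new_w, new_h, left, top, right, bottom).
    """
    # Use the smaller ratio so the whole image fits inside the target.
    scale = min(target_w / orig_w, target_h / orig_h)
    new_w = round(orig_w * scale)
    new_h = round(orig_h * scale)
    # Whatever space is left over becomes padding, split between both sides.
    pad_w = target_w - new_w
    pad_h = target_h - new_h
    left, top = pad_w // 2, pad_h // 2
    return scale, new_w, new_h, left, top, pad_w - left, pad_h - top


# Usage: fit a 1920x1080 frame into a 640x640 model input.
scale, new_w, new_h, left, top, right, bottom = letterbox_params(1920, 1080, 640, 640)
print(new_w, new_h, left, top, right, bottom)  # 640 360 0 140 0 140
```

A plain non-proportional resize of the same frame to 640x640 would squash every object horizontally, so the model would see shapes it never saw during training. The same scale and padding values are needed afterwards to map predicted boxes back onto the original image.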
