amazon-web-services - Slow inference time with the Tensorflow Object Detection API


Reposted by: 行者123. Updated: 2023-12-05 06:36:07

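I have been using the Tensorflow Object Detection API. In my case, I am trying to detect vehicles in still images with the KITTI-trained model from the model zoo (faster_rcnn_resnet101_kitti_2018_01_28). The code I use is modified from the object_detection_tutorial Jupyter notebook included in the GitHub repository.

I have included my modified code below, but I get the same results with the original notebook from GitHub.

When run on a Jupyter Notebook server on an Amazon AWS g3.4xlarge (GPU) instance with the Deep Learning AMI, processing a single image takes almost 4 seconds. The inference function alone takes 1.3 to 1.5 seconds (see the code below), which seems abnormally high compared with the reported inference time for the model (20 ms). While I don't expect to hit the reported mark, my times seem out of line and don't meet my needs. I plan to process more than 1 million images in one run and cannot afford 46 days of processing. Given that the model is used on video frame captures... I think it should be possible to get each image below 1 second at least.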
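As a quick check of the figures above, the sketch below computes wall-clock time for a strictly sequential run. It assumes a fixed cost per image and no overlap between images; the helper names are illustrative only.

```python
# Back-of-the-envelope arithmetic for a sequential, one-image-at-a-time run.
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_process(num_images, seconds_per_image):
    """Wall-clock days if every image takes seconds_per_image and nothing overlaps."""
    return num_images * seconds_per_image / SECONDS_PER_DAY


def per_image_budget(num_images, days):
    """Seconds available per image to finish num_images within the given number of days."""
    return days * SECONDS_PER_DAY / num_images


print(round(days_to_process(1_000_000, 4.0), 1))  # -> 46.3 (about 4 s per image)
print(round(days_to_process(1_000_000, 1.0), 1))  # -> 11.6 (the 1-second target)
print(round(per_image_budget(1_000_000, 7), 3))   # -> 0.605 (budget for a one-week run)
```

Even at the 1-second target, a million images would take about 11.6 days when processed strictly one after another.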

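My questions are:

1) What are the explanations for, or solutions to, the slow inference time?

2) Is 1.5 seconds to convert the image to numpy (before processing) out of line?

3) If this is the best performance I can expect, how much time can I hope to gain by reworking the model to process images in batches?

Thanks for your help!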

Code from the Python notebook:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import json
import collections
import os.path
import datetime

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

# This is needed to display the images.
get_ipython().magic('matplotlib inline')

#Setup variables
PATH_TO_TEST_IMAGES_DIR = 'test_images'

MODEL_NAME = 'faster_rcnn_resnet101_kitti_2018_01_28'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'kitti_label_map.pbtxt')

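NUM_CLASSES = 2

# Prefix for the output JSON files. The original post never defines it; this value is a placeholder.
OUTPUTS_BASENAME = 'kitti_detections'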

from utils import label_map_util
from utils import ops as utils_ops  # needed by the detection_masks branch below
from utils import visualization_utils as vis_util

def get_scores(
    boxes,
    classes,
    scores,
    category_index,
    min_score_thresh=.5
):

  # Map each detection index to its display string (class name and score)
  # for every box whose score exceeds min_score_thresh.
  box_to_display_str_map = collections.defaultdict(list)

  # Ground-truth boxes carry no scores, so there is nothing to report for them.
  if scores is None:
    return box_to_display_str_map

  for i in range(boxes.shape[0]):
    if scores[i] > min_score_thresh:
      if classes[i] in category_index:
        class_name = category_index[classes[i]]['name']
      else:
        class_name = 'N/A'
      display_str = str(class_name)
      if not display_str:
        display_str = '{}%'.format(int(100*scores[i]))
      else:
        display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
      box_to_display_str_map[i].append(display_str)

  return box_to_display_str_map

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

#get list of paths
exten='.jpg'
TEST_IMAGE_PATHS=[]

for dirpath, dirnames, files in os.walk(PATH_TO_TEST_IMAGES_DIR):
    for name in files:
        if name.lower().endswith(exten):
            #print(os.path.join(dirpath,name))
            TEST_IMAGE_PATHS.append(os.path.join(dirpath,name))
print(len(TEST_IMAGE_PATHS), 'Images To Process')

#load model graph for inference
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

#setup class labeling parameters    
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

#placeholder for timings
myTimings=[]

myX = 1
myResults = collections.defaultdict(list)
for image_path in TEST_IMAGE_PATHS:
  if os.path.exists(image_path):  
    print(myX,"--------------------------------------",datetime.datetime.time(datetime.datetime.now()))
    print(myX,"Image:", image_path)
    myTimings.append((myX,"Image", image_path))
    print(myX,"Open:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Open",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    print(myX,"Numpy:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Numpy",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    print(myX,"Expand:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Expand",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    print(myX,"Detect:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Detect",datetime.datetime.time(datetime.datetime.now()).__str__()))
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    print(myX,"Export:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Export",datetime.datetime.time(datetime.datetime.now()).__str__()))
    op=get_scores(
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      min_score_thresh=.2)
    myResults[image_path].append(op)  
    print(myX,"Done:", datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Done", datetime.datetime.time(datetime.datetime.now()).__str__()))
    myX= myX + 1

#save results    
with open((OUTPUTS_BASENAME+'_Results.json'), 'w') as fout:
    json.dump(myResults, fout)
with open((OUTPUTS_BASENAME+'_Timings.json'), 'w') as fout:
    json.dump(myTimings, fout)

Sample timings:

[1, "Image", "test_images/DE4T_11Jan2018/MFDC4612.JPG"]
[1, "Open", "19:20:08.029423"]
[1, "Numpy", "19:20:08.052679"]
[1, "Expand", "19:20:09.977166"]
[1, "Detect", "19:20:09.977250"]
[1, "Export", "19:23:13.902443"]
[1, "Done", "19:23:13.903012"]
[2, "Image", "test_images/DE4T_11Jan2018/MFDC4616.JPG"]
[2, "Open", "19:23:13.903885"]
[2, "Numpy", "19:23:13.906320"]
[2, "Expand", "19:23:15.756308"]
[2, "Detect", "19:23:15.756597"]
[2, "Export", "19:23:17.153233"]
[2, "Done", "19:23:17.153699"]
[3, "Image", "test_images/DE4T_11Jan2018/MFDC4681.JPG"]
[3, "Open", "19:23:17.154510"]
[3, "Numpy", "19:23:17.156576"]
[3, "Expand", "19:23:19.012935"]
[3, "Detect", "19:23:19.013013"]
[3, "Export", "19:23:20.323839"]
[3, "Done", "19:23:20.324307"]
[4, "Image", "test_images/DE4T_11Jan2018/MFDC4697.JPG"]
[4, "Open", "19:23:20.324791"]
[4, "Numpy", "19:23:20.327136"]
[4, "Expand", "19:23:22.175578"]
[4, "Detect", "19:23:22.175658"]
[4, "Export", "19:23:23.472040"]
[4, "Done", "19:23:23.472297"]
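Each label in the log above is recorded just before its step runs, so a stage's duration is the gap to the next label: "Numpy" to "Expand" is the PIL-to-NumPy conversion, and "Detect" to "Export" is the call to run_inference_for_single_image. A small sketch that computes those gaps from the [index, stage, value] rows the loop writes to the timings JSON file; `stage_durations` is a hypothetical helper, and it assumes a run never crosses midnight, since only times of day are logged.

```python
from datetime import datetime

# Rows for images 1 and 2, copied from the sample timings above.
SAMPLE = [
    [1, "Image", "test_images/DE4T_11Jan2018/MFDC4612.JPG"],
    [1, "Open", "19:20:08.029423"],
    [1, "Numpy", "19:20:08.052679"],
    [1, "Expand", "19:20:09.977166"],
    [1, "Detect", "19:20:09.977250"],
    [1, "Export", "19:23:13.902443"],
    [1, "Done", "19:23:13.903012"],
    [2, "Image", "test_images/DE4T_11Jan2018/MFDC4616.JPG"],
    [2, "Open", "19:23:13.903885"],
    [2, "Numpy", "19:23:13.906320"],
    [2, "Expand", "19:23:15.756308"],
    [2, "Detect", "19:23:15.756597"],
    [2, "Export", "19:23:17.153233"],
    [2, "Done", "19:23:17.153699"],
]


def stage_durations(rows):
    """Return {image_index: {stage: seconds until the next logged stage}}."""
    stamps = {}
    for idx, stage, value in rows:
        if stage == "Image":
            continue  # this row holds the file path, not a timestamp
        stamps.setdefault(idx, []).append(
            (stage, datetime.strptime(value, "%H:%M:%S.%f")))
    durations = {}
    for idx, seq in stamps.items():
        # Pair each stage with the next one; the final "Done" stage has no successor.
        durations[idx] = {stage: (t_next - t).total_seconds()
                          for (stage, t), (_, t_next) in zip(seq, seq[1:])}
    return durations


for idx, stages in stage_durations(SAMPLE).items():
    print(idx, {k: round(v, 3) for k, v in stages.items()})
```

Applying the same arithmetic to the full log gives roughly 1.85 s for the NumPy conversion and 1.3 to 1.4 s for detection on images 2 to 4, while the first image's detection step took about 184 s, presumably one-off start-up work on the first call.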

Best answer

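1) What you can do is load the video directly instead of images, and then change "run_inference_for_single_image()" to create the session once and load the images/video inside it (re-creating the graph is very slow). You can also edit the pipeline config file to reduce the number of proposals, which will directly speed up inference. Note that you have to re-export the graph afterwards (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md). Batching also helps (sorry, I forget by how much). Finally, you can use multiprocessing to offload CPU-specific operations (drawing bounding boxes, loading data) so that the GPU is better utilized.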
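A minimal sketch of the first suggestion, assuming the TF1-style graph/session API used in the question (imported here as tensorflow.compat.v1, which also exists in TensorFlow 2.x): the frozen graph is parsed once, one tf.Session stays open for the whole run, and the output tensors are looked up once. `Detector` and `load_frozen_graph` are hypothetical helper names. The mask post-processing from the question is left out, since this Faster R-CNN model does not output detection_masks. Batching assumes every image in a batch has the same height and width, because np.stack needs equal shapes.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style Graph/Session API, as in the question

# Outputs of the exported detection graph, as fetched in the tutorial notebook.
OUTPUT_KEYS = ('num_detections', 'detection_boxes',
               'detection_scores', 'detection_classes')


def load_frozen_graph(path):
    """Parse frozen_inference_graph.pb once and return it as a tf.Graph."""
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(path, 'rb') as fid:
            graph_def.ParseFromString(fid.read())
        tf.import_graph_def(graph_def, name='')
    return graph


class Detector:
    """Keeps one session open for the whole run instead of one per image."""

    def __init__(self, graph):
        self.sess = tf.Session(graph=graph)
        names = {t.name for op in graph.get_operations() for t in op.outputs}
        # Resolve input/output tensors once rather than on every call.
        self.fetches = {key: graph.get_tensor_by_name(key + ':0')
                        for key in OUTPUT_KEYS if key + ':0' in names}
        self.image_tensor = graph.get_tensor_by_name('image_tensor:0')

    def detect_batch(self, images):
        """Run one sess.run over a list of equally sized HxWx3 uint8 arrays."""
        batch = np.stack(images, axis=0)  # shape [N, H, W, 3]
        return self.sess.run(self.fetches, feed_dict={self.image_tensor: batch})

    def close(self):
        self.sess.close()
```

In the question's loop this would mean creating `detector = Detector(load_frozen_graph(PATH_TO_CKPT))` once before the loop, calling `detector.detect_batch([image_np])` (or passing several same-sized images) inside it, and calling `detector.close()` at the end. The outputs keep their leading batch dimension, so `out['detection_scores'][i]` holds the scores for the i-th image rather than always indexing `[0]`. How much batching adds on top of session reuse is something to measure on the target GPU.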

2) 1.5 seconds to convert the image to numpy (before processing) is out of line <- Yes, that is far too slow, and there is a lot of room for improvement.
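A likely culprit is load_image_into_numpy_array: image.getdata() yields one Python tuple per pixel before NumPy rebuilds the array. The sketch below, assuming Pillow and NumPy, lets NumPy copy the decoded pixel buffer directly instead; the function names are illustrative. Converting to 'RGB' also handles grayscale, CMYK or RGBA files, which the original three-channel reshape would reject. JPEG decoding still happens in both versions, so what is avoided is the per-pixel Python work, not the decode; the actual saving should be confirmed with the timing log.

```python
import numpy as np
from PIL import Image


def load_image_into_numpy_array_slow(image):
    """The question's version: one Python tuple per pixel via getdata()."""
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


def load_image_into_numpy_array_fast(image):
    """Copy the decoded pixel buffer straight into an (H, W, 3) uint8 array."""
    # convert('RGB') forces three channels (grayscale, CMYK or RGBA inputs included).
    return np.array(image.convert('RGB'), dtype=np.uint8)


# Tiny synthetic image so the example runs without any files on disk.
img = Image.new('RGB', (5, 3), color=(10, 20, 30))
img.putpixel((4, 2), (1, 2, 3))
fast = load_image_into_numpy_array_fast(img)
print(fast.shape, fast.dtype)  # -> (3, 5, 3) uint8
print((fast == load_image_into_numpy_array_slow(img)).all())  # -> True
```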

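3) While I don't know the exact GPU on AWS (K80?), with all of these fixes you should be able to get more than 10 fps on a GeForce 1080 Ti, which is consistent with the 79 ms time they report (where did you get 20 ms for faster_rcnn_resnet_101??)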
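The figures in point 3 can be cross-checked: a fixed per-image time of 79 ms corresponds to about 12.7 frames per second, in line with "more than 10 fps". A sketch with illustrative helper names; like any single-number estimate, it ignores image loading and conversion time and assumes images are processed one after another.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second for a fixed per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


def hours_for_images(num_images, latency_ms):
    """Sequential wall-clock hours for num_images at a fixed per-image latency."""
    return num_images * latency_ms / 1000.0 / 3600.0


print(round(fps_from_latency_ms(79), 1))          # -> 12.7
print(round(hours_for_images(1_000_000, 79), 1))  # -> 21.9
```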

Regarding amazon-web-services - Slow inference time with the Tensorflow Object Detection API, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49287011/
