I have the following code for running an inference server with FastAPI. Over about 8 hours my RAM usage grows by roughly 4 GiB.
What is even more interesting: when I stop the container, the RAM is not released.
For instance, before running the Docker container I see 2 GiB of RAM in use; as soon as I start the container, usage jumps to 4 GiB and then starts to grow slowly (after 8 hours it is around 8 GiB). After stopping the container I still see 6 GiB in use... which means roughly 4 GiB of RAM has leaked.
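(The readings above are host-level figures. For reference, one way to take such a reading from Python is sketched below; it assumes the third-party psutil package, which the server code itself does not use.)

import psutil  # assumption: installed separately, not part of the server code below

# Print how much host RAM is currently in use, in GiB
used_gib = psutil.virtual_memory().used / 2**30
print(f"host RAM used: {used_gib:.1f} GiB")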
Where could my problem be?
import tensorflow as tf
import cv2
import numpy as np
import yaml
import time
from loguru import logger
import tracemalloc
# from flask import Flask, render_template, request, make_response
from PIL import Image
import os
import io
import json
import base64
import gc
# tf.config.gpu.set_per_process_memory_fraction(0.75)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# ========= Function for preprocessing a raw image and preparing it for inference =========
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)
    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)
# ========= Function for loading the model weights =========
def load_graph(frozen_graph_filename):
    with tf.io.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
    return graph
# ========= Config class =========
class create_config():
    def __init__(self, file):
        with open(file, 'r') as f:
            data = yaml.safe_load(f)
        self.modelPath = data['modelPath']
        self.logFile = data['logFile']
        self.loggingLevel = data['loggingLevel']
        self.host = data['host']
        self.port = data['port']
# ========= Function for processing the YOLO outputs =========
def YOLOdetect(output_data):  # input = interpreter, output is boxes(xyxy), classes, scores
    output_data = output_data[0]  # x(1, 25200, 7) to x(25200, 7)
    boxes = np.squeeze(output_data[..., :4])  # boxes [25200, 4]
    scores = np.squeeze(output_data[..., 4:5])  # confidences [25200, 1]
    classes = classFilter(output_data[..., 5:])  # get classes
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]  # xywh
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy [4, 25200]
    return xyxy, scores, classes  # output is boxes(x,y,x,y), classes(int), scores(float) [predictions length]
# ========= Function for extracting just the class indices =========
def classFilter(classdata):
    classes = []  # create a list
    for i in range(classdata.shape[0]):  # loop through all predictions
        classes.append(classdata[i].argmax())  # get the best classification location
    return classes  # return classes (int)
# ========= Function for scaling coordinates back to the original image resolution =========
def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    # Rescale coords (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]
    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain
    clip_coords(coords, img0_shape)
    return coords
# ========= Function that clips coordinates to the image bounds =========
def clip_coords(boxes, shape):
    # Clip xyxy bounding boxes to image shape (height, width)
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])  # x1, x2
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])  # y1, y2
# ========= Model class =========
class Yolov5:
    @logger.catch
    def __init__(self, model_weight):
        self.model_weight = model_weight
        self._start()

    @logger.catch
    def _start(self):
        logger.info(' ========= Loading model weights ========= ')
        self.graph = load_graph(self.model_weight)
        self.x = self.graph.get_tensor_by_name('import/x:0')
        self.y = self.graph.get_tensor_by_name('import/Identity:0')
        # self.sess = tf.compat.v1.Session('', self.graph)
        self.sess = tf.compat.v1.Session(graph=self.graph, config=tf.compat.v1.ConfigProto(log_device_placement=True))

    @logger.catch
    def forward(self, image):
        y_out = self.sess.run(self.y, feed_dict={self.x: image})
        return y_out

    @logger.catch
    def process_image(self, im0):
        im = letterbox(im0, (640, 640), stride=32, auto=False)[0]  # padded resize
        h, w = im.shape[:2]
        # im = im[..., ::-1]  # HWC to CHW, BGR to RGB
        im = np.ascontiguousarray(im)  # contiguous
        im = im.astype('float32')
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim
        return im, cv2.cvtColor(im0, cv2.COLOR_BGR2RGB), h, w, im0.shape

    @logger.catch
    def postprocess(self, y, h, w, old_shape):
        y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
        y[0][..., :4] *= [w, h, w, h]  # xywh normalized to pixels
        xyxy, scores, classes = YOLOdetect(y)
        xyxy = np.array(xyxy).T
        indexes = tf.image.non_max_suppression(xyxy, scores, max_output_size=10, iou_threshold=0.5, score_threshold=0.3)
        filtered_xyxy = xyxy[indexes, :]
        filtered_scores = scores[np.array(indexes)]
        scaled_xyxy = scale_coords((h, w), filtered_xyxy, old_shape).round()
        return scaled_xyxy, filtered_scores

    @logger.catch
    def run_local(self, image):
        # PREPROCESS
        im, im0, h, w, old_shape = self.process_image(image)
        # INFERENCE
        y = self.forward(im)
        # POSTPROCESS
        coords, scores = self.postprocess(y, h, w, old_shape)
        return coords, scores
config = create_config('extra/config.yml')
# print('======= Config created =======')
# model = Yolov5(config.modelPath)
# print('======= Model loaded =======')
# print('======= GPU MEMORY =======', tf.config.experimental.get_memory_usage('GPU:0'))
# image_path = 'extra/test_image.jpg'
# # Load the image with OpenCV
# image = cv2.imread(image_path)
# TOTAL_IMAGES = 10000
# start = time.time()
# # INFERENCE
# for _ in range(TOTAL_IMAGES):
# model.run_local(image)
# end = time.time()
# # print('======= GPU MEMORY =======', tf.config.experimental.get_memory_usage('GPU:0'))
# print('======= TOTAL TIME =======', end - start)
# print('======= TIME PER IMAGE =======', (end - start) / TOTAL_IMAGES)
#===============================================================================================================
# Logger setup
logger.add(config.logFile, format="{time} {level} {message}", level=config.loggingLevel)
import io
import uvicorn
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import StreamingResponse
from PIL import Image
from pathlib import Path
from pydantic import BaseModel
app = FastAPI()
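# ========= Request schema: base64-encoded image and confidence threshold =========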
class PredictRequest(BaseModel):
    img: str
    targetPercent: float
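# ========= Inference wrapper: runs the model on the GPU and clears the Keras session =========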
def get_prediction(image):
    with tf.device('/GPU:0'):
        results = model.run_local(image)
    tf.keras.backend.clear_session()
    return results
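# ========= /predict/ endpoint: decode the base64 image, run inference, return detections =========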
@app.post("/predict/")
async def predict_image(request: PredictRequest):
    base64_image = request.img
    conf_threshold = request.targetPercent
    img_data = base64.b64decode(base64_image)
    image_np = np.frombuffer(img_data, np.uint8)
    image = cv2.imdecode(image_np, cv2.IMREAD_COLOR)
    results = get_prediction(image)
    res = []
    for coord, score in zip(results[0], results[1]):
        x1, y1, x2, y2 = list(map(int, coord))
        confidence = float(score)
        if confidence >= conf_threshold:
            res.append({'pointMin': (x1, y1), 'pointMax': (x2, y2), 'label': "people", 'percent': confidence})
    del results, img_data, image_np, image
    gc.collect()
    return json.dumps({'params': res})
if __name__ == '__main__':
    logger.info('starting yolov5 webservice... (TF)')
    logger.info(f"cuda is available: {len(tf.config.list_physical_devices('GPU')) != 0}")
    try:
        model = Yolov5(config.modelPath)
    except Exception as e:
        logger.error(f"couldn't load model weights with error {e}")
    uvicorn.run(app, host=config.host, port=config.port)
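The tracemalloc module is imported at the top of the file but never used. Below is a minimal sketch (illustrative only, not part of the running server) of how it could be wired in to see which Python-level allocations grow between two points in time; note that tracemalloc only tracks Python allocations, so growth inside TensorFlow's native runtime would not show up here.

import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# ... serve a batch of requests here ...

snapshot_after = tracemalloc.take_snapshot()
# Print the ten source lines whose allocations grew the most between snapshots
for stat in snapshot_after.compare_to(snapshot_before, 'lineno')[:10]:
    print(stat)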