
Real-time image classification with a neural network in Python


I am trying to do real-time image classification with caffe and Python. In one process I use OpenCV to stream from my webcam, and in a separate process I use caffe to run image classification on the frames pulled from the webcam. The classification result is then passed back to the main thread to caption the webcam stream.

The problem is that even though I have an NVIDIA GPU and run the caffe predictions on it, the main thread slows down. Normally, without any predictions running, my webcam stream runs at 30 fps; with predictions, it reaches at best 15 fps.

I have verified that caffe really is using the GPU when predicting, and that neither my GPU nor its memory is maxing out. I have also verified that my CPU cores are not maxing out at any point during the program. I would like to know whether I am doing something wrong, or whether there is just no way to keep these two processes truly separate. Any advice is appreciated. Here is my reference code:

import multiprocessing
import Queue

import cv2
import caffe

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            self.result_queue.put(text)
        return

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks, results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
#Allocate frame_copy before slicing into it
frame_copy = frame.copy()
task_empty = True
while rval:
    if task_empty:
        tasks.put(frame_copy)
        task_empty = False
    if not results.empty():
        text = results.get()
        #Add text to frame
        cv2.putText(frame, text, (5, 20), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0))
        task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy[:] = frame
    #Getting keyboard input
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

I am fairly sure it is the caffe prediction that slows everything down, because when I comment out the prediction and pass dummy text back and forth between the processes, I get 30 fps again.

import multiprocessing
import Queue

import cv2
import caffe

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            #text = net.predict(image)
            text = "dummy text"
            self.result_queue.put(text)
        return

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks, results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
#Allocate frame_copy before slicing into it
frame_copy = frame.copy()
task_empty = True
while rval:
    if task_empty:
        tasks.put(frame_copy)
        task_empty = False
    if not results.empty():
        text = results.get()
        #Add text to frame
        cv2.putText(frame, text, (5, 20), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0))
        task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy[:] = frame
    #Getting keyboard input
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

Best Answer

Some explanations and some reflections:

  1. I ran the code below on a laptop with an Intel Core i5-6300HQ @ 2.3GHz CPU, 8 GB RAM and an NVIDIA GeForce GTX 960M GPU (2 GB memory), and the result was:

     Whether I ran the code with caffe working or not (by commenting out net_output = this->net_->Forward(net_input) and some necessary stuff in void Consumer::entry()), I could always get around 30 fps in the main thread.

     Similar results were obtained on a PC with an Intel Core i5-4440 CPU, 8 GB RAM and an NVIDIA GeForce GT 630 GPU (1 GB memory).

  2. I ran the code from @user3543300's question on the same laptop, and the result was:

     Whether caffe was running (on the GPU) or not, I could also get around 30 fps.

  3. According to @user3543300's feedback, with the 2 versions of code mentioned above, @user3543300 could only get around 15 fps when running caffe (on a laptop with an Nvidia GeForce 940MX GPU and an Intel® Core™ i7-6500U CPU @ 2.50GHz × 4). And the webcam frame rate also dropped when caffe was run on the GPU as a standalone program.

So I still think the problem most likely lies in a hardware I/O limitation, such as DMA bandwidth (this post about DMA may hint at it) or RAM bandwidth. Hopefully @user3543300 can check this, or figure out the true problem that I haven't realized.
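If you want to test the RAM-bandwidth hypothesis yourself, here is a minimal Python sketch (my addition, not part of the original answer; the helpers memory_load and measure_fps are hypothetical names): it measures the webcam frame rate on its own, then again while a second process saturates memory bandwidth with large array copies. A clear fps drop in the second measurement would support the hypothesis.

# A sketch for testing the RAM-bandwidth hypothesis: compare webcam fps
# when idle vs. while a separate process generates heavy memory traffic.
import multiprocessing
import time

import cv2
import numpy as np

def memory_load():
    # Continuously copy a ~48 MB buffer to keep the memory bus busy
    src = np.zeros((4096, 4096, 3), dtype=np.uint8)
    while True:
        src.copy()

def measure_fps(seconds=5):
    # Count how many frames the webcam delivers in the given interval
    vc = cv2.VideoCapture(0)
    frames = 0
    start = time.time()
    while time.time() - start < seconds:
        rval, _ = vc.read()
        if rval:
            frames += 1
    vc.release()
    return frames / float(seconds)

if __name__ == '__main__':
    print('fps (idle): %.1f' % measure_fps())
    loader = multiprocessing.Process(target=memory_load)
    loader.daemon = True
    loader.start()
    print('fps (under memory load): %.1f' % measure_fps())
    loader.terminate()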

If the problem really is what I suggested above, then a sensible idea would be to reduce the memory I/O overhead introduced by the CNN. In fact, to solve similar problems on embedded systems with limited hardware resources, there has been some research on this topic, e.g. Quantization, Structurally Sparse Deep Neural Networks, SqueezeNet, Deep Compression. So hopefully applying such tricks will also help improve the webcam frame rate in the question.
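As a toy illustration of why quantization cuts memory traffic (my sketch, not taken from any of the papers above, which use more sophisticated schemes): storing weights as 8-bit integers plus a scale and offset moves a quarter as many bytes as float32, at the cost of a small rounding error.

# Minimal sketch of linear 8-bit weight quantization (illustrative only)
import numpy as np

def quantize_8bit(w):
    # Map float weights onto 256 levels: w ~= q * scale + w_min
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # avoid zero scale for constant w
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_8bit(q, scale, w_min):
    return q.astype(np.float32) * scale + w_min

w = np.random.randn(256, 256).astype(np.float32)   # stand-in weight matrix
q, scale, w_min = quantize_8bit(w)
w_hat = dequantize_8bit(q, scale, w_min)
print('bytes: %d -> %d' % (w.nbytes, q.nbytes))    # 4x smaller
print('max abs error: %.4f' % np.abs(w - w_hat).max())

Pruning and SqueezeNet-style architectures attack the same bottleneck from another direction, by reducing the number of weights rather than their precision.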


Original answer:

Try this C++ solution. It uses threads for the I/O overhead. I tested it on your task with bvlc_alexnet.caffemodel and deploy.prototxt for image classification, and did not see an obvious slowdown of the main thread (the webcam stream) while caffe was running (on the GPU):

#include <stdio.h>
#include <iostream>
#include <fstream>   //for std::fstream
#include <sstream>   //for std::stringstream
#include <string>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include "caffe/caffe.hpp"
#include "caffe/util/blocking_queue.hpp"
#include "caffe/data_transformer.hpp"
#include "opencv2/opencv.hpp"

using namespace cv;

//Queue pair for sharing image/results between webcam and caffe threads
template<typename T>
class QueuePair {
 public:
  explicit QueuePair(int size);
  ~QueuePair();

  caffe::BlockingQueue<T*> free_;
  caffe::BlockingQueue<T*> full_;

  DISABLE_COPY_AND_ASSIGN(QueuePair);
};
template<typename T>
QueuePair<T>::QueuePair(int size) {
  // Initialize the free queue
  for (int i = 0; i < size; ++i) {
    free_.push(new T);
  }
}
template<typename T>
QueuePair<T>::~QueuePair() {
  T *data;
  while (free_.try_pop(&data)) {
    delete data;
  }
  while (full_.try_pop(&data)) {
    delete data;
  }
}
template class QueuePair<Mat>;
template class QueuePair<std::string>;

//Do image classification(caffe predict) using a subthread
class Consumer {
 public:
  Consumer(boost::shared_ptr<QueuePair<Mat>> task
      , boost::shared_ptr<QueuePair<std::string>> result);
  ~Consumer();
  void Run();
  void Stop();
  void entry(boost::shared_ptr<QueuePair<Mat>> task
      , boost::shared_ptr<QueuePair<std::string>> result);

 private:
  bool must_stop();

  boost::shared_ptr<QueuePair<Mat> > task_q_;
  boost::shared_ptr<QueuePair<std::string> > result_q_;

  //caffe::Blob<float> *net_input_blob_;
  boost::shared_ptr<caffe::DataTransformer<float> > data_transformer_;
  boost::shared_ptr<caffe::Net<float> > net_;
  std::vector<std::string> synset_words_;
  boost::shared_ptr<boost::thread> thread_;
};
Consumer::Consumer(boost::shared_ptr<QueuePair<Mat>> task
    , boost::shared_ptr<QueuePair<std::string>> result) :
    task_q_(task), result_q_(result), thread_() {

  //for data preprocess
  caffe::TransformationParameter trans_para;
  //set mean
  trans_para.set_mean_file("/path/to/imagenet_mean.binaryproto");
  //set crop size, here is cropping 227x227 from 256x256
  trans_para.set_crop_size(227);
  //instantiate a DataTransformer using trans_para for image preprocess
  data_transformer_.reset(new caffe::DataTransformer<float>(trans_para
      , caffe::TEST));

  //initialize a caffe net
  net_.reset(new caffe::Net<float>(std::string("/path/to/deploy.prototxt")
      , caffe::TEST));
  //net parameter
  net_->CopyTrainedLayersFrom(std::string("/path/to/bvlc_alexnet.caffemodel"));

  std::fstream synset_word("path/to/caffe/data/ilsvrc12/synset_words.txt");
  std::string line;
  if (!synset_word.good()) {
    std::cerr << "synset words open failed!" << std::endl;
  }
  while (std::getline(synset_word, line)) {
    synset_words_.push_back(line.substr(line.find_first_of(' '), line.length()));
  }
  //a container for net input, holds data converted from cv::Mat
  //net_input_blob_ = new caffe::Blob<float>(1, 3, 227, 227);
}
Consumer::~Consumer() {
  Stop();
  //delete net_input_blob_;
}
void Consumer::entry(boost::shared_ptr<QueuePair<Mat>> task
    , boost::shared_ptr<QueuePair<std::string>> result) {

  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  caffe::Caffe::SetDevice(0);

  cv::Mat *frame;
  cv::Mat resized_image(256, 256, CV_8UC3);
  cv::Size re_size(resized_image.cols, resized_image.rows);

  //for caffe input and output
  const std::vector<caffe::Blob<float> *> net_input = this->net_->input_blobs();
  std::vector<caffe::Blob<float> *> net_output;

  //net_input.push_back(net_input_blob_);
  std::string *res;

  int pre_num = 1;
  while (!must_stop()) {
    std::stringstream result_strm;
    frame = task->full_.pop();
    cv::resize(*frame, resized_image, re_size, 0, 0, CV_INTER_LINEAR);
    this->data_transformer_->Transform(resized_image, *net_input[0]);
    net_output = this->net_->Forward();
    task->free_.push(frame);

    res = result->free_.pop();
    //Process results here
    for (int i = 0; i < pre_num; ++i) {
      result_strm << synset_words_[net_output[0]->cpu_data()[i]] << " "
          << net_output[0]->cpu_data()[i + pre_num] << "\n";
    }
    *res = result_strm.str();
    result->full_.push(res);
  }
}

void Consumer::Run() {
  if (!thread_) {
    try {
      thread_.reset(new boost::thread(&Consumer::entry, this, task_q_, result_q_));
    }
    catch (std::exception& e) {
      std::cerr << "Thread exception: " << e.what() << std::endl;
    }
  }
  else
    std::cout << "Consumer thread may have been running!" << std::endl;
}
void Consumer::Stop() {
  if (thread_ && thread_->joinable()) {
    thread_->interrupt();
    try {
      thread_->join();
    }
    catch (boost::thread_interrupted&) {
    }
    catch (std::exception& e) {
      std::cerr << "Thread exception: " << e.what() << std::endl;
    }
  }
}
bool Consumer::must_stop() {
  return thread_ && thread_->interruption_requested();
}


int main(void)
{
  int max_queue_size = 1000;
  boost::shared_ptr<QueuePair<Mat>> tasks(new QueuePair<Mat>(max_queue_size));
  boost::shared_ptr<QueuePair<std::string>> results(new QueuePair<std::string>(max_queue_size));

  char str[100], info_str[100] = " results: ";
  VideoCapture vc(0);
  if (!vc.isOpened())
    return -1;

  Consumer consumer(tasks, results);
  consumer.Run();

  Mat frame, *frame_copy;
  namedWindow("preview");
  double t, fps;

  while (true) {
    t = (double)getTickCount();
    vc.read(frame);

    if (waitKey(1) >= 0) {
      consumer.Stop();
      break;
    }

    if (tasks->free_.try_peek(&frame_copy)) {
      frame_copy = tasks->free_.pop();
      *frame_copy = frame.clone();
      tasks->full_.push(frame_copy);
    }
    std::string *res;
    std::string frame_info("");
    if (results->full_.try_peek(&res)) {
      res = results->full_.pop();
      frame_info = frame_info + info_str;
      frame_info = frame_info + *res;
      results->free_.push(res);
    }

    t = ((double)getTickCount() - t) / getTickFrequency();
    fps = 1.0 / t;

    sprintf(str, " fps: %.2f", fps);
    frame_info = frame_info + str;

    putText(frame, frame_info, Point(5, 20)
        , FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
    imshow("preview", frame);
  }
}

In src/caffe/util/blocking_queue.cpp, make a slight change as below and rebuild caffe (the BlockingQueue member functions are defined in this .cpp file, so the template must be explicitly instantiated there for every queued type):

...//Other stuff
template class BlockingQueue<Batch<float>*>;
template class BlockingQueue<Batch<double>*>;
template class BlockingQueue<Datum*>;
template class BlockingQueue<shared_ptr<DataReader::QueuePair> >;
template class BlockingQueue<P2PSync<float>*>;
template class BlockingQueue<P2PSync<double>*>;
//add these 2 lines below
template class BlockingQueue<cv::Mat*>;
template class BlockingQueue<std::string*>;

Regarding this question about real-time image classification with neural networks in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/39522693/
