
Making Object Detection Faster

asked 2017-12-18 03:26:55 -0600

kimchiboy03

Hello, I am currently trying out object detection with the deep neural network (dnn) module in OpenCV 3.3.0.

However, my code runs at about 1 frame per 10 seconds (literally)!

Can someone please tell me whether it's just my slow computer or whether my code is poorly written? Thanks in advance.

Here is my code (By the way, my computer has 4GB of RAM):

#include <opencv2/dnn.hpp>
#include <opencv2/dnn/shape_utils.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <fstream>
#include <iostream>
#include <algorithm>

using namespace std;
using namespace cv;
using namespace cv::dnn;

const char* classNames[] = { "background",
"aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair",
"cow", "diningtable", "dog", "horse",
"motorbike", "person", "pottedplant",
"sheep", "sofa", "train", "tvmonitor" };

int main()
{
    // Load the VGG-based SSD Caffe model
    dnn::Net net = readNetFromCaffe("deploy.prototxt", "VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel");

    VideoCapture cap(0);

    while (true)
    {
        Mat frame;
        cap >> frame;

        // Stop when the camera delivers no more frames
        if (frame.empty())
            break;

        if (frame.channels() == 4)
            cvtColor(frame, frame, COLOR_BGRA2BGR);

        // Resize to the 300x300 network input and subtract the per-channel mean
        Mat inputBlob = blobFromImage(frame, 1.0f, Size(300, 300), Scalar(104, 117, 123), false);
        net.setInput(inputBlob, "data");
        Mat detection = net.forward("detection_out");

        // One row per detection: [image id, class id, confidence, left, top, right, bottom]
        Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

        float confidenceThreshold = 0.5;
        for (int i = 0; i < detectionMat.rows; i++)
        {
            float confidence = detectionMat.at<float>(i, 2);

            if (confidence > confidenceThreshold)
            {
                size_t objectClass = (size_t)(detectionMat.at<float>(i, 1));

                // Box coordinates are normalized to [0, 1]; scale them to the frame size
                int xLeftBottom = static_cast<int>(detectionMat.at<float>(i, 3) * frame.cols);
                int yLeftBottom = static_cast<int>(detectionMat.at<float>(i, 4) * frame.rows);
                int xRightTop = static_cast<int>(detectionMat.at<float>(i, 5) * frame.cols);
                int yRightTop = static_cast<int>(detectionMat.at<float>(i, 6) * frame.rows);

                ostringstream ss;
                ss << confidence;
                String conf(ss.str());

                Rect object(xLeftBottom, yLeftBottom,
                    xRightTop - xLeftBottom,
                    yRightTop - yLeftBottom);

                rectangle(frame, object, Scalar(0, 255, 0));
                String label = String(classNames[objectClass]) + ": " + conf;
                int baseLine = 0;
                Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
                rectangle(frame, Rect(Point(xLeftBottom, yLeftBottom - labelSize.height),
                    Size(labelSize.width, labelSize.height + baseLine)),
                    Scalar(255, 255, 255), FILLED);
                putText(frame, label, Point(xLeftBottom, yLeftBottom),
                    FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));
            }
        }

        imshow("detections", frame);
        if (waitKey(1) >= 0) break;
    }

    return 0;
}


It's not your code; that is a huge network, using large images.

I guess you'll have to wait until they figure out how to get proper OpenCL/GPU optimization for that.

berak (2017-12-18 03:43:46 -0600)

You've tried one of the largest object detection models. Even on modern CPUs it achieves no more than 5 FPS. You could try another model such as MobileNet-SSD. Additionally, object detection models can usually work with input images of any size (with different accuracy, of course), so experiment with the input image size.

dkurt (2017-12-18 05:51:20 -0600)

@dkurt So does the type of prototxt and caffe model I use affect the speed and accuracy? And I'll try out the MobileNet-SSD!

kimchiboy03 (2017-12-18 13:35:13 -0600)

@dkurt Yes, MobileNet-SSD seems to be faster, at 1.8 fps. To make it faster still, would installing OpenCV 3.3.1 and running YOLO v2 help?

kimchiboy03 (2017-12-18 13:50:28 -0600)
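
Frame rates like the 1.8 fps quoted above can be measured with a small timing helper. The following is only a minimal sketch using the C++ standard library: `measureFps` and its parameters are hypothetical names, and `step` stands in for whatever is done per frame (capture, `blobFromImage`, `forward`).

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `step` `iterations` times and returns the
// average number of iterations per second (0.0 if no time was measured).
double measureFps(const std::function<void()>& step, int iterations)
{
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        step();
    const std::chrono::duration<double> elapsed = Clock::now() - start;

    return elapsed.count() > 0.0 ? iterations / elapsed.count() : 0.0;
}
```

Timing only the `net.forward(...)` call separately from the whole loop should show whether the network itself or the capture/display code is the bottleneck.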

@kimchiboy03 I don't think so. Try changing the input size first. For example, for a 640x480 frame, try downscaling it to 320x240 or smaller while keeping the same aspect ratio.

dkurt (2017-12-18 14:23:19 -0600)
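
The downscaling suggested above comes down to a little arithmetic. Here is a minimal sketch, assuming you cap the width and derive the height from it; `downscaledSize` is a hypothetical helper name, not an OpenCV function.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Hypothetical helper: returns (width, height) shrunk so that
// width <= maxWidth while keeping the aspect ratio. It never upscales,
// and invalid (non-positive) inputs are returned unchanged.
std::pair<int, int> downscaledSize(int width, int height, int maxWidth)
{
    if (width <= 0 || height <= 0 || maxWidth <= 0 || width <= maxWidth)
        return std::make_pair(width, height);

    const double scale = static_cast<double>(maxWidth) / width;
    const int newHeight =
        std::max(1, static_cast<int>(std::lround(height * scale)));
    return std::make_pair(maxWidth, newHeight);
}
```

For a 640x480 frame and `maxWidth = 320` this gives 320x240. The result could then be used as the target `Size` for `cv::resize` or as the size argument of `blobFromImage`.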

Please, where can I download the two files "deploy.prototxt" and "VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel"?

Sebyazid (2017-12-19 05:11:14 -0600)

@Sebyazid, please do not post an answer if you have a question or comment. Thank you.

berak (2017-12-19 05:42:55 -0600)

1 answer


answered 2017-12-20 14:37:16 -0600

kimchiboy03

updated 2017-12-20 14:39:55 -0600

Thanks to @dkurt, my frame rate went up to 5 fps, which is reasonable for my application.

First, I switched to MobileNet-SSD, which made the code faster, at 1.8 fps. You can download the files from here: prototxt, caffemodel (here you go, @Sebyazid).

Then, I reduced the network input size from 300 x 300 to 100 x 100. Although the accuracy goes down a bit, the program is still capable of recognising objects.

Since I had to use MobileNet-SSD, I used this GitHub sample.

Thank you @dkurt



Have you tried using different backends like Halide / OpenCL? I've heard that they can speed up detection.

tejasa97 (2018-08-24 23:45:21 -0600)
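
For reference, switching backends in the dnn module looks roughly like the sketch below. The wrapper function names are hypothetical; `setPreferableBackend` and `setPreferableTarget` are the `cv::dnn::Net` methods for this. Whether either option actually helps depends on the OpenCV version and on how it was built (Halide support, OpenCL availability), so treat this as something to experiment with rather than a guaranteed speed-up.

```cpp
#include <opencv2/dnn.hpp>

// Hypothetical wrapper: keep OpenCV's default backend but ask it to run
// on an OpenCL device. Call this before net.forward().
void preferOpenCL(cv::dnn::Net& net)
{
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);
}

// Hypothetical wrapper: request the Halide backend on the CPU.
// This assumes OpenCV was built with Halide support.
void preferHalide(cv::dnn::Net& net)
{
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_HALIDE);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}
```

Measuring the frame rate before and after each change is the only reliable way to tell whether a backend helps on a given machine.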


Seen: 3,958 times

Last updated: Dec 20 '17