
Problem with face detection model from dnn module

asked 2018-07-04 08:32:52 -0500

holger

updated 2018-07-05 01:20:33 -0500

berak

Hello, I am trying to get the face detector Caffe model working for a server app. I am using OpenCV 3.4.1.

I tried to use this and this as a template, but somehow the result from the net is always empty (cols and rows are both -1). I made sure the input image exists and contains a face. I am not sure what I am doing wrong; I even made sure the input dimensions for the model (300, 300) are correct.

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/dnn.hpp"
#include <math.h>
#include <iostream>
using namespace cv;
using namespace std;
using namespace dnn;

int main( int argc, char** argv)
{
    Net reconNet = readNetFromCaffe("face_detector.prototxt", "face_detector.caffemodel");
    Mat img = imread("./arni/arni.jpg");
    //resize to target dimensions
    resize(img, img, Size(300, 300));
    imwrite("arni_resized.jpg", img);

    Mat blob = blobFromImage(img, 1.0, Size(300, 300), Scalar(104.0, 177.0, 123.0), false, false);
    reconNet.setInput(blob); // the blob has to be set as network input before forward()
    Mat result = reconNet.forward().clone();
    cout << "result from net " << result << endl;
    return 0;
}

Any hint is highly welcome.



@dkurt Can you take a quick look?

holger ( 2018-07-04 08:46:29 -0500 )

@holger, the mentioned tutorial has a face detection demo; if it works, you can use it as a baseline (see the full code sample).

dkurt ( 2018-07-05 01:47:57 -0500 )

I see, so I don't need to use "view page source code" and can look directly! Thank you. I also found a C++ example in the samples. Sorry for being a bit too lazy.

holger ( 2018-07-05 01:52:13 -0500 )

2 answers


answered 2018-07-05 00:57:46 -0500

berak

updated 2018-07-05 01:00:40 -0500

Here's some working code. Indeed, you have to parse the prediction output the same way as with other SSD object detection models:

  • you can also use a "minified" uint8 TensorFlow model (smaller load size)
  • the network was trained on 300x300 input; however, 128x96 seems to be the best compromise between speed & accuracy
  • Scalar(104, 177, 123, 0) is the mean value of the training dataset (we usually subtract that from the image to get "mean-free" input; it's a common normalization technique)

#include "opencv2/opencv.hpp"
#include "opencv2/dnn.hpp"

using namespace cv;
using namespace std;

// Draw one detection box with a "confidence classId" label on the frame.
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));

    std::string label = format("%.2f %d", conf, classId);

    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

    top = max(top, labelSize.height);
    rectangle(frame, Point(left, top - labelSize.height),
              Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar());
}

int main(int argc, char** argv)
{
    float confThreshold = 0.25;

    //dnn::Net net = dnn::readNetFromCaffe("c:/data/mdl/face_detector.prototxt", "c:/data/mdl/res10_300x300_ssd_iter_140000_fp16.caffemodel");
    dnn::Net net = dnn::readNetFromTensorflow("c:/data/mdl/opencv_face_detector_uint8.pb","c:/data/mdl/opencv_face_detector.pbtxt");

    VideoCapture cap(0);
    static bool once=false;
    while(1) {
        Mat f;
        cap >> f;              // grab the next camera frame
        if (f.empty()) break;  // stop when no frame is delivered

        Mat blob = dnn::blobFromImage(f, 1, Size(128,96), Scalar(104, 177, 123, 0), false, false);
        net.setInput(blob);    // the blob has to be set as input before forward()
        Mat res = net.forward("detection_out");

        // just for debugging; --  print out the network layers (once)
        if (!once) {
            dnn::MatShape ms1 { blob.size[0], blob.size[1] , blob.size[2], blob.size[3] };
            vector<String> lnames = net.getLayerNames();
            for (size_t i=1; i<lnames.size(); i++) { // skip __NetInputLayer__
                Ptr<dnn::Layer> lyr = net.getLayer((unsigned)i);
                vector<dnn::MatShape> in,out;
                net.getLayerShapes(ms1, (int)i, in, out); // fill in/out shapes for this layer
                cerr << format("%-38s %-13s", lyr->name.c_str(), lyr->type.c_str());
                for (auto j:in) cerr << "i" << Mat(j).t() << "\t";
                for (auto j:out) cerr << "o" << Mat(j).t() << "\t";
                cerr << endl;
            }
            once = true;
        }

        // view the 4d output [1,1,N,7] as a 2d Mat with one detection per row
        Mat faces(res.size[2],res.size[3], CV_32F, res.ptr<float>());
        //cout << res.size << " " << faces.size() << endl;
        //cout << faces << endl;
        for (int i=0; i<faces.rows; i++) {
            float *data = faces.ptr<float>(i);
            float confidence = data[2];
            if (confidence > confThreshold) {
                // box coordinates are normalized to [0,1], scale them to the frame size
                int left = (int)(data[3] * f.cols);
                int top = (int)(data[4] * f.rows);
                int right = (int)(data[5] * f.cols);
                int bottom = (int)(data[6] * f.rows);
                int classId = (int)(data[1]) - 1;  // Skip 0th background class id.
                drawPred(classId, confidence, left, top, right, bottom, f);
                cout << classId<< " " << confidence<< " " << left<< " " << top<< " " << right<< " " << bottom<< endl;
            }
        }
        imshow("faces", f);    // show the annotated frame (window name is arbitrary)
        int k = waitKey(19);
        if (k>0) break;
    }

    return 0;
}
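
The row layout that the detection loop above relies on can be isolated into a small helper. This is a minimal sketch in plain C++ (no OpenCV needed), assuming the usual SSD detection_out layout of 7 floats per row: [imageId, classId, confidence, left, top, right, bottom], with box coordinates normalized to [0, 1]. The Detection struct and the parseDetections function are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>
#include <vector>

// One detection, with box coordinates already scaled to pixels.
struct Detection {
    int classId;
    float confidence;
    int left, top, right, bottom;
};

// Parse numRows rows of 7 floats each:
// [imageId, classId, confidence, left, top, right, bottom]
// Rows at or below confThreshold are dropped.
std::vector<Detection> parseDetections(const float* data, std::size_t numRows,
                                       int frameWidth, int frameHeight,
                                       float confThreshold)
{
    std::vector<Detection> found;
    for (std::size_t i = 0; i < numRows; ++i) {
        const float* row = data + i * 7;
        const float confidence = row[2];
        if (confidence <= confThreshold)
            continue;
        Detection d;
        d.classId = static_cast<int>(row[1]) - 1;  // skip the 0th background class
        d.confidence = confidence;
        d.left = static_cast<int>(row[3] * frameWidth);
        d.top = static_cast<int>(row[4] * frameHeight);
        d.right = static_cast<int>(row[5] * frameWidth);
        d.bottom = static_cast<int>(row[6] * frameHeight);
        found.push_back(d);
    }
    return found;
}
```

With the names from the code above, a call could look like parseDetections(res.ptr<float>(), res.size[2], f.cols, f.rows, confThreshold).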


Thank you again for your explanation. I now understand why the scalar is different for different models: it depends on how the model expects its data.

I am a bit curious. May I ask why you used a TensorFlow model over the Caffe model? Is that one more accurate?

    //dnn::Net net = dnn::readNetFromCaffe("c:/data/mdl/face_detector.prototxt", "c:/data/mdl/res10_300x300_ssd_iter_140000_fp16.caffemodel");
    dnn::Net net = dnn::readNetFromTensorflow("c:/data/mdl/opencv_face_detector_uint8.pb","c:/data/mdl/opencv_face_detector.pbtxt");

holger ( 2018-07-05 01:59:43 -0500 )

No, it was just to try that out (it's faster to load, but slightly less accurate).

berak ( 2018-07-05 02:08:17 -0500 )

Oh, and the mean value: it's a bit unclear which one belongs to which model! The wiki information also differs from the code, and only one can be right! We'll have to ask @dkurt about this again!

berak ( 2018-07-05 02:10:54 -0500 )

I will figure it out and try the C++ example now. I think it will work :-) Thank you very much again.

holger ( 2018-07-05 02:24:32 -0500 )

It's working, and I got a detection time of around 42 ms on a relatively weak machine. I trained my own face detector with YOLO, and it takes around 30 ms. I am impressed by how small, efficient and accurate that Caffe model is!

holger ( 2018-07-05 03:40:13 -0500 )

@berak one last stupid question: instead of

void drawPred(Mat& frame)

could I also write the following?

void drawPred(Mat frame)

Sorry for this really newbie question.

holger ( 2018-07-05 04:04:08 -0500 )

No, it's not that stupid.

Since you draw into that Mat, you can't have a const Mat &frame,

but both Mat &frame and Mat frame will work here. The first version just saves you from copying the Mat header (~60 bytes).

berak ( 2018-07-05 04:18:35 -0500 )

Got it. Thank you +

holger ( 2018-07-05 04:36:53 -0500 )

answered 2018-07-04 12:16:03 -0500

ImageCounter

updated 2018-07-05 00:41:42 -0500

Have you tried converting from RGBA to RGB before forwarding to the network?

Edited after reading the comments:

Can you try to isolate whether the problem is with your code or with the model by trying the code with


Use a scale factor of 1/255 and a mean of 127.5, i.e. Scalar(127.5, 127.5, 127.5).

You need to convert from RGBA to RGB.



As far as I know, imread() returns RGBA. I can give it a try!

holger ( 2018-07-04 12:20:54 -0500 )

Hmm, no success. I took a look at the model; there I have these lines:

input_shape {
  dim: 1
  dim: 3
  dim: 300
  dim: 300
}

The two dim 300 entries seem to be the dimensions (width * height). dim 3 seems to be the RGB values, and dim 1 is the alpha value?! This doesn't tell whether it is BGRA or RGBA. From testing I can tell that RGBA produces an exception on net.forward for this model. Maybe I am missing something very stupid and I need to reshape the result mat or something. It's just that -1 for rows and cols of the result matrix smells very fishy to me.

holger ( 2018-07-04 15:09:11 -0500 )

I also just copied Scalar(104.0, 177.0, 123.0). I don't really understand what this does internally. I can read the parameter and API docs, but I am still lost.

holger ( 2018-07-04 15:10:59 -0500 )
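
On what the Scalar does internally: blobFromImage subtracts the given mean from each channel, multiplies the result by the scale factor, and rearranges the interleaved pixels into planar (NCHW) order. The following is a rough sketch of that arithmetic in plain C++, ignoring the resize, crop and swapRB steps. toBlob is a hypothetical helper written for this illustration, not an OpenCV function.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit BGR image (B,G,R,B,G,R,...) into a planar
// float blob (all B values, then all G, then all R), computing
// (pixel - mean[channel]) * scale for every value.
std::vector<float> toBlob(const std::vector<std::uint8_t>& bgr,
                          int width, int height,
                          const std::array<float, 3>& mean, float scale)
{
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    std::vector<float> blob(3 * plane);
    for (std::size_t c = 0; c < 3; ++c)
        for (std::size_t p = 0; p < plane; ++p)
            blob[c * plane + p] =
                (static_cast<float>(bgr[p * 3 + c]) - mean[c]) * scale;
    return blob;
}
```

A pixel that equals the dataset mean, such as (104, 177, 123), ends up as (0, 0, 0) in the blob, which is what "mean-free" input refers to.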

I will just try to run the native C++ sample for this: samples\dnn\resnet_ssd_face.cpp. Even though the code looks similar, I will try to run it (I hope and pray it will work and I can "copy + paste").

holger ( 2018-07-04 15:24:04 -0500 )

Well, here's an explanation of what a blob is: So having cols / rows of -1 for a (matrix?) blob seems to be OK? I have no access to my testing machine and am too lazy to start my cloud machine. I will test tomorrow. Good night!

holger ( 2018-07-04 17:18:01 -0500 )

@holger, yes, -1 is OK for rows & cols for a 4D tensor. You have to look at the output.size member for the shape (an array with 4 elements).

berak ( 2018-07-05 00:51:22 -0500 )

I am slowly understanding. I read about tensors a while ago; it's still a difficult concept for me.

holger ( 2018-07-05 01:57:33 -0500 )

@ImageCounter Thank you for your effort. Converting from RGBA to RGB is actually necessary if the Mat has 4 channels (channels == 4). So thank you for your hint!

holger ( 2018-07-05 08:32:08 -0500 )