
DNN opencv with SSD resnet return wrong face dimension

asked 2018-05-28 19:45:21 -0500

lezan

updated 2018-05-28 19:47:25 -0500

Hello, I am playing with face detection and DNN, but I cannot figure out how to solve an issue.

I am processing a 256x256 image, using deploy.prototxt and res10_300x300_ssd_iter_140000.caffemodel (the same ones as in the dnn directory).

Some code.

cv::Mat faceROI;
cv::Mat image;

image = cv::imread(imagePath[imageId], CV_LOAD_IMAGE_COLOR);
cv::Mat imageDNNBlob = cv::dnn::blobFromImage(image, 1.0, cv::Size(300, 300),
    cv::Scalar(104.0, 177.0, 123.0), false, false);
netOpenCVDNN.setInput(imageDNNBlob, "data");
cv::Mat detection = netOpenCVDNN.forward("detection_out");
// View the 4-D output as a 2-D matrix: one row per detection.
cv::Mat faces(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());
for (int i = 0; i < faces.rows; i++)
{
    float confidence = faces.at<float>(i, 2);
    if (confidence > 0.99)
    {
        // Box corners are returned as fractions of the image size.
        int xLeftBottom = static_cast<int>(faces.at<float>(i, 3) * image.cols);
        int yLeftBottom = static_cast<int>(faces.at<float>(i, 4) * image.rows);
        int xRightTop = static_cast<int>(faces.at<float>(i, 5) * image.cols);
        int yRightTop = static_cast<int>(faces.at<float>(i, 6) * image.rows);

        cv::Rect faceRect(xLeftBottom, yLeftBottom,
                    xRightTop - xLeftBottom, yRightTop - yLeftBottom);
        faceROI = cv::Mat(image, faceRect);
    }
}

Nothing too exotic, I just wrote down what I found in resnet_ssd_face.cpp. When I try to extract the ROI from the image with faceROI = cv::Mat(image, faceRect), I get an error about wrong dimensions in faceRect. In fact, with one particular image, the bottom edge of the rectangle ends up at 257, which is outside the 256-pixel-high image, because faces.at<float>(i, 6) returns a float > 1.

What am I missing? Can someone help me figure it out?

I have also some questions about this example:

  1. netOpenCVDNN.forward returns a Mat, where size[2] is the number of objects found and size[3] is the number of properties of each object. Am I right? Where can I find more info about what forward returns? (Already checked here and here. I think it is related to the layer "detection_out" of the prototxt, but I cannot work it out.)
  2. Mat faces is a matrix with all the faces found, right? Each row is a detected face and each row (face) has some properties (cols), right? So faces.at<float>(i, 2) is the confidence of the i-th face, and positions 3 to 6 are the coordinates of the face. What do positions 0 and 1 contain?
  3. Why does cv::Mat imageDNNBlob have -1 as its number of rows and cols?
  4. Last one: I am using images of 256x256. The input layer of the dnn uses 300x300. What is the right solution? Resize the image? Change the input layer? Is cv::Size(300, 300) right in blobFromImage?

Thanks in advance.




it will get better, once you name things correctly, see here (though it's not your fault, it was already wrong in the (outdated) sample)

also, there's probably a reason for this

berak ( 2018-05-28 20:02:50 -0500 )
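A detector can return box corners slightly outside the [0, 1] range, so the pixel rectangle can stick out of the image and the ROI extraction then fails. One common remedy is to clip the rectangle to the image first. Below is a minimal sketch of that clipping, written without OpenCV headers so it stands alone: `Box` and `clampToImage` are hypothetical names, with `Box` standing in for `cv::Rect`. With OpenCV itself, intersecting the rectangle with the image area, `faceRect & cv::Rect(0, 0, image.cols, image.rows)`, should have the same effect.

```cpp
#include <algorithm>

// Hypothetical stand-in for cv::Rect so the sketch needs no OpenCV headers.
struct Box {
    int x, y, width, height;
};

// Clip the box with corners (x1, y1) and (x2, y2), given in pixels, to an
// image of cols x rows pixels. Width or height becomes 0 when the box lies
// entirely outside the image.
Box clampToImage(int x1, int y1, int x2, int y2, int cols, int rows) {
    const int left   = std::max(0, std::min(x1, cols));
    const int top    = std::max(0, std::min(y1, rows));
    const int right  = std::max(0, std::min(x2, cols));
    const int bottom = std::max(0, std::min(y2, rows));
    return Box{left, top, std::max(0, right - left), std::max(0, bottom - top)};
}
```

For the case in the question, a box whose bottom edge lands at 257 in a 256-pixel-high image is cut back to end at row 256, so the ROI stays inside the image.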

1 answer


answered 2018-05-29 01:14:17 -0500

berak

updated 2018-05-29 01:15:06 -0500

  1. unfortunately, there is no easy answer, it depends on the architecture, and what a network was trained upon. classification networks have a single layer of class predictions here, ssd style detection networks have N rows with 7 numbers, yolo3 ones have "regions" here.

  2. for those ssd detections, position 0 is the image id (the index of the image inside the batch, so always 0 for a single image), position 1 the classID (unused here, because we don't have cats & dogs, only faces here)

  3. those "blobs" are 4d tensors, and 4 dimensions don't fit into rows & cols, so those are -1, and you have to look at the size array to retrieve that information, size[0]==nImages, size[1]==numChannels, size[2]==H, size[3]==W

  4. yes, it was originally trained on 300x300 images. if you use a smaller one, it will get upscaled automatically. note, that it might get faster (but somewhat less accurate), if you use a smaller size, like (128,96) (used in the js demo)
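To make the "N rows with 7 numbers" layout concrete, here is a small sketch that walks over such an output as a flat float array. It assumes the usual Caffe SSD row order (image id, class id, confidence, left, top, right, bottom); the `Detection` struct and `parseDetections` are hypothetical names, not part of OpenCV.

```cpp
#include <cstddef>
#include <vector>

// One row of an SSD-style detection output (assumed 7 floats per row).
struct Detection {
    int   imageId;     // position 0: index of the image in the batch
    int   classId;     // position 1: class label
    float confidence;  // position 2: score of the detection
    float xMin, yMin;  // positions 3, 4: top-left corner, as fractions
    float xMax, yMax;  // positions 5, 6: bottom-right corner, as fractions
};

// Keep only the rows whose confidence is above minConfidence.
std::vector<Detection> parseDetections(const std::vector<float>& data,
                                       float minConfidence) {
    const std::size_t kFields = 7;
    std::vector<Detection> out;
    for (std::size_t i = 0; i + kFields <= data.size(); i += kFields) {
        const float* row = &data[i];
        if (row[2] > minConfidence) {
            out.push_back(Detection{static_cast<int>(row[0]),
                                    static_cast<int>(row[1]),
                                    row[2], row[3], row[4], row[5], row[6]});
        }
    }
    return out;
}
```

With a real network the flat data would come from the output Mat, for example by copying detection.size[2] * 7 floats starting at detection.ptr<float>().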



  1. Is it not enough to check the last layer of the network?
  2. Perfect.
  3. Makes sense. I thought it could be a problem, but it was not. Ok.
  4. So I will leave the prototxt as is, and in blobFromImage I can use cv::Size(256, 256) without resizing the image to 300x300, right?
lezan ( 2018-05-29 03:31:59 -0500 )
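On the blob question: since rows and cols are -1 for a 4d blob, elements have to be addressed through the size array. As a rough illustration, assuming the blob is stored contiguously in N, C, H, W order, the flat position of one element can be computed like this (`nchwOffset` is a hypothetical helper, not an OpenCV function):

```cpp
#include <cstddef>

// Flat index of element (n, c, h, w) in a contiguous blob with
// numChannels channels of height x width values each (NCHW order assumed).
std::size_t nchwOffset(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                       std::size_t numChannels, std::size_t height, std::size_t width) {
    return ((n * numChannels + c) * height + h) * width + w;
}
```

For a 1x3x300x300 blob, channel 1 therefore starts 300 * 300 = 90000 floats after the start of the data.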

1: mostly. but again, some networks (like yolov3) require you to check more than the last output layer

4: yes.

berak ( 2018-05-29 03:35:25 -0500 )
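Because the box corners come back as fractions of the input, they can be scaled by the size of the original image, whatever size was passed to blobFromImage. A minimal sketch of that scaling, with `PixelBox` and `toPixels` as hypothetical names:

```cpp
// Box corners in pixel coordinates of the original image.
struct PixelBox {
    int left, top, right, bottom;
};

// Scale fractional corners by the original image size (cols x rows).
// The network input size (300x300, 256x256, ...) does not appear here.
PixelBox toPixels(float xMin, float yMin, float xMax, float yMax,
                  int cols, int rows) {
    return PixelBox{static_cast<int>(xMin * cols),
                    static_cast<int>(yMin * rows),
                    static_cast<int>(xMax * cols),
                    static_cast<int>(yMax * rows)};
}
```

The result may still need clipping to the image, since the fractions are not guaranteed to stay inside [0, 1].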

Now it is working like a charm. Thanks as always berak, your help is precious.

lezan ( 2018-05-29 03:39:02 -0500 )



Asked: 2018-05-28 19:45:21 -0500

Seen: 721 times

Last updated: May 29 '18