@chamath, thank you for pointing out an issue! The problem was in prior box (anchor) generation.

Let's consider the `MultipleGridAnchorGenerator/concat_5` layer of MobileNet-SSD from `ssd_mobilenet_v1_coco_11_06_2017` (an older version). Its output is:
```
[ 0.02500001  0.02500001  0.97500002  0.97500002] // min size
[ 0.01266029  0.01266029  0.98733974  0.98733974] // max size
[ 0.16412428 -0.17175144  0.83587575  1.1717515 ] // aspect ratio 2.0
[-0.17175144  0.16412428  1.1717515   0.83587575] // aspect ratio 0.5
[ 0.22575861 -0.3227241   0.77424139  1.3227241 ] // aspect ratio 3.0
[-0.32276523  0.22577232  1.32276523  0.77422768] // aspect ratio 0.333
```
OpenCV produces the following proposals at the corresponding layer:
```
[ 0.025       0.025       0.97500002  0.97500002] // min size
[ 0.01266027  0.01266027  0.98733968  0.98733968] // max size
[-0.17175145  0.16412427  1.1717515   0.83587575] // aspect ratio 2.0
[ 0.16412427 -0.17175145  0.83587575  1.1717515 ] // aspect ratio 0.5
[-0.3227241   0.22575861  1.3227241   0.77424139] // aspect ratio 3.0
[ 0.22575861 -0.32272416  0.77424139  1.32272422] // aspect ratio 0.333
```
Note that TensorFlow produces boxes in `[ymin, xmin, ymax, xmax]` order but OpenCV in `[xmin, ymin, xmax, ymax]`. OpenCV handles this conversion internally, so it's OK.
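For clarity, the conversion is just a coordinate swap. This is a hypothetical helper for illustration, not an actual OpenCV function:

```python
def tf_box_to_cv(box):
    """Swap TensorFlow's [ymin, xmin, ymax, xmax] into OpenCV's [xmin, ymin, xmax, ymax]."""
    ymin, xmin, ymax, xmax = box
    return [xmin, ymin, xmax, ymax]

# The TensorFlow "aspect ratio 2.0" anchor above becomes OpenCV's layout:
print(tf_box_to_cv([0.16412428, -0.17175144, 0.83587575, 1.1717515]))
# [-0.17175144, 0.16412428, 1.1717515, 0.83587575]
```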
OpenCV's PriorBox layer is based on the original Caffe-SSD framework (https://github.com/weiliu89/caffe/blob/ssd/src/caffe/layers/prior_box_layer.cpp). It first produces an anchor for the specified `min_size`, then one for `max_size`, and then one per aspect ratio. It looks like TensorFlow followed this rule up to some point.
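The rule above can be sketched in a few lines. The `min_size` and `max_size` values below are my assumptions for the 1x1 grid of the last feature map; they happen to reproduce the anchor values listed above:

```python
import math

# Assumed hyper-parameters for the last (1x1) SSD feature map.
min_size = 0.95
max_size = math.sqrt(0.95 * 1.0)   # geometric mean with the next scale
aspect_ratios = [2.0, 0.5, 3.0, 1.0 / 3]

cx = cy = 0.5  # anchor center of a 1x1 grid

def box(w, h):
    # [xmin, ymin, xmax, ymax] in OpenCV's order
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Caffe-SSD (and old TensorFlow) order: min size, max size, then aspect ratios.
anchors = [box(min_size, min_size), box(max_size, max_size)]
for ar in aspect_ratios:
    anchors.append(box(min_size * math.sqrt(ar), min_size / math.sqrt(ar)))

for a in anchors:
    print(['%.8f' % v for v in a])
```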
Let's consider the same layer for your model. It was presumably exported from a newer TensorFlow, right?
```
[ 0.02500001  0.02500001  0.97500002  0.97500002] // min size
[ 0.16412428 -0.17175144  0.83587575  1.1717515 ] // aspect ratio 2.0
[-0.17175144  0.16412428  1.1717515   0.83587575] // aspect ratio 0.5
[ 0.22575861 -0.3227241   0.77424139  1.3227241 ] // aspect ratio 3.0
[-0.32276523  0.22577232  1.32276523  0.77422768] // aspect ratio 0.333
[ 0.01266029  0.01266029  0.98733974  0.98733974] // max size
```
As you may see, the anchor related to the `max_size` hyper-parameter has moved to the bottom of the list. In short: OpenCV generates anchors in the old order (min-max-ratios), but your model predicts deltas for the updated order (min-ratios-max).
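To illustrate the mismatch, here is the permutation between the two layouts (the labels are just for illustration):

```python
# Old (Caffe-SSD / OpenCV) layout vs. the layout of newer TensorFlow models.
old_order = ['min', 'max', 'ar_2.0', 'ar_0.5', 'ar_3.0', 'ar_0.33']
new_order = ['min', 'ar_2.0', 'ar_0.5', 'ar_3.0', 'ar_0.33', 'max']

# For every slot of the old layout, the index it occupies in the new one.
perm = [new_order.index(name) for name in old_order]
print(perm)  # [0, 5, 1, 2, 3, 4]
```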
To make it work properly, please apply the changes from PR https://github.com/opencv/opencv/pull/10676 and use the following text graph: https://gist.github.com/dkurt/fb324b48d1c6230febd1718ee52f7291
Sample:

```python
import cv2 as cv

# Load the frozen graph together with the text graph for the retrained model.
cvNet = cv.dnn.readNetFromTensorflow('hat_model/frozen_inference_graph.pb',
                                     'ssd_mobilenet_v1_coco_hat.pbtxt')

img = cv.imread('/home/dkurtaev/Pictures/image5.jpg')
# SSD preprocessing: 300x300 input, values scaled to [-1, 1], BGR -> RGB.
cvNet.setInput(cv.dnn.blobFromImage(img, 1.0 / 127.5, (300, 300),
                                    (127.5, 127.5, 127.5), swapRB=True, crop=False))
cvOut = cvNet.forward()

# Each detection: [batchId, classId, score, left, top, right, bottom] (normalized).
for detection in cvOut[0, 0, :, :]:
    score = float(detection[2])
    if score > 0.5:
        left = detection[3] * img.shape[1]
        top = detection[4] * img.shape[0]
        right = detection[5] * img.shape[1]
        bottom = detection[6] * img.shape[0]
        cv.rectangle(img, (int(left), int(top)), (int(right), int(bottom)), (0, 255, 0))

cv.imshow('img', img)
cv.waitKey()
```
I'm sorry for the delay and thank you for your patience!