@chamath, thank you for pointing out an issue! The problem was in prior box (anchor) generation.

Let's consider the `MultipleGridAnchorGenerator/concat_5` layer of MobileNet-SSD from `ssd_mobilenet_v1_coco_11_06_2017` (an older version). Its output is:
```
[ 0.02500001  0.02500001  0.97500002  0.97500002] // min size
[ 0.01266029  0.01266029  0.98733974  0.98733974] // max size
[ 0.16412428 -0.17175144  0.83587575  1.1717515 ] // aspect ratio 2.0
[-0.17175144  0.16412428  1.1717515   0.83587575] // aspect ratio 0.5
[ 0.22575861 -0.3227241   0.77424139  1.3227241 ] // aspect ratio 3.0
[-0.32276523  0.22577232  1.32276523  0.77422768] // aspect ratio 0.333
```
OpenCV produces the following proposals at the corresponding layer:
```
[ 0.025       0.025       0.97500002  0.97500002] // min size
[ 0.01266027  0.01266027  0.98733968  0.98733968] // max size
[-0.17175145  0.16412427  1.1717515   0.83587575] // aspect ratio 2.0
[ 0.16412427 -0.17175145  0.83587575  1.1717515 ] // aspect ratio 0.5
[-0.3227241   0.22575861  1.3227241   0.77424139] // aspect ratio 3.0
[ 0.22575861 -0.32272416  0.77424139  1.32272422] // aspect ratio 0.333
```
Note that TensorFlow produces boxes in `[ymin, xmin, ymax, xmax]` order but OpenCV in `[xmin, ymin, xmax, ymax]`. OpenCV handles this conversion internally, so it's OK.
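For clarity, the conversion is just a coordinate swap. This is a hypothetical helper for illustration, not an actual OpenCV function:

```python
def tf_box_to_cv(box):
    """Swap TensorFlow's [ymin, xmin, ymax, xmax] into OpenCV's [xmin, ymin, xmax, ymax]."""
    ymin, xmin, ymax, xmax = box
    return [xmin, ymin, xmax, ymax]

# The TensorFlow "aspect ratio 2.0" anchor above becomes OpenCV's layout:
print(tf_box_to_cv([0.16412428, -0.17175144, 0.83587575, 1.1717515]))
# [-0.17175144, 0.16412428, 1.1717515, 0.83587575]
```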
OpenCV's PriorBox layer is based on the original Caffe-SSD framework (https://github.com/weiliu89/caffe/blob/ssd/src/caffe/layers/prior_box_layer.cpp). It first produces an anchor for the specified `min_size`, then one for `max_size`, and then one per aspect ratio. It looks like TensorFlow followed this rule up to some point.
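The rule above can be sketched in a few lines. The `min_size` and `max_size` values below are my assumptions for the 1x1 grid of the last feature map; they happen to reproduce the anchor values listed above:

```python
import math

# Assumed hyper-parameters for the last (1x1) SSD feature map.
min_size = 0.95
max_size = math.sqrt(0.95 * 1.0)   # geometric mean with the next scale
aspect_ratios = [2.0, 0.5, 3.0, 1.0 / 3]

cx = cy = 0.5  # anchor center of a 1x1 grid

def box(w, h):
    # [xmin, ymin, xmax, ymax] in OpenCV's order
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Caffe-SSD (and old TensorFlow) order: min size, max size, then aspect ratios.
anchors = [box(min_size, min_size), box(max_size, max_size)]
for ar in aspect_ratios:
    anchors.append(box(min_size * math.sqrt(ar), min_size / math.sqrt(ar)))

for a in anchors:
    print(['%.8f' % v for v in a])
```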
Let's consider the same layer for your model. It was presumably exported from a newer TensorFlow, right?
```
[ 0.02500001  0.02500001  0.97500002  0.97500002] // min size
[ 0.16412428 -0.17175144  0.83587575  1.1717515 ] // aspect ratio 2.0
[-0.17175144  0.16412428  1.1717515   0.83587575] // aspect ratio 0.5
[ 0.22575861 -0.3227241   0.77424139  1.3227241 ] // aspect ratio 3.0
[-0.32276523  0.22577232  1.32276523  0.77422768] // aspect ratio 0.333
[ 0.01266029  0.01266029  0.98733974  0.98733974] // max size
```
As you may see, the anchor related to the `max_size` hyper-parameter has moved to the bottom of the list. In short: OpenCV generates anchors in the old order (min-max-ratios), but your model predicts deltas for the updated order (min-ratios-max).
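To illustrate the mismatch, here is the permutation between the two layouts (the labels are just for illustration):

```python
# Old (Caffe-SSD / OpenCV) layout vs. the layout of newer TensorFlow models.
old_order = ['min', 'max', 'ar_2.0', 'ar_0.5', 'ar_3.0', 'ar_0.33']
new_order = ['min', 'ar_2.0', 'ar_0.5', 'ar_3.0', 'ar_0.33', 'max']

# For every slot of the old layout, the index it occupies in the new one.
perm = [new_order.index(name) for name in old_order]
print(perm)  # [0, 5, 1, 2, 3, 4]
```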
To make it work properly, please apply the changes from PR https://github.com/opencv/opencv/pull/10676 and use the following text graph: https://gist.github.com/dkurt/fb324b48d1c6230febd1718ee52f7291
Sample:

```python
import cv2 as cv

# Load the frozen graph together with the text graph for the retrained model.
cvNet = cv.dnn.readNetFromTensorflow('hat_model/frozen_inference_graph.pb',
                                     'ssd_mobilenet_v1_coco_hat.pbtxt')

img = cv.imread('/home/dkurtaev/Pictures/image5.jpg')
# SSD preprocessing: 300x300 input, values scaled to [-1, 1], BGR -> RGB.
cvNet.setInput(cv.dnn.blobFromImage(img, 1.0 / 127.5, (300, 300),
                                    (127.5, 127.5, 127.5), swapRB=True, crop=False))
cvOut = cvNet.forward()

# Each detection: [batchId, classId, score, left, top, right, bottom] (normalized).
for detection in cvOut[0, 0, :, :]:
    score = float(detection[2])
    if score > 0.5:
        left = detection[3] * img.shape[1]
        top = detection[4] * img.shape[0]
        right = detection[5] * img.shape[1]
        bottom = detection[6] * img.shape[0]
        cv.rectangle(img, (int(left), int(top)), (int(right), int(bottom)), (0, 255, 0))

cv.imshow('img', img)
cv.waitKey()
```
I'm sorry for the delay and thank you for your patience!