
draw detections when blobFromImages is used

asked 2019-01-11 13:31:37 -0600 by alfonsofernandezvillar

updated 2019-01-11 23:40:05 -0600 by berak

Hello,

I was testing OpenCV face detection using a pre-trained model:

import cv2
import numpy as np

# image loaded beforehand, e.g. image = cv2.imread("face.jpg")
(h, w) = image.shape[:2]

net = cv2.dnn.readNetFromCaffe("deploy.prototxt.txt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), [104., 117., 123.], False, False)
net.setInput(blob)
detections = net.forward()

for i in range(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]

    if confidence > 0.7:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

This example works fine, but I don't know how to modify the code above to draw the detections when two images are used instead of just one:

blob2 = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, False)
net.setInput(blob2)
detections = net.forward()

How can I draw the detections in that case?

Thanks in advance


Comments

there is something weird here.

if we use cv2.dnn.blobFromImages([im1, im2, im3],...), we get e.g. a detections.shape of [1,1,27,7], in other words, all detections end up in the same batchnum / channel.

and, although all images get resized to the same size, the number of detections per image differs depending on the image (size?), which makes it impossible to separate them per image when there is more than one.
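
for reference, a quick sanity check (a sketch, assuming the net and blob2 from the question):

net.setInput(blob2)
detections = net.forward()
print(detections.shape)   # e.g. (1, 1, 27, 7) for a 3-image batch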

then, it also crashes with the tensorflow uint8 model:

OpenCV(4.0.1-dev) Error: Assertion failed (start <= (int)shape.size() && end <= (int)shape.size() && start <= end) in total, file C:/p/opencv/modules/dnn/include/opencv2/dnn/shape_utils.hpp, line 161
berak ( 2019-01-11 23:55:29 -0600 )

ok; yes, that was the problem: I didn't know how to separate the detections per image. Thanks @berak.

alfonsofernandezvillar ( 2019-01-12 03:51:19 -0600 )

wait, we still have to solve it ;(

imho, there's something wrong with the detection_out layer.

hmm, the PriorBox layers all look like this:

PriorBox    i[3, 128, 2, 2]     i[3, 3, 96, 128]        o[1, 2, 64]

(2 inputs with batchsize 3, but the output has batchsize 1 only)

i also tried the MobileNetSSD_deploy model, same problem.

berak ( 2019-01-12 04:02:15 -0600 )

@berak, There is a specification of the output for detection_out layers: [batchId, classId, confidence, left, top, right, bottom], so we need to check the [0] element to split the output per sample. But, as mentioned, the problem is how to manage the number of detections. For any batch size we always get 1x1xNx7, where N is the number of detections (200 by default). To increase it, we can modify the .prototxt: keep_top_k: 200. It's just a guess. Please let me check it.

dkurt ( 2019-01-12 04:42:54 -0600 )
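
A minimal sketch of the split @dkurt describes, assuming detections comes from the batched forward pass above:

# detections has shape [1, 1, N, 7]; each row is
# [batchId, classId, confidence, left, top, right, bottom]
rows = detections[0, 0]                                  # N x 7
per_image = {}
for row in rows:
    per_image.setdefault(int(row[0]), []).append(row)    # group rows by batchId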

@dkurt, ah ! indeed the batch id is the 1st element of each row !

berak ( 2019-01-12 04:49:52 -0600 )

1 answer


answered 2019-01-12 05:09:04 -0600 by berak

updated 2019-01-12 05:11:50 -0600

so, solved with kind help from @dkurt, as always ;)

each detection row looks like: [batchId, classId, confidence, left, top, right, bottom]

given that you start with a list of images:

# images loaded beforehand, e.g. with cv2.imread()
imgs = [image, image2, image3]
net = cv2.dnn.readNetFromCaffe("face_detector.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")
blob = cv2.dnn.blobFromImages(imgs, 1.0, (128, 96), [104., 117., 123.], False, False)

you have to check the 1st element of each detection to find out which image it belongs to:

net.setInput(blob)
detections = net.forward()

for i in range(0, detections.shape[2]):
    imgid = int(detections[0, 0, i, 0]) # here we go !
    confidence = detections[0, 0, i, 2]
    if confidence > 0.7:
        (h, w) = imgs[imgid].shape[:2]  # note: your images may have different sizes!
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(imgs[imgid], (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(imgs[imgid], text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

for i in range(len(imgs)):
    cv2.imshow("img%d" % i, imgs[i])
cv2.waitKey()
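
as a compact alternative to the per-row lookup above, the same split can also be done with a numpy boolean mask over the batchId column (a sketch, reusing the detections array from the code above):

rows = detections[0, 0]                  # N x 7 matrix of detection rows
for imgid in range(len(imgs)):
    mine = rows[rows[:, 0] == imgid]     # rows belonging to image imgid
    mine = mine[mine[:, 2] > 0.7]        # keep confident detections only
    boxes = mine[:, 3:7]                 # normalized [left, top, right, bottom] per row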

Comments


btw, i just found an alternative detection model (seems to be inception v2 based), which also works nicely with opencv's dnn !

berak ( 2019-01-12 06:38:31 -0600 )

Thanks for this link @berak.

The "original" repo is:https://github.com/sfzhang15/FaceBoxes.

And the paper is "FaceBoxes: A CPU Real-time Face Detector with High Accuracy".

alfonsofernandezvillar ( 2019-01-13 04:05:49 -0600 )
