
draw detections when blobFromImages is used

asked 2019-01-11 13:31:37 -0600 by alfonsofernandezvillar

updated 2019-01-11 23:40:05 -0600 by berak

Hello,

I was testing OpenCV face detection using a pre-trained model:

import cv2
import numpy as np

# image loaded beforehand, e.g. image = cv2.imread("face.jpg")
(h, w) = image.shape[:2]

net = cv2.dnn.readNetFromCaffe("deploy.prototxt.txt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), [104., 117., 123.], False, False)
net.setInput(blob)
detections = net.forward()

for i in range(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]

    if confidence > 0.7:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

This example works fine, but I don't know how to modify the code above to draw the detections when two images are used instead of just one:

blob2 = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, False)
net.setInput(blob2)
detections = net.forward()

How can I draw the detections in that case?

Thanks in advance


Comments

there is something weird here.

if we use cv2.dnn.blobFromImages([im1, im2, im3],...), we get e.g. a detections.shape of [1,1,27,7], in other words, all detections end up in the same batchnum / channel.

and, although all images get resized to the same size, the number of detections per image differs depending on the image (size?), which makes it impossible to separate them per image when there is more than one.
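
for reference, a quick sanity check (a sketch, assuming the net and blob2 from the question):

net.setInput(blob2)
detections = net.forward()
print(detections.shape)   # e.g. (1, 1, 27, 7) for a 3-image batch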

then, it also crashes with the tensorflow uint8 model:

OpenCV(4.0.1-dev) Error: Assertion failed (start <= (int)shape.size() && end <= (int)shape.size() && start <= end) in total, file C:/p/opencv/modules/dnn/include/opencv2/dnn/shape_utils.hpp, line 161
berak ( 2019-01-11 23:55:29 -0600 )

ok; yes, that was the problem: I didn't know how to separate the detections per image. Thanks @berak.

alfonsofernandezvillar ( 2019-01-12 03:51:19 -0600 )

wait, we still have to solve it ;(

imho, there's something wrong with the detection_out layer.

hmm, the PriorBox layers all look like this:

PriorBox    i[3, 128, 2, 2]     i[3, 3, 96, 128]        o[1, 2, 64]

(2 inputs with batchsize 3, but the output has batchsize 1 only)

i also tried the MobileNetSSD_deploy model, same problem.

berak ( 2019-01-12 04:02:15 -0600 )

@berak, There is a specification of the output for detection_out layers: [batchId, classId, confidence, left, top, right, bottom], so we need to check the [0] element to split the output per sample. But, as mentioned, the problem is how to manage the number of detections. For any batch size we always get 1x1xNx7, where N is the number of detections (200 by default). To increase it, we can modify the .prototxt: keep_top_k: 200. It's just a guess. Please let me check it.

dkurt ( 2019-01-12 04:42:54 -0600 )
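
A minimal sketch of the split @dkurt describes, assuming detections comes from the batched forward pass above:

# detections has shape [1, 1, N, 7]; each row is
# [batchId, classId, confidence, left, top, right, bottom]
rows = detections[0, 0]                                  # N x 7
per_image = {}
for row in rows:
    per_image.setdefault(int(row[0]), []).append(row)    # group rows by batchId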

@dkurt, ah ! indeed the batch id is the 1st element of each row !

berak ( 2019-01-12 04:49:52 -0600 )

1 answer


answered 2019-01-12 05:09:04 -0600 by berak

updated 2019-01-12 05:11:50 -0600

so, solved with kind help from @dkurt, as always ;)

each detection row looks like: [batchId, classId, confidence, left, top, right, bottom]

given that you start with a list of images:

# images loaded beforehand, e.g. with cv2.imread()
imgs = [image, image2, image3]
net = cv2.dnn.readNetFromCaffe("face_detector.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")
blob = cv2.dnn.blobFromImages(imgs, 1.0, (128, 96), [104., 117., 123.], False, False)

you have to check the 1st element of each detection to find out which image it belongs to:

net.setInput(blob)
detections = net.forward()

for i in range(0, detections.shape[2]):
    imgid = int(detections[0, 0, i, 0]) # here we go !
    confidence = detections[0, 0, i, 2]
    if confidence > 0.7:
        (h, w) = imgs[imgid].shape[:2]  # note: your images may have different sizes!
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(imgs[imgid], (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(imgs[imgid], text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

for i in range(len(imgs)):
    cv2.imshow("img%d" % i, imgs[i])
cv2.waitKey()
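
as a compact alternative to the per-row lookup above, the same split can also be done with a numpy boolean mask over the batchId column (a sketch, reusing the detections array from the code above):

rows = detections[0, 0]                  # N x 7 matrix of detection rows
for imgid in range(len(imgs)):
    mine = rows[rows[:, 0] == imgid]     # rows belonging to image imgid
    mine = mine[mine[:, 2] > 0.7]        # keep confident detections only
    boxes = mine[:, 3:7]                 # normalized [left, top, right, bottom] per row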

Comments


btw, i just found an alternative detection model (seems to be inception v2 based), which also works nicely with opencv's dnn !

berak ( 2019-01-12 06:38:31 -0600 )

Thanks for this link @berak.

The "original" repo is:https://github.com/sfzhang15/FaceBoxes.

And the paper is "FaceBoxes: A CPU Real-time Face Detector with High Accuracy".

alfonsofernandezvillar ( 2019-01-13 04:05:49 -0600 )
