If we look at the code closely, we can see that everything depends on the output of detections (line 121), and we should adapt how it is read so that it matches the outs of this (line 63). After spending almost a day, I came to a reasonable (not perfect) solution. Basically, it is all about the output blobs of readNetFromCaffe and readNetFromDarknet, which have the shapes 1x1xNx7 and NxC, respectively. In both cases N is the number of detections, but the per-detection vectors differ. In 1x1xNx7, every detection is a vector of values [batchId, classId, confidence, left, top, right, bottom]. In NxC, N is the number of detected objects and C is the number of classes + 4, where the first 4 numbers are [center_x, center_y, width, height]. Note that YOLO rows also carry an objectness value at index 4, which is why the class scores in the replacement code start at index 5. After analyzing these, we may replace lines 124-130:

for i in np.arange(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > args["confidence"]:
        idx = int(detections[0, 0, i, 1])
        if CLASSES[idx] != "person":
            continue
        box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
        (startX, startY, endX, endY) = box.astype("int")

with the equivalent lines:

    for i in np.arange(0, detections.shape[0]):
        # each row: [center_x, center_y, width, height, objectness, class scores...]
        scores = detections[i][5:]
        classId = np.argmax(scores)
        confidence = scores[classId]
        if confidence > args["confidence"]:
            idx = int(classId)
            if CLASSES[idx] != "person":
                continue

            # box values are relative to the input image, so scale by the frame size (W, H)
            center_x = int(detections[i][0] * W)
            center_y = int(detections[i][1] * H)
            width = int(detections[i][2] * W)
            height = int(detections[i][3] * H)
            left = int(center_x - width / 2)
            top = int(center_y - height / 2)
            right = width + left - 1
            bottom = height + top - 1

            # corner format, which is what the rest of the code expects
            box = [left, top, right, bottom]
            (startX, startY, endX, endY) = box

This way we can keep track of the "person" class using Darknet's cfg and weights, and count people up/down with a visualization line.

Again, there might be simpler ways of tracking the detections from a Darknet weights file, but this works for this particular case.
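One such alternative, as a rough sketch only: instead of rewriting the loop, convert the NxC array into the 1x1xNx7 layout so that the original Caffe-style loop can stay untouched. The helper name yolo_to_ssd_layout and the conf_threshold parameter are my own (hypothetical, not part of OpenCV), and the row layout [center_x, center_y, width, height, objectness, class scores...] with relative coordinates is an assumption about the Darknet model in use. The sample array is made up to show the shapes.

```python
import numpy as np


def yolo_to_ssd_layout(yolo_out, conf_threshold=0.0):
    """Convert an NxC YOLO-style array into a 1x1xMx7 SSD-style array.

    Assumed row layout (hypothetical helper, not an OpenCV function):
    [center_x, center_y, width, height, objectness, class scores...],
    with box values relative to the frame (0..1).
    """
    yolo_out = np.asarray(yolo_out, dtype=np.float32)
    scores = yolo_out[:, 5:]
    class_ids = np.argmax(scores, axis=1)
    confidences = scores[np.arange(len(yolo_out)), class_ids]

    # keep only rows whose best class score passes the threshold
    keep = confidences > conf_threshold
    cx, cy, w, h = (yolo_out[keep, k] for k in range(4))

    rows = np.stack([
        np.zeros(keep.sum(), dtype=np.float32),  # batchId (single image)
        class_ids[keep].astype(np.float32),      # classId
        confidences[keep],                       # confidence
        cx - w / 2,                              # left
        cy - h / 2,                              # top
        cx + w / 2,                              # right
        cy + h / 2,                              # bottom
    ], axis=1)
    return rows[np.newaxis, np.newaxis, :, :]


# made-up data: 2 detections, 2 classes
fake = np.array([
    [0.5, 0.5, 0.2, 0.4, 0.9, 0.10, 0.80],  # best class is 1
    [0.3, 0.3, 0.1, 0.1, 0.2, 0.05, 0.02],  # too weak, gets dropped
], dtype=np.float32)

detections = yolo_to_ssd_layout(fake, conf_threshold=0.4)
print(detections.shape)
print(detections[0, 0, 0])
```

With this, detections[0, 0, i, 3:7] * np.array([W, H, W, H]) from the original loop should give pixel corners directly. Keep in mind that CLASSES then has to list the Darknet model's class names in the model's own order, otherwise the "person" check looks at the wrong index.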

A reference: more about blobs output by readNetFromCaffe and readNetFromDarknet