Revision history [back]

If we look at the code closely we can see that everying is dependent on the outputs of detections, line 121, and we should tweak its outputs to match them with the outs of this, line 63. After spending almost a day, I came to a reasonable (not the perfect) solution. Basically, it is all about output blobs of readNetFromCaffe and readFromDarknet, because they output a blob with a shape 1x1xNx7 and NxC, respectively. Here Ns are the number of detections, but with different size vectors, namely, N in 1x1xNx7 is is a number of detections and an every detection is a vector of values [batchId, classId, confidence, left, top, right, bottom] and N in NxC a number of detected objects and C is a number of classes + 4 where the first 4 numbers are [center_x, center_y, width, height]. After analyzing these, we may replace (124-130 lines)

for i in np.arange(0, detections.shape[2]):
confidence = detections[0, 0, i, 2]
if confidence > args["confidence"]:
idx = int(detections[0, 0, i, 1])
if CLASSES[idx] != "person":
continue
box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
(startX, startY, endX, endY) = box.astype("int")


with equivalent lines

    for i in np.arange(0, detections.shape[0]):
scores = detections[i][5:]
classId = np.argmax(scores)
confidence = scores[classId]
if confidence > args["confidence"]:
idx = int(classId)
if CLASSES[idx] != "person":
continue

center_x = int(detections[i][0] * 416)
center_y = int(detections[i][1] * 416)
width = int(detections[i][2] * 416)
height = int(detections[i][3] * 416)
left = int(center_x - width / 2)
top = int(center_y - height / 2)
right = width + left - 1
bottom = height + top - 1

box = [left, top, width, height]
(startX, startY, endX, endY) = box


This way we can keep track of "person" class using Darknet's cfg and weights and count them up/down with a visualiation line.

Again, there might be some other more simpler ways of tracking the detections of Darknet weights file, but this works for this particular case.