I'm gathering results from my image detector algorithm. From a set of images (each 320 x 480), I run a 64x128 sliding window over every image, at a number of predefined scales.
I understand that:
- True Positives = when my detected window overlaps the ground truth (annotated bounding boxes), within a defined intersection size / centroid criterion.
- False Positives = when the algorithm gives me positive windows that fall outside the ground truth.
- False Negatives = when it fails to give me a positive window where the ground-truth annotation states that there is an object.
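To make these three definitions concrete, here is a minimal sketch of how I would count them for one image. It assumes boxes are `(x1, y1, x2, y2)` tuples and uses intersection-over-union with a 0.5 threshold as the overlap criterion. The function names and the threshold are my own illustrative choices, not part of any library, and a centroid-based test could be swapped in for `iou`.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(detections, ground_truth, iou_threshold=0.5):
    """Greedily match each detection to at most one unmatched annotation.

    Assumption: a second detection on an already-matched annotation is
    counted as a false positive.
    """
    matched = set()
    tp = fp = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    # Every annotation that no detection matched is a miss.
    fn = len(ground_truth) - len(matched)
    return tp, fp, fn
```

Note that nothing in this counting produces a true-negative number: it only looks at detections and annotations, never at the windows that were rejected.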
But what about True Negatives? Are the true negatives all the windows for which my classifier gives a negative result? That sounds weird, since I'm sliding a small window (64x128) by 4 pixels at a time, and I use around 8 different scales in detection. If I were to do that, I'd have lots of true negatives per image.
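To put a rough number on "lots", here is a small sketch that counts the window positions per image. It assumes the image is 320 wide by 480 tall, that scaling is done by resizing the image while the window stays 64x128, and that partial windows at the border are skipped. The helper names and the scale factors passed in are hypothetical, not the ones I actually use.

```python
def windows_per_scale(img_w, img_h, win_w=64, win_h=128, stride=4):
    """Number of full window positions in an image of the given size."""
    if img_w < win_w or img_h < win_h:
        return 0  # the window does not fit at this scale
    nx = (img_w - win_w) // stride + 1
    ny = (img_h - win_h) // stride + 1
    return nx * ny


def total_windows(img_w, img_h, scales, win_w=64, win_h=128, stride=4):
    """Sum the window counts over a list of image scale factors."""
    total = 0
    for s in scales:
        total += windows_per_scale(int(img_w * s), int(img_h * s),
                                   win_w, win_h, stride)
    return total


# At the original scale alone: 65 positions across times 89 positions down.
print(windows_per_scale(320, 480))
```

Under these assumptions a single scale already gives 5785 windows, so nearly every window in an image would be a true negative if I counted them that way.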
Or do I prepare a set of pure negative images (no objects / humans at all) that I just slide through, and if there are one or more positive detections in an image, I'd count it as a False Negative, and vice versa?
Here's an example image (with green rects as the ground truth)