# Pedestrian detection - strategies for eliminating false positives

I'm using the HOG descriptor with the default people detector (the SVM detector) to detect and track bodies in a video stream. I've played with some of the parameters (e.g. the hit threshold passed to detectMultiScale), but I'm struggling to figure out how to deal with false positives.

I'm wondering if anyone has suggestions (or links) for strategies that can be used to eliminate false positives?

I'm going to try tracking each person and looking at the corresponding foundWeights returned from detectMultiScale(). I am hoping that averaging the weights of a particular tracked body over time might indicate whether it is a false positive. Does that sound reasonable?

Any pointers in the right direction would be greatly appreciated.
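The weight-averaging idea described above could be sketched roughly as follows. This is only an illustration, not part of OpenCV: the helper names (`iou`, `update_tracks`, `mean_weight`), the greedy overlap matching, and the `min_iou` value are all assumptions, and boxes are taken to be `(x, y, w, h)` tuples with plain float weights, as you would get after unpacking the detectMultiScale results.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, boxes, weights, min_iou=0.3):
    """Attach each detection of one frame to the best-overlapping track.

    tracks is a list of dicts with keys "box", "weight_sum" and "hits".
    A detection that overlaps no free track by at least min_iou
    (a hypothetical threshold) starts a new track.
    """
    used = set()  # indices of tracks already matched in this frame
    for box, weight in zip(boxes, weights):
        best, best_iou = None, min_iou
        for i, track in enumerate(tracks):
            if i in used:
                continue
            overlap = iou(track["box"], box)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is None:
            tracks.append({"box": box, "weight_sum": 0.0, "hits": 0})
            best = len(tracks) - 1
        used.add(best)
        track = tracks[best]
        track["box"] = box
        track["weight_sum"] += float(weight)
        track["hits"] += 1
    return tracks


def mean_weight(track):
    """Average detection weight accumulated by a track so far."""
    return track["weight_sum"] / track["hits"]
```

Calling `update_tracks(tracks, boxes, weights)` once per frame keeps a running sum per body, and `mean_weight(track)` can then be compared against whatever cut-off works for your footage.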


I have two related posts; maybe you would like to take a look at this and this.

( 2016-05-25 17:50:02 -0500 )


The best strategy is generally something like Track Before Detection, where you don't immediately record a true positive. Instead, you watch and track for several frames to make sure the probability of detection is high enough before you record.

For example, a track only counts as real once it has been seen in at least 10 frames and in at least 50% of the frames since it first appeared. If you have weights that tell you how probable each detection is, you can incorporate them too, say by requiring that the sum of the detection probabilities is above a certain threshold and that the average probability is above some value X.
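A minimal sketch of that confirmation rule might look like this. The class name `TrackStats` and its methods are made up for illustration; the 10-frame and 50% figures come from the example above, while `min_weight_sum` and `min_mean_weight` are placeholder values you would have to tune.

```python
from dataclasses import dataclass


@dataclass
class TrackStats:
    frames_alive: int = 0    # frames elapsed since the track was created
    frames_seen: int = 0     # frames in which a detection matched the track
    weight_sum: float = 0.0  # sum of the matched detection weights

    def update(self, weight=None):
        """Call once per frame; pass weight=None when nothing matched."""
        self.frames_alive += 1
        if weight is not None:
            self.frames_seen += 1
            self.weight_sum += float(weight)

    def is_confirmed(self, min_seen=10, min_ratio=0.5,
                     min_weight_sum=5.0, min_mean_weight=0.5):
        """True once the track passes every test (weight limits are placeholders)."""
        if self.frames_seen < min_seen:
            return False
        if self.frames_seen / self.frames_alive < min_ratio:
            return False
        mean = self.weight_sum / self.frames_seen
        return self.weight_sum >= min_weight_sum and mean >= min_mean_weight
```

Only tracks for which `is_confirmed()` returns True would be reported; everything else stays a tentative candidate.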

If you want to look it up, the keywords are Track Before Detection (usually written "track-before-detect"), and there are many variations on the subject.


Finally, I did indeed implement Track Before Detection. This has improved things, but only somewhat.

The problem is that:

• With the OpenCV DefaultPeopleDetector, things sometimes get randomly detected as a person (e.g. the vertical portion of a frame).
• When this happens, even the weight assigned to the detection (returned from detectMultiScale) can be very high.
• Sometimes the same erroneous detection occurs over several consecutive frames, tricking the Track Before Detection code into thinking it's real.

It's puzzling because in 90% of the frames that erroneous detection doesn't occur, but then it sometimes shows up in bursts of frames with high weight. I guess this shouldn't be too surprising given changing lighting conditions, etc., but it still seems weird.

( 2016-06-22 11:28:17 -0500 )

Another refinement you can add is to require that new tracks only appear at the edges of your image. I don't know if that holds for your setup, but in many cases it does. Alternatively, you can assume that new tracks are always moving, so a track whose motion is small (because it's a false alarm) gets ignored.

( 2016-06-22 13:47:26 -0500 )
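Both gating ideas from the comment above could be written as simple checks. This is a rough sketch under assumptions: the function names, the `(x, y, w, h)` box layout, and the `margin` and `min_motion` pixel values are all hypothetical and would need tuning per camera.

```python
import math


def near_edge(box, frame_size, margin=20):
    """True if the box touches a band of `margin` pixels around the frame."""
    x, y, w, h = box
    frame_w, frame_h = frame_size
    return (x <= margin or y <= margin
            or x + w >= frame_w - margin
            or y + h >= frame_h - margin)


def displacement(first_box, last_box):
    """Distance in pixels between the centres of two (x, y, w, h) boxes."""
    fx, fy, fw, fh = first_box
    lx, ly, lw, lh = last_box
    dx = (lx + lw / 2) - (fx + fw / 2)
    dy = (ly + lh / 2) - (fy + fh / 2)
    return math.hypot(dx, dy)


def is_moving(first_box, last_box, min_motion=15.0):
    """True if the track centre moved at least `min_motion` pixels."""
    return displacement(first_box, last_box) >= min_motion
```

A track born in the middle of the frame (`near_edge` is False for its first box) or one that never moves (`is_moving` is False between its first and latest box) would then be treated as a likely false alarm.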

I still feel that, ultimately, the body detection will need to be improved beyond the DefaultPeopleDetector, but your suggestions are nevertheless worth implementing. Thanks again Tetragramm.

( 2016-06-28 10:05:27 -0500 )

Hi, I think this post may help you. It discusses the parameters in detail:

http://www.pyimagesearch.com/2015/11/...

Sure, it's OpenCV with Python, but the parameters used should be the same... Johannes


Does it? Because it doesn't even mention the integrated non-maxima suppression algorithm, and thus several parameters are omitted. Moreover, it contains incorrect info, such as "The size of the sliding window is fixed at 32 x 128 pixels, as suggested by the seminal Dalal and Triggs paper" (the default HOGDescriptor window is 64 x 128), and, among other things, it avoids explaining the cons of resizing the input image.

( 2016-05-25 08:15:31 -0500 )

@LorenaGdL what's your opinion of the integrated NMS algo? I've seen several suggestions that it is not the way to go. You disagree?

( 2016-05-25 08:19:47 -0500 )

@logidelic - I tend to use it and I've had no issues, as long as you know what you're doing of course. Surely there are more advanced algorithms, probably better than the integrated one, but since it is already there, why not try it first? Explaining the detectMultiScale function and skipping that part seems quite absurd to me, especially when the default input arguments do force an internal NMS (i.e. if the user does not change them, they are doing NMS, probably without noticing).

( 2016-05-25 08:40:20 -0500 )

Lorena, you should read the article in depth - of course the non-maxima suppression is explained in detail, with code provided as well. In any case, I think the article can give some hints on how to improve the results.

( 2016-05-25 10:00:08 -0500 )

@JohannesZ the integrated NMS algorithm is not explained; an alternative one is. I'm not questioning the quality of that alternative, just saying that not all the parameters of the HOGDescriptor are mentioned (if I'm wrong, please correct me and point me to where the finalThreshold and useMeanshiftGrouping params are explained), and the missing ones are exactly those that might help in avoiding FPs. I agree, it gives some hints, but it also omits important information (e.g., as I said, it makes resizing the input image to a lower size look like all pros, when it also has unmentioned cons) and contains misinformation (more examples: "The detectMultiScale method constructs [...] a sliding window step size of (4, 4)" -> false; if winStride is omitted, winStride = blockStride (whose default is 8,8)).

( 2016-05-25 10:22:46 -0500 )
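For readers wondering what an external NMS step of the kind debated in these comments looks like, here is a generic greedy sketch. To be clear, this is not the grouping that detectMultiScale performs internally, and it is not the code from the linked article; the function names and the `max_iou` value are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, weights, max_iou=0.5):
    """Return the indices of the boxes to keep, strongest detection first.

    A box is dropped when it overlaps an already kept, higher-weighted
    box by more than max_iou (a hypothetical threshold).
    """
    order = sorted(range(len(boxes)), key=lambda i: weights[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= max_iou for k in keep):
            keep.append(i)
    return keep
```

If you run something like this yourself, you would presumably want to relax the built-in grouping first, otherwise the detections get merged twice.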