I'm fairly new to OpenCV, working on a fun personal project -- using a webcam + OpenCV on a Raspberry Pi to detect birds in my fig tree and trigger a scare device. But I'm hitting serious roadblocks trying to train a simple LBP cascade.
I have a simple test app which runs detectMultiScale on a few test images. Using the standard lbpcascade_frontalface.xml the test successfully detects faces within ~200 milliseconds, which is awesome.
Then I took 11 different photos of a star-shaped cookie cutter and cropped them to 24x24, obtained 20 random negative images cropped to 200x200, and created 12 test images at 320x240. See the images here.
I then trained the cascade.xml with:
opencv_traincascade -data data -vec vec -bg bg.txt -featureType LBP -w 24 -h 24
and ran my detection with:
bird_cascade.detectMultiScale(gray, birds, 1.1, 2, 0, Size(80, 80));
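To make that call concrete, here is a rough sketch of which window sizes a multi-scale scan would visit for these settings. It is plain C++ with no OpenCV dependency, and scannedWindowSizes is a hypothetical helper, not an OpenCV function. It assumes, as far as I understand the detector, that the 24x24 training window grows by scaleFactor at each step, that sizes below minSize are skipped, and that the scan stops once the window no longer fits in the image. The real implementation may round or stop differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an OpenCV API): lists the square window sizes a
// multi-scale scan would visit, ASSUMING the window grows by scaleFactor each
// step, sizes below minSize are skipped, and the scan stops once the window
// no longer fits inside the image.
std::vector<int> scannedWindowSizes(int trainedSize, double scaleFactor,
                                    int minSize, int imageWidth, int imageHeight) {
    assert(trainedSize > 0 && scaleFactor > 1.0);
    std::vector<int> sizes;
    for (double factor = 1.0;; factor *= scaleFactor) {
        // Round the scaled window to the nearest whole pixel.
        int window = static_cast<int>(trainedSize * factor + 0.5);
        if (window > imageWidth || window > imageHeight) break;  // too big for the image
        if (window < minSize) continue;                          // below the requested minimum
        sizes.push_back(window);
    }
    return sizes;
}
```

Under those assumptions, scannedWindowSizes(24, 1.1, 80, 320, 240) gives 12 window sizes, from 83 up to 236 pixels, which seems like far too little work to account for tens of seconds per image.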
Results:
- the xml file is 390KB, very large compared to lbpcascade_frontalface.xml which is 52KB. Why does this much smaller data set result in a much larger xml file??
- detectMultiScale takes ~42 seconds per image, roughly 200 times slower than face detection. Again, completely opposite of what I'd expect.
- detectMultiScale doesn't detect any of the star shapes, even on images which contain the exact shapes from the positives training set... rather it always appears to detect a single 104x104 match in the center of the image, regardless of the image :(
Things I've tried in training which haven't made any difference:
- specifying numPos 11 and numNeg 20
- specifying maxFalseAlarmRate of 0.95 (just stabbing in the dark here, haven't found any online docs that explain this very clearly)
- specifying numStages 10 (this actually reduced the xml file size to 220KB and detection time to 33 seconds, but that's still awful)
- using a single star image + opencv_createsamples to produce a vec file with 2000 positives, and supplying 100 negatives. This took over 3 hours to train, but the xml is 505KB and detection takes ~70 seconds, with no successes.
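For reference, the one rule of thumb I'm aware of for these parameters comes from the opencv_traincascade documentation: the overall false alarm rate is roughly maxFalseAlarmRate raised to the power numStages, and the overall hit rate is roughly minHitRate raised to the same power. A minimal sketch of that arithmetic follows. overallRate is a hypothetical helper, not part of OpenCV, and it assumes every stage exactly meets its per-stage target.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an OpenCV API): estimated overall rate for a
// cascade in which every stage meets the same per-stage rate, i.e. the
// per-stage rate raised to the number of stages.
double overallRate(double perStageRate, int numStages) {
    assert(perStageRate > 0.0 && perStageRate <= 1.0 && numStages > 0);
    return std::pow(perStageRate, numStages);
}
```

With a per-stage rate of 0.5 over 20 stages (which I believe are the defaults), overallRate(0.5, 20) is about 0.00000095. With 0.95 over 20 stages it is about 0.36, so by this estimate a cascade trained with maxFalseAlarmRate 0.95 would still accept roughly a third of negative windows.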
As you can see, I've spent a lot of time but am getting nowhere. I've studied several online tutorials/references but none addresses the total failure I'm encountering.
Any of the following would be EXTREMELY helpful:
- specific insights as to why I'm getting complete failure in my case, and which bits to alter to achieve success.
- a thorough explanation of the LBP algorithm which would give some intuition for dialing in the mysterious training/detection parameters such as numPos, numStages, minHitRate, maxFalseAlarmRate, scaleFactor, minNeighbors (I'm having trouble making sense of this write-up by Maria Dimashova)
- pointer to a good starter tutorial for LBP training -- the standard docs are pretty weak
- I want to start small, but is it even possible to produce a 'test' cascade that doesn't require hundreds or thousands of positives & negatives?
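To show where my understanding of LBP currently stands, here is a sketch of the basic 3x3 operator in plain C++. lbpCode is a hypothetical name, and the clockwise bit ordering is just a convention chosen for this sketch. As I understand it, OpenCV's LBP cascades use a multi-block variant that compares sums over rectangular blocks rather than single pixels, so this illustrates the idea only and is not what opencv_traincascade computes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Patch3x3 = std::array<std::array<int, 3>, 3>;

// Basic 3x3 LBP code: compare the 8 neighbours with the centre pixel,
// walking clockwise from the top-left corner, and pack the comparison
// results into one byte (first neighbour = most significant bit).
// The ordering is a convention chosen for this sketch.
std::uint8_t lbpCode(const Patch3x3& patch) {
    static const int rows[8] = {0, 0, 0, 1, 2, 2, 2, 1};
    static const int cols[8] = {0, 1, 2, 2, 2, 1, 0, 0};
    const int centre = patch[1][1];
    unsigned code = 0;
    for (int i = 0; i < 8; ++i) {
        // A neighbour at least as bright as the centre contributes a 1 bit.
        const unsigned bit = patch[rows[i]][cols[i]] >= centre ? 1u : 0u;
        code = (code << 1) | bit;
    }
    assert(code <= 255u);
    return static_cast<std::uint8_t>(code);
}
```

For example, a patch with a bright top row (9, 9, 9) above a centre of 5 and neighbours of 1 everywhere else yields the bit pattern 11100000, i.e. 224. What I can't see is how codes like this relate to the training parameters listed above.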
Thanks in advance!
Ken