I think your third approach is actually one of the best ways. Two suggestions for improvement:
- Use many more negatives for the cascade classifier (typically far more negatives than positives).
- Try LBP or HOG features instead of Haar. HOG is pretty much the state of the art for object recognition (though maybe not the OpenCV version, since it has several restrictions on window size etc.).
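
As a rough sketch of both suggestions, here is one way to assemble an `opencv_traincascade` call that uses LBP features and more negatives than positives. The helper name, the file names, the 3:1 negative ratio, and the 24x24 window are all assumptions for illustration, not recommended values; tune them for your data.

```python
import subprocess  # only needed if you actually launch the training

def build_traincascade_args(vec_file, bg_file, out_dir, num_pos,
                            neg_ratio=3, feature_type="LBP",
                            width=24, height=24, num_stages=20):
    """Build the argument list for opencv_traincascade.

    neg_ratio is a hypothetical knob: negatives used per positive.
    """
    # This sketch only allows the two feature types named here.
    if feature_type not in ("HAAR", "LBP"):
        raise ValueError("unsupported feature type: " + feature_type)
    num_neg = num_pos * neg_ratio  # far more negatives than positives
    return [
        "opencv_traincascade",
        "-data", out_dir,        # output directory for the cascade
        "-vec", vec_file,        # positives packed by opencv_createsamples
        "-bg", bg_file,          # text file listing negative images
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-featureType", feature_type,
        "-w", str(width),        # must match the size used for the .vec file
        "-h", str(height),
    ]

# Hypothetical file names; replace with your own.
args = build_traincascade_args("pos.vec", "neg.txt", "cascade_out", 1000)
print(" ".join(args))
# To actually train: subprocess.run(args, check=True)
```

Note that `-numPos` should usually be somewhat smaller than the number of samples in the `.vec` file, because later stages consume additional positives.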
Finally, you could try to train your own LatentSVM model (you'll find hints in this Q&A forum if you search for it).