I recently ran into the same issue. For one application I used the Latent SVM XML models that ship with the OpenCV installation. The detect() method appears to take an argument for the number of parallel threads, but in my tests the detection time did not change with the thread count. With a single model loaded, detection takes 10-20 seconds per frame, and it gets worse when you load multiple models.
To speed it up, you can combine it with motion detection, provided your object is not stationary and moves at least a little in the video. With motion detection in place, you can reject false detections by comparing the distance between the current and the last detection against the maximum speed of your object. You can also skip frames in which the motion rectangle is smaller than your object (this helps a lot).
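The two checks described above can be sketched in plain Python. This is only an illustration under my own assumptions: rectangles are (x, y, width, height) tuples in pixels, max_speed is in pixels per frame, and the function names are hypothetical helpers, not part of OpenCV.

```python
import math

def rect_center(rect):
    """Return the center point of an (x, y, w, h) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def motion_large_enough(motion_rect, object_size):
    """True if the motion rectangle is at least as large as the object.

    object_size is an assumed (width, height) of the object in pixels.
    Frames that fail this check can skip the expensive detector entirely.
    """
    _, _, mw, mh = motion_rect
    ow, oh = object_size
    return mw >= ow and mh >= oh

def is_plausible(prev_rect, cur_rect, max_speed, frames_elapsed=1):
    """True if the detection moved no farther than the object could travel.

    max_speed is an assumed upper bound in pixels per frame; frames_elapsed
    accounts for frames skipped since the last accepted detection.
    """
    px, py = rect_center(prev_rect)
    cx, cy = rect_center(cur_rect)
    return math.hypot(cx - px, cy - py) <= max_speed * frames_elapsed
```

You would call motion_large_enough() before running the detector, and is_plausible() on each detection it returns, keeping only those that pass.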
If your object is expected to appear only in a certain region of the frame, you can run detection on that region alone, which again takes less time.
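Restricting detection to a region mostly comes down to two bookkeeping steps: clamping the region to the frame, and shifting the detections back into full-frame coordinates afterwards. A rough sketch, again assuming (x, y, width, height) tuples and hypothetical helper names of my own:

```python
def clamp_roi(roi, frame_w, frame_h):
    """Clip an (x, y, w, h) region so it lies inside the frame."""
    x, y, w, h = roi
    x0 = max(0, min(x, frame_w))
    y0 = max(0, min(y, frame_h))
    x1 = max(x0, min(x + w, frame_w))
    y1 = max(y0, min(y + h, frame_h))
    return (x0, y0, x1 - x0, y1 - y0)

def to_frame_coords(det, roi):
    """Shift a detection found inside the cropped region back to frame coordinates."""
    dx, dy, dw, dh = det
    rx, ry, _, _ = roi
    return (dx + rx, dy + ry, dw, dh)
```

If the frame is a NumPy array (as it is with OpenCV's Python bindings), the crop itself is just frame[y:y+h, x:x+w] using the clamped values; you run the detector on that crop and pass each result through to_frame_coords().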
However, whatever you do, the LatentSVM detection samples in OpenCV CANNOT detect in real time.
Because of this, I am now using HaarTraining to build new models for my objects, and the results are very promising even at stage 8 of training. Detection takes only 20-40 ms per frame, which gives real-time performance. If you need real-time processing, I would suggest switching to HaarTraining.