
To answer your questions as posted in the comments:

FIRST COMMENT POST

  1. By vec-file I mean the actual number of elements in the vector file. This vector file is created by the create_samples application and contains all positive samples. Do not confuse this with the number of images used for positive samples; it is actually the number of individual detections. So if you have multiple detections in one image, each detection results in a separate element in the vector!
  2. What confuses me is the S factor; I guess that is the actual number of negative elements you assign to the process, but I am not certain there.
  3. The number of negatives defines how specifically you train your classifier against background noise. You should see it as follows: negative images can have any size you prefer, but by selecting a width and a height factor, you define the sample window that is randomly picked from these images. For example, having 200 negatives of 500x500 pixels and a window size of 45x45 will not limit you to numNeg = 200, but could easily give you 5000 samples by randomly picking windows out of the images.
  4. Take into account that the more negatives and positives you feed to the trainer, the longer it will take to reach a result. Each sample needs to be evaluated over and over again against the Haar wavelet-like feature set, so it is a computationally expensive thing.
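To make point 3 concrete, here is a small back-of-the-envelope sketch in plain Python, using the hypothetical numbers from the example above, of how many 45x45 windows a set of 500x500 negative images can yield:

```python
# Hypothetical values from the example above.
num_images = 200          # number of negative images
img_w, img_h = 500, 500   # size of each negative image
win_w, win_h = 45, 45     # training window size (the -w / -h values)

# Each image offers (img_w - win_w + 1) * (img_h - win_h + 1)
# possible top-left positions for a sample window.
positions_per_image = (img_w - win_w + 1) * (img_h - win_h + 1)

# Randomly sampling just 25 windows per image already gives far
# more negative samples than the number of images:
samples = num_images * 25

print(positions_per_image)  # 207936 possible windows per image
print(samples)              # 5000 negative samples from 200 images
```

So the window size parameters, not the image count, decide how many negatives the trainer can actually draw.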

SECOND COMMENT POST

  1. Looking at your second comment, I am wondering whether the vector file is actually filled with the correct information. Can you post the command you feed to the create_samples utility? Try adding the -show argument and check whether the samples actually make sense.

  2. Looking at the form of the error, it seems you are doing something wrong when storing the actual image information; posting your code could help us find the problem.
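For reference, a typical create_samples invocation looks like the sketch below. All file names and counts are placeholders, not values from your setup; the point is the -show flag, which displays each generated sample so you can verify it visually:

```shell
# Hypothetical file names and counts -- adapt them to your own data.
# -info : annotation file (image path + object bounding boxes per line)
# -vec  : output vector file containing all positive samples
# -num  : number of samples to write into the vec-file
# -w/-h : training window size
# -show : display every generated sample for visual inspection
opencv_createsamples -info positives.txt -vec samples.vec \
    -num 1000 -w 24 -h 24 -show
```

If the windows popping up do not look like your object, the vec-file is the problem, not the trainer.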

THIRD COMMENT POST

  1. You are actually doing the correct thing.
  2. It is not stuck; you should take into account the amount of time it takes to train HAAR wavelet-like feature cascades. With that number of samples, it wouldn't surprise me if the computation actually took a week or two. It is a slow process of calculating all the features and evaluating them correctly for each sample, over and over again.
  3. If you want to see whether your data works out, use the traincascade algorithm with LBP features. It runs way faster, doing the same training in about a day, and gives you an idea of the possibilities.
  4. Once that is finished and a good data set is created, you could set a desktop machine running the actual HAAR classifier training and let it run for multiple days, maybe even weeks.
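As a sketch, switching to LBP with the traincascade tool is just a matter of the -featureType parameter. Again, all paths and sample counts below are placeholders for your own data:

```shell
# Hypothetical paths and counts -- adapt them to your own data.
# -data        : output directory for the stage files and cascade.xml
# -vec         : positive samples created by create_samples
# -bg          : text file listing the negative (background) images
# -featureType : LBP trains much faster than the default HAAR
# -w/-h        : must match the window size used for the vec-file
opencv_traincascade -data cascade_out -vec samples.vec -bg negatives.txt \
    -numPos 900 -numNeg 2000 -numStages 20 \
    -featureType LBP -w 24 -h 24
```

Note that numPos is usually set somewhat below the number of elements in the vec-file, since later stages consume extra positives.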

FOURTH COMMENT POST

  1. As you can see there, it is still processing; please be patient and read my earlier comments.

FINAL CONCLUSION

I see too many people sticking to the old haartraining algorithm. Go with the newer traincascade and see if LBP training actually works for your problem. It is way more flexible and will get you going much faster.

Go check it out: http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=train%20cascade