So, to answer the questions you posted as comments on my answer:
FIRST COMMENT POST
- With vec-file I mean the actual number of elements in the vector file. This vector file is created by the create_samples.exe application and contains all positive detections. Do not confuse this with the number of images used for positive samples: it counts every single detection. So if you have multiple detections in one image, each detection becomes its own element in the vector!
- What confuses me is the S factor. I guess it is the actual number of negative elements you assign to the process, but I am not certain about that.
- The number of negatives defines how specifically you train your classifier against background noise. See it as follows: negative images can have any size you prefer, but by selecting a width and a height you set the size of the samples that are randomly picked from those images. For example, 200 negatives of 500x500 pixels with a window size of 45x45 do not limit you to numNeg = 200; randomly picking windows out of the images could easily give you 5000 samples.
- Take into account that the more negatives and positives you feed to the trainer, the longer it will take to reach a result. Each sample needs to be evaluated over and over again using the Haar-like wavelet feature set, so it is a computationally expensive process.
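To put a number on the negatives example above, here is a minimal sketch that counts how many distinct window positions fit in one negative image. It only counts unscaled positions, and the function name window_positions is my own, not something from OpenCV.

```python
def window_positions(img_w, img_h, win_w, win_h):
    """Count distinct top-left positions of a win_w x win_h window
    inside one img_w x img_h image (no rescaling considered)."""
    if win_w > img_w or win_h > img_h:
        return 0  # the window does not fit at all
    return (img_w - win_w + 1) * (img_h - win_h + 1)


# Numbers from the example: 200 negatives of 500x500, window of 45x45.
per_image = window_positions(500, 500, 45, 45)
total = 200 * per_image
print("positions per image:", per_image)
print("positions over 200 images:", total)
```

So asking for 5000 negative samples from 200 such images is far below the number of windows that are available.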
SECOND COMMENT POST
Looking at your second comment, I am wondering whether the vector file is actually filled with the correct information. Can you post the command you feed to the create_samples utility? Try adding the -show option and check whether the samples actually make sense.
Looking at the error, it seems you are doing something wrong when storing the actual image information. Posting your code could help us find the problem.
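A quick way to check how many elements the vector file really holds is to read its header. The sketch below assumes the layout used by the create_samples source as I remember it (a 12-byte header: 32-bit sample count, 32-bit sample size equal to width times height, then two unused 16-bit values, little-endian on a typical PC). Treat that layout as an assumption and verify it against your OpenCV version; read_vec_header is a hypothetical helper name.

```python
import struct


def read_vec_header(path):
    """Return (sample_count, sample_size) from the first 12 bytes of a .vec file.

    Assumed layout: int32 count, int32 width*height, int16 unused, int16 unused.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError("file is too short to contain a vec header")
    count, size, _unused1, _unused2 = struct.unpack("<iihh", header)
    return count, size
```

If the count does not match the number of single detections you annotated, or the size is not width times height of your training window, the problem is in the create_samples step rather than in the training.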
THIRD COMMENT POST
- You are actually doing the correct thing.
- It is not stuck. You should take into account the amount of time it takes to train Haar-like wavelet feature cascades. With that number of samples, it wouldn't surprise me if the computation took a week or two. It is a slow process of calculating all features each time and evaluating them for each sample.
- If you want to see whether your data works out, use the traincascade algorithm with LBP features. It runs much faster, doing the same training in about a day, and gives you an idea of the possibilities.
- Once that has finished and a good data set has been created, you could set a desktop running the actual Haar classifier training and let it run for multiple days, maybe even weeks.
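As a rough sketch of what the LBP run could look like, the helper below only builds the argument list for opencv_traincascade, so you can inspect it before launching anything. The file names, sample counts and stage count are placeholders, not recommendations; traincascade_args is a hypothetical helper of mine.

```python
def traincascade_args(data_dir, vec_file, bg_file, num_pos, num_neg,
                      width, height, num_stages=20, feature_type="LBP"):
    """Build an opencv_traincascade command line as a list of strings."""
    if feature_type not in ("LBP", "HAAR"):
        raise ValueError("feature_type must be 'LBP' or 'HAAR'")
    return [
        "opencv_traincascade",
        "-data", data_dir,            # output folder for the trained stages
        "-vec", vec_file,             # vector file made by create_samples
        "-bg", bg_file,               # text file listing the negative images
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-featureType", feature_type,
        "-w", str(width),             # must match the size used for the vec file
        "-h", str(height),
    ]


# Placeholder values for illustration only.
args = traincascade_args("cascade_out", "positives.vec", "negatives.txt",
                         num_pos=1000, num_neg=5000, width=45, height=45)
print(" ".join(args))
# To actually start training: import subprocess; subprocess.run(args, check=True)
```

Switching feature_type to "HAAR" later gives you the slow run on the same data once the LBP result looks promising.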
FOURTH COMMENT POST
- As you can see there, it is still processing. Please be patient and read my comments.
FINAL CONCLUSION
I see too many people sticking to the old haartraining algorithm. Go with the newer traincascade and see whether LBP training works for your problem. It is much more flexible and will get you on your way much faster.
Go check it out: http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=train%20cascade