
traincascade/boosting non-deterministic?

asked 2015-02-01 09:46:16 -0600

_Martin

Hey there,

I was trying to train a cascade classifier using traincascade. I trained on two different machines with the same data and the same parameters, but I seem to get different results. Since, in my understanding, there is no randomness in training/boosting the cascade, I wondered how I got these results.

Can anyone explain this? Is there some kind of nondeterminism in the training process? If so, I would be happy to get a deeper explanation.


1 answer


answered 2015-02-01 10:34:31 -0600

Oh, that is quite easy to answer. The negative samples are grabbed from the negative images at random top-left corner coordinates, so it could be that different negatives were used in each run. Over a larger number of samples the influence is normally minimal.
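A minimal sketch of that kind of random window grabbing (a hypothetical helper for illustration, not the actual traincascade code): with different RNG states on two machines, different crops come out, while the same state reproduces the same crop.

```python
import random

def grab_negative(img_w, img_h, win, rng):
    # hypothetical sketch of random negative grabbing: pick a random
    # top-left corner so a win x win window still fits inside the image
    x = rng.randrange(0, img_w - win + 1)
    y = rng.randrange(0, img_h - win + 1)
    return (x, y, win, win)

# the same RNG state reproduces the same crop; two machines whose RNGs
# are in different states will generally grab different negatives
crop1 = grab_negative(640, 480, 24, random.Random(42))
crop2 = grab_negative(640, 480, 24, random.Random(42))
```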


Comments

Thank you for your quick response, but in my case I made sure the same negative samples are used, so there is no randomness in choosing the negative samples.

There must be another reason! Does somebody know if there is some randomness in the boosting process?

_Martin (2015-02-02 03:39:09 -0600)

Can you explain HOW the hell you made sure the exact same negatives were used?

StevenPuttemans (2015-02-02 03:40:33 -0600)

As to other randomness in the process: Wikipedia says the AdaBoost training process selects only those features known to improve the predictive power of the model, but as I understand it, many features can still increase the predictive power. Is it possible that it randomly selects a feature from that subset?

StevenPuttemans (2015-02-02 03:57:30 -0600)
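To illustrate the concern: each boosting round searches for the weak learner (e.g. a decision stump) with the lowest weighted error. In a toy version of that search (illustrative only, not OpenCV's implementation), whenever several feature/threshold pairs tie on error, which one wins depends purely on the tie-breaking rule, so two implementations could pick different but equally good features.

```python
def best_stump(X, y, w):
    # exhaustive weak-learner search: try every (feature, threshold, sign)
    # stump and keep the one with the lowest weighted error; the strict '<'
    # is the tie-breaking rule that decides among equally good candidates
    best_err, best = float("inf"), None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[f] >= thr else -sign for x in X]
                err = sum(wi for p, yi, wi in zip(preds, y, w) if p != yi)
                if err < best_err:
                    best_err, best = err, (f, thr, sign)
    return best, best_err

# toy data: the single feature separates the classes perfectly at threshold 2
X = [[0], [1], [2], [3]]
y = [-1, -1, 1, 1]
w = [0.25] * 4
stump, err = best_stump(X, y, w)
```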

I changed the grabbing of the negatives by using another pre-cropped vec file as the negative sample provider. This way I make sure the same negatives are used.

I do not know yet where the randomness in training occurs. I'll try to dig a little deeper into the code, but if somebody has an idea or explanation, I would be very thankful.

_Martin (2015-02-02 08:18:41 -0600)
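For reference, deterministic negative grabbing can be as simple as enumerating top-left corners on a fixed grid. This is only a sketch of the idea behind pre-cropping, not _Martin's actual tool:

```python
def fixed_grid_corners(img_w, img_h, win, step):
    # deterministic top-left corners: identical on every machine and run,
    # so the same negative windows are fed to training every time
    return [(x, y)
            for y in range(0, img_h - win + 1, step)
            for x in range(0, img_w - win + 1, step)]

# a 100x100 negative image, 24x24 windows, stride 38 -> a 3x3 grid of crops
corners = fixed_grid_corners(100, 100, 24, 38)
```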

Hmm, would you be interested in implementing your negative grabbing as a secondary option for the training? It would be a nice addition to the code through a PR! As to the randomness, I still think it is related to the feature grabbing.

StevenPuttemans (2015-02-02 08:31:04 -0600)

Where have you found that the top-left corner coordinates are random? The source code in traincascade/imagestorage.cpp is clear: the algorithm collects negative windows by varying the top-left corner and the scale of the negative image in a way that is determined in advance (given the same data).

Gino Strato (2015-04-02 14:44:49 -0600)
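A rough paraphrase of that scanning logic (simplified from my reading of imagestorage.cpp, not a verbatim copy): slide the window across the image, then shrink the image and repeat. No RNG is involved, so the output depends only on the input image.

```python
def scan_negative_windows(img_w, img_h, win, scale=1.4142):
    # simplified, deterministic sliding-window scan: step by the window
    # size at each scale, then shrink until the window no longer fits
    windows = []
    w, h = img_w, img_h
    while w >= win and h >= win:
        for y in range(0, h - win + 1, win):
            for x in range(0, w - win + 1, win):
                windows.append((x, y, w, h))
        w, h = int(w / scale), int(h / scale)
    return windows

# same input image size -> same window list, on any machine, in any run
runs = [scan_negative_windows(160, 120, 24) for _ in range(2)]
```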

@Gino Strato, that was actually one of my own concerns lately as well. An experienced user explained it to me some time ago. I was led to believe it is true because training a model twice with the exact same data and parameters does not produce the exact same output!

StevenPuttemans (2015-04-03 02:22:58 -0600)

@both, it seems that the randomness comes from inside the boosting process, as explained in more detail here.

StevenPuttemans (2015-04-03 02:25:07 -0600)

@Gino Strato, let me correct myself. I am in the process of researching how training data relates to accuracy, and it seems that the latest 2.4 branch doesn't produce different models anymore when retraining... which is just weird to me :D

StevenPuttemans (2015-04-03 06:44:29 -0600)

I think it depends on the parallelization of the process (maybe just the process of collecting negative images). They must have fixed it somewhere in the meantime.

Gino Strato (2015-04-03 07:21:03 -0600)
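One concrete way parallelization can break bit-exact reproducibility even with identical data: floating-point addition is not associative, so a multi-threaded reduction that combines partial sums in a different order can yield a slightly different total. A toy single-threaded demonstration of the effect (not traincascade's actual code):

```python
# floating-point addition is not associative: summing the same values in
# a different order (as a parallel reduction may do) changes the result
vals = [0.1, 1e16, -1e16, 0.3]

def seq_sum(xs):
    total = 0.0
    for x in xs:
        total += x  # each step rounds; small terms get absorbed
    return total

forward = seq_sum(vals)             # 0.1 is absorbed by 1e16
backward = seq_sum(reversed(vals))  # 0.3 is absorbed by -1e16
```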
