gender classification

asked 2016-10-14 16:23:48 -0500


updated 2016-10-14 16:51:58 -0500

I had fixed the eigenvalue 0.000000 thing I had the other day. Or so I thought. It seems to come up randomly when training a model. I don't know what causes it to appear; I wish I did.

I used to think I needed an even number of pictures, but that doesn't seem to be the case. Sometimes I need to delete a picture, sometimes I need to make the count even to get good results. Removing a single picture can change the outcome of the training. If I don't get it right, the prediction oscillates between the male and female labels.

I currently have 600 female pictures and 290 male pictures, all aligned. Gender classification works very well. But yesterday I had an even better model, which I somehow deleted (figures).

Anyway, I guess what I'm trying to say is that I wish I had a better grasp on what influences this, as I'd like to build an even better model with more pictures. But as long as I don't know what causes the 0.000000 value or the wild oscillation between labels, there's not much point. It doesn't seem to be the ratio of pictures, so I guess that leaves the type of pictures.

Is there something messing up my model? There's one picture whose thumbnail shows black, but it displays fine when I open it. It must be a JPG thing (I save them as PNG when realigning). The face is also extracted and landmarks are applied, so that can't be it.

PS: this is not Haar training; I'm using FaceRecognizer's model->train.



I hope you don't mind me reopening this. May I ask how you set up your train/test images?

If your model turns out to be "biased", and that bias fluctuates wildly, then I'd blame your process there. E.g. just splitting your data in half and always using the same half for testing and the other for training is quite likely to develop such a bias (or even give you a false sense of accuracy, if it "seems" to work).
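To avoid a fixed half/half cut, a shuffled split is usually enough as a first step. Here is a minimal stdlib sketch; the file names and the 80/20 ratio are just example choices, not anything from the original post:

```python
import random

def shuffled_split(samples, test_fraction=0.2, seed=42):
    """Shuffle once, then cut off a test set.

    Re-running with different seeds gives different splits, which helps
    reveal whether an accuracy number was just luck of one particular split.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]  # (train, test)

# Hypothetical file names, just to show the call:
train, test = shuffled_split([f"face_{i:03d}.png" for i in range(100)])
print(len(train), len(test))  # 80 20
```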

Did you try some "cross-validation"? Like splitting the data into 10 sets, using 1 for testing and the other 9 for training, then rotating, so you get 10 test runs and each subset is used once for testing.

It's far easier to spot outliers / bad input this way!

berak ( 2016-10-15 03:23:33 -0500 )

I'll try that.

Well, I have trained some good models now: one with a female bias (but that's because there are many more female pictures in the set) and two that are neutral, 50-50. They are quite good.

Why would a prediction come up as 0.000000, though? I can't put my finger on what causes this. Hence my question about randomness in the other topic.
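One plausible cause worth ruling out: with the Eigen/Fisherfaces recognizers, the confidence returned by predict() is a distance, and a distance of exactly 0.0 typically means the test image is byte-for-byte identical to a training image, i.e. a duplicate leaked into both sets. A quick stdlib check for such duplicates (the directory names are placeholders, not from the original post):

```python
import hashlib
import os

def file_hashes(folder):
    """Map MD5 digest -> filename for every regular file in a folder."""
    hashes = {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                hashes[hashlib.md5(f.read()).hexdigest()] = name
    return hashes

def duplicates(train_dir, test_dir):
    """Return (train_file, test_file) pairs whose contents are identical."""
    train_h = file_hashes(train_dir)
    test_h = file_hashes(test_dir)
    return [(train_h[h], test_h[h]) for h in train_h.keys() & test_h.keys()]
```

This only catches exact byte duplicates; the same face saved under two formats (e.g. the JPG and its re-saved PNG) would slip through.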

atv ( 2016-10-15 03:46:27 -0500 )