What are the best criteria for a training dataset for gender classification?
I'm still trying to train my FaceRecognizer so that gender recognition stays stable in a camera preview. Right now I'm training it on faces from the WIKI part of the IMDB-WIKI dataset, sorted by gender and pre-processed with a frontalization algorithm: 700 faces for each gender, 1400 in total.
As far as I can tell, that amounts to 700 x 1 faces for each gender: 700 people with only one face image each. I fear that this won't be enough to guarantee accurate and stable recognition. Right now, when I aim the camera at a face, the result alternates between Male and Female very quickly and/or is simply the wrong gender.
In another question, one of the commenters advocated increasing the dimensionality of the dataset for each class. Does that mean having faces of, for example, 50 people, each with 20 different expressions at 20 different angles under 20 different illumination conditions, for each gender? And, if so, can I just use a 3D model synthesizer algorithm (from the same source that published the frontalization algorithm mentioned above) to simulate all of those angles, expressions, and illumination conditions?
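To put a number on that example, here is a rough sketch of how many images it would imply. It assumes the commenter meant a full cross product (every expression at every angle under every illumination condition), which is only my reading of the advice. The counts are the example figures from my question, and the function names are my own placeholders, not part of any library.

```python
from itertools import product

# Example figures from the question; all of them are illustrative assumptions.
N_PEOPLE = 50
N_EXPRESSIONS = 20
N_ANGLES = 20
N_ILLUMINATIONS = 20
GENDERS = ("male", "female")


def variation_grid(expressions, angles, illuminations):
    """Yield every (expression, angle, illumination) index triple for one person.

    Assumes a full cross product of the three variation axes.
    """
    return product(range(expressions), range(angles), range(illuminations))


def images_per_gender(people, expressions, angles, illuminations):
    """Number of images for one gender class under the cross-product assumption."""
    return people * expressions * angles * illuminations


# Count the variations for a single person by enumerating the grid.
per_person = sum(1 for _ in variation_grid(N_EXPRESSIONS, N_ANGLES, N_ILLUMINATIONS))
per_gender = images_per_gender(N_PEOPLE, N_EXPRESSIONS, N_ANGLES, N_ILLUMINATIONS)
total = per_gender * len(GENDERS)

print(f"variations per person: {per_person}")
print(f"images per gender:     {per_gender}")
print(f"images in total:       {total}")
```

Under that reading, the example works out to 8,000 variations per person, 400,000 images per gender, and 800,000 in total, compared with the 1,400 images I'm training on now. That is why I'm asking whether synthesizing the variations is a reasonable way to get there.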