
Using a publicly available data set is perfectly fine, and also a good thing to do. If you generate your own (test) data, it's hard to compare your approach to the state of the art, because your data set could simply be easier than the ones others have worked on. Which data sets do the papers you are going to cite use? Creating a good data set is already an academic challenge on its own, so I would try to use existing sets. They will probably contain images you would not have thought of, or simply cover a wider range of ages, weights and ethnicities.

It's also OK to mix them. In the end, it's completely up to you how you generate the training data. The only requirement is that your training and test sets are disjoint. Testing should then be done separately on every test set, so that your results are easier to compare with other approaches.
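A minimal sketch of the mixing idea, assuming each sample can be identified by a unique image id. Every name here (`Sample`, `random_split`, `mix_sources`, `evaluate_each`) is hypothetical, and accuracy is only a placeholder metric. It assumes a public set keeps its official train/test split, so its scores stay comparable to published results, while your own images get a random split. The training parts are pooled, each test set stays separate, and any id that appears in both training data and a test set is rejected.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample type: (image_id, label). Real data would hold images.
Sample = Tuple[str, int]


def random_split(samples: Sequence[Sample], test_fraction: float,
                 seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a data set that has no official split and cut it in two."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def mix_sources(splits: Dict[str, Tuple[List[Sample], List[Sample]]]
                ) -> Tuple[List[Sample], Dict[str, List[Sample]]]:
    """Pool every source's training part; keep each test part separate."""
    train: List[Sample] = []
    tests: Dict[str, List[Sample]] = {}
    for name, (train_part, test_part) in splits.items():
        train.extend(train_part)
        tests[name] = test_part
    # Disjointness check: no training image id may appear in any test set.
    train_ids = {image_id for image_id, _ in train}
    for name, test_part in tests.items():
        overlap = train_ids & {image_id for image_id, _ in test_part}
        if overlap:
            raise ValueError(
                f"{name}: {len(overlap)} test samples also in training data")
    return train, tests


def evaluate_each(predict: Callable[[str], int],
                  tests: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Report accuracy per test set instead of one pooled number."""
    return {name: sum(predict(x) == y for x, y in samples) / len(samples)
            for name, samples in tests.items() if samples}


# Toy usage: the public set keeps its (made-up) official split,
# our own images get a random 80/20 split.
public = ([("pub_train_1", 0), ("pub_train_2", 1)], [("pub_test_1", 1)])
own = random_split([(f"own_{i}", i % 2) for i in range(10)], test_fraction=0.2)
train, tests = mix_sources({"public": public, "own": own})
print(evaluate_each(lambda image_id: 1, tests))  # one score per test set
```

Checking ids only catches exact reuse of the same file; near-duplicates (for example the same person photographed twice) would need a stricter check, such as splitting by person rather than by image.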