
Is it OK to use an existing dataset as training data?

asked 2014-10-19 22:33:14 -0500

CosmicRabbit

I'm afraid this may be a little off-topic but...

I'm currently planning academic research on facial recognition using deep learning. There are many datasets available online, and I've seen that a lot of papers use one of the publicly available datasets as "test data". But is it OK to use an existing dataset as "training data", or are you required to construct your own training data for an academic paper?

If using an existing dataset is fine, is it also OK to combine several existing datasets and use them together as your training dataset?


1 answer


answered 2014-10-20 01:32:16 -0500

Using a publicly available dataset is perfectly fine, and it is also a good thing to do. If you generate your own test data, it's hard to compare your approach to the state of the art, because your dataset could simply be easier than what others have worked on. Which datasets do the papers you are going to cite use? Creating a good dataset is already an academic challenge in its own right, so I would try to use sets that are already available. They will probably contain images that you would not have thought of (or simply cover people with a wider range of ages, weights, and ethnicities).

It's also OK to mix datasets. In the end, it's completely up to you how you generate the training data. The only requirement is that your training and test sets are disjoint. You should then run the evaluation separately on each test set so that your performance is easier to compare with other approaches.
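As a rough illustration of the two points above (disjoint training and test data, and one score per test set), here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names `build_training_set` and `evaluate_per_test_set` are hypothetical, each sample is assumed to be an `(image_bytes, label)` pair, and the overlap check uses a SHA-256 hash of the raw bytes, which only catches exact duplicate files (resized or re-encoded copies would need a different check).

```python
import hashlib


def fingerprint(image_bytes):
    """Hash the raw image bytes; identical files give identical hashes."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_training_set(train_sources, test_sets):
    """Merge several datasets into one training list.

    train_sources / test_sets: dict mapping a dataset name to a list of
    (image_bytes, label) pairs (a hypothetical layout for this sketch).
    Samples that also occur in any test set, and exact duplicates across
    the training sources, are dropped.
    """
    test_hashes = {
        fingerprint(img)
        for samples in test_sets.values()
        for img, _ in samples
    }
    seen = set()
    combined = []
    for samples in train_sources.values():
        for img, label in samples:
            h = fingerprint(img)
            if h in test_hashes or h in seen:
                continue  # keep training and test data disjoint
            seen.add(h)
            combined.append((img, label))
    return combined


def evaluate_per_test_set(predict, test_sets):
    """Return one accuracy value per non-empty test set, keyed by name."""
    return {
        name: sum(predict(img) == label for img, label in samples) / len(samples)
        for name, samples in test_sets.items()
        if samples
    }
```

Reporting a separate number per test set, rather than one pooled score, is what keeps the results comparable with papers that evaluated on only one of those sets.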



thank you so much!

CosmicRabbit (2014-10-20 03:27:57 -0500)


Seen: 144 times

Last updated: Oct 20 '14