0

Is it ok to use existing dataset as training data?

asked Oct 20 '14

CosmicRabbit

I'm afraid this may be a little off-topic but...

I'm currently planning an academic research project on facial recognition using deep learning. There are many datasets available online, and I've seen that a lot of papers use one of the publicly available datasets as "test data". Is it also okay to use an existing dataset as "training data", or are you expected to construct your own training data for an academic paper?

If it is okay, can I also mix or combine existing datasets and use the result as my training dataset?


1 answer

5

answered Oct 20 '14

Using a publicly available data set is perfectly fine, and it's also a good thing to do. If you generate your own (test) data, it's hard to compare your approach to the state of the art, as your data set could simply be easier than what others have worked on. Which data sets do the papers you are going to cite use? Creating a good data set is already an academic challenge on its own, so I would try to use the sets that are already available. They will probably contain images you would not have thought of (or simply cover people with a wider range of ages, weights, and ethnicities).

It's also ok to mix. In the end, it's completely up to you how you generate the training data. The only requirement is that your training and test sets are disjoint. You should then run the evaluation separately on each test set, so that your results can be compared directly with those of other approaches.
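To make the disjointness point concrete, here is a minimal Python sketch of one way to merge several data sets into a training set while keeping it separate from the test sets, and then score each test set on its own. Everything in it is an assumption for illustration: the helper names (`fingerprint`, `build_training_set`, `accuracy_per_test_set`), the representation of a sample as an `(image_bytes, label)` pair, and the toy data. Hashing the raw bytes only catches byte-identical files, so re-encoded or cropped copies of the same photo would need a stronger check.

```python
import hashlib


def fingerprint(image_bytes):
    """Hash the file content so the same image is recognized under any file name."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_training_set(train_sources, test_sets):
    """Merge several data sets, dropping duplicates and anything found in a test set."""
    held_out = {
        fingerprint(img)
        for samples in test_sets.values()
        for img, _ in samples
    }
    seen = set()
    merged = []
    for samples in train_sources.values():
        for img, label in samples:
            key = fingerprint(img)
            if key in held_out or key in seen:
                continue  # skip test images and repeats across sources
            seen.add(key)
            merged.append((img, label))
    return merged


def accuracy_per_test_set(predict, test_sets):
    """Report accuracy separately for every non-empty test set."""
    return {
        name: sum(predict(img) == label for img, label in samples) / len(samples)
        for name, samples in test_sets.items()
        if samples
    }


# Toy data (hypothetical): short byte strings stand in for image files.
set_a = [(b"img-1", "alice"), (b"img-2", "bob")]
set_b = [(b"img-2", "bob"), (b"img-3", "carol"), (b"img-4", "alice")]
test_sets = {
    "bench_x": [(b"img-3", "carol")],
    "bench_y": [(b"img-9", "bob"), (b"img-10", "alice")],
}

train = build_training_set({"a": set_a, "b": set_b}, test_sets)

# Stand-in for a trained model: always predicts "bob".
scores = accuracy_per_test_set(lambda img: "bob", test_sets)
```

In this toy run, `img-2` is kept only once and `img-3` is left out of training because it belongs to `bench_x`. Reporting one score per test set, rather than a single pooled number, is what keeps the results comparable with papers that used those same sets.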


Comments

thank you so much!

CosmicRabbit (Oct 20 '14)
