Is it OK to use an existing dataset as training data?

I'm afraid this may be a little off-topic, but...

I'm currently planning academic research on facial recognition using deep learning. Many datasets are available online, and I've seen that a lot of papers use one of the publicly available datasets as "test data". But is it okay to use an existing dataset as "training data"? Or is it required to construct your own training data for academic research or a paper?

If using an existing dataset is okay, is it also okay to mix or combine several existing datasets and use the result as your training dataset?