For a 60x60 detection window, the traincascade application needs about 2 GB of RAM. So with 16 GB of RAM you should be able to run ~6 training processes in parallel (leaving some memory free for the OS), provided your processor has enough cores to keep up. You might want to compile the application for a 64-bit architecture to make better use of your RAM.
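To make the arithmetic concrete, here is a rough Python sketch of how you could estimate the number of parallel training processes. The function name `max_parallel_jobs` and the 4 GB reserved for the OS and other programs are my own assumptions for illustration, not anything measured or taken from OpenCV; plug in your own numbers.

```python
import os


def max_parallel_jobs(total_ram_gb, ram_per_job_gb, reserved_ram_gb=4.0, cpu_cores=None):
    """Estimate how many training processes fit in memory and on the CPU.

    reserved_ram_gb is an assumed safety margin for the OS and other
    programs; it is not a value taken from OpenCV.
    """
    if ram_per_job_gb <= 0:
        raise ValueError("ram_per_job_gb must be positive")
    if cpu_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cpu_cores = os.cpu_count() or 1

    # Memory left over for training after the reserved margin.
    usable_ram_gb = max(total_ram_gb - reserved_ram_gb, 0.0)
    jobs_by_ram = int(usable_ram_gb // ram_per_job_gb)

    # Running more processes than cores will not speed anything up.
    return min(jobs_by_ram, cpu_cores)


if __name__ == "__main__":
    # 16 GB machine, ~2 GB per 60x60 training run, 8 cores assumed.
    print(max_parallel_jobs(16, 2, cpu_cores=8))  # 6
```

With a 4 GB margin this gives the ~6 processes mentioned above; on a machine with fewer cores, the core count becomes the limit instead of the RAM.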

Training 500 classifiers will without a doubt take you a very long time either way. You don't have anything to gain (time-wise) by switching to MATLAB.