Yes, traincascade can be used to detect objects of one particular type.
The more variable these objects are in shape, the more difficult the task is.
In principle, you could train a single cascade using samples of all the object types you want to detect as positives (phase 1), and then use some other method to assign each detected object to one particular class (phase 2).
For phase 2 you could even use an algorithm that is slower than boosting, as it only has to run over the few detected objects.
You could experiment yourself, but I’m skeptical about the results you would get following this route: there is too much variability among the samples in phase 1, and this should slow down detection and worsen the detection rate. (Anyway, when I can, I always run experiments for my projects to be sure that what I suspect is correct; many times I’ve discovered that things are different from what I imagined.)
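To make the two-phase idea concrete, here is a minimal sketch. It assumes `cascade` is an object with OpenCV's `detectMultiScale` method (for example a `cv2.CascadeClassifier`), `gray` is a grayscale image as a NumPy array, and `classify` is a hypothetical function of your own that takes an image patch and returns a class label. None of these names come from the question; they are only for illustration.

```python
def detect_then_classify(gray, cascade, classify, **detect_kwargs):
    """Phase 1: find candidate boxes with one cascade.
    Phase 2: label each candidate with a (possibly slow) classifier."""
    labelled = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, **detect_kwargs):
        # Crop the candidate region; only these few patches reach phase 2.
        patch = gray[y:y + h, x:x + w]
        label = classify(patch)  # hypothetical user-supplied function
        labelled.append((label, (int(x), int(y), int(w), int(h))))
    return labelled
```

Since `classify` only sees the handful of patches returned by the cascade, it can afford to be much slower per patch than the boosted cascade itself.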
The classical way to achieve your goal is to train multiple classifiers, run them in turn over each image, and combine the results.
Yes, the total detection time will be the sum of the individual detection times, so the program will be slower.
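Running several cascades in turn could look roughly like this. It is only a sketch: `cascades` is assumed to be a dict mapping a class name of your choice to an object with OpenCV's `detectMultiScale` method, and the file names in the usage comment are made up.

```python
def detect_all(gray, cascades, **detect_kwargs):
    """Run every cascade over the same image and tag each hit with its class name."""
    results = []
    for label, cascade in cascades.items():
        # Each call scans the whole image, so the run times add up.
        for (x, y, w, h) in cascade.detectMultiScale(gray, **detect_kwargs):
            results.append((label, int(x), int(y), int(w), int(h)))
    return results

# Possible usage with OpenCV (hypothetical file names):
#   import cv2
#   cascades = {"car": cv2.CascadeClassifier("car.xml"),
#               "bike": cv2.CascadeClassifier("bike.xml")}
#   gray = cv2.cvtColor(cv2.imread("street.jpg"), cv2.COLOR_BGR2GRAY)
#   hits = detect_all(gray, cascades, scaleFactor=1.1, minNeighbors=3)
```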
To reduce the overall detection time, you could use some tricks depending on the particular objects you are detecting.
For example, if you are detecting faces and eyes, you could run the detection algorithm for eyes only inside faces. If you are detecting big objects, you could use a larger minimum window size (small windows are the ones that slow detection down the most, as many more sliding windows have to be checked). If you are detecting oranges, you could run the detection only over areas with some particular colours.
Having a lot of RAM doesn’t matter. For this purpose you mostly need a good CPU (number of cores and GHz).
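The first two tricks can be sketched together as follows. I assume both cascades expose OpenCV's `detectMultiScale` (which accepts a `minSize` argument) and that `gray` is a grayscale NumPy image; the 80x80 minimum face size is an arbitrary value chosen for illustration, not a recommendation.

```python
def detect_eyes_in_faces(gray, face_cascade, eye_cascade, min_face=(80, 80)):
    """Search for eyes only inside detected faces, skipping small face windows."""
    eyes = []
    # A larger minSize avoids the many small sliding windows that cost the most.
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, minSize=min_face):
        face_roi = gray[fy:fy + fh, fx:fx + fw]
        # The eye cascade scans only the face region, not the whole image.
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_roi):
            # Convert from face-relative to full-image coordinates.
            eyes.append((int(fx + ex), int(fy + ey), int(ew), int(eh)))
    return eyes
```

The eye boxes come back relative to the cropped face, which is why the face offset is added before returning them.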