My first guess is that the data of classifier one is not being used as negative training data for classifier two. Adding it will make it harder to converge to a stable model, but it will force the model to look for features that differ between the two cases.
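A minimal sketch of what "use classifier one's data as negatives for classifier two" could look like when assembling the training set. It assumes each sample is a plain feature vector and that classifier two already has some ordinary negative ("background") samples; the function and parameter names are hypothetical and not taken from the question.

```python
from typing import List, Sequence, Tuple

Sample = Sequence[float]


def build_training_set(
    class_two_positives: List[Sample],
    background_negatives: List[Sample],
    class_one_samples: List[Sample],
) -> Tuple[List[Sample], List[int]]:
    """Assemble (X, y) for classifier two (hypothetical helper).

    Samples of classifier two's class get label 1. Ordinary background
    samples get label 0, and so do all samples of classifier one's class,
    which act as hard negatives that look similar to the positives.
    """
    X: List[Sample] = []
    y: List[int] = []
    for sample in class_two_positives:
        X.append(sample)
        y.append(1)
    for sample in background_negatives:
        X.append(sample)
        y.append(0)
    # Hard negatives: the cases classifier two must learn NOT to fire on.
    for sample in class_one_samples:
        X.append(sample)
        y.append(0)
    return X, y


X, y = build_training_set([[1.0, 0.9]], [[0.0, 0.1]], [[0.8, 0.2], [0.7, 0.3]])
print(y)  # [1, 0, 0, 0]
```

The resulting `X, y` can be fed to whatever training routine is already in use; the only change is the extra block of label-0 samples.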
However, it might already be solvable by looking at the confidence of the detections. I am fairly sure that model one's confidence on cases of model two is much lower than on genuine detections of case one. How about putting a hard threshold there?
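A rough sketch of the hard-threshold idea, assuming model one outputs a confidence score in [0, 1] per detection and that a validation set of case-two samples is available. Here the threshold is picked as just above the highest score model one gave to any case-two validation sample; the function names and the `margin` parameter are illustrative assumptions, not part of any existing API.

```python
from typing import List, Sequence


def pick_threshold(scores_on_case_two: Sequence[float], margin: float = 1e-6) -> float:
    """Lowest threshold that rejects every case-two validation sample.

    Assumes model one's scores on case-two samples are the ones to reject.
    """
    return max(scores_on_case_two) + margin


def filter_detections(scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices of model-one detections whose confidence passes the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


threshold = pick_threshold([0.2, 0.4, 0.35])
print(filter_detections([0.95, 0.4, 0.8], threshold))  # [0, 2]
```

If the two score distributions overlap, taking the maximum will also throw away some true case-one detections; a percentile of the case-two scores is a softer alternative.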
So basically, in my opinion, you just need to use the data you have available.