
My first guess is that the data of classifier one is not used as negative training data for classifier two. Adding it will make it harder for training to converge to a stable model, but it will force the model to look for features that differ between the two cases.
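To make the idea concrete, here is a minimal sketch of assembling the negative set for classifier two, assuming your samples are stored as lists of image paths. The function name `build_negative_set` and the file names are hypothetical; adapt them to whatever your training tool expects.

```python
from typing import Iterable, List


def build_negative_set(background: Iterable[str],
                       other_class_positives: Iterable[str]) -> List[str]:
    """Merge plain background images with the positives of the other class.

    The result is meant to be used as the negative set of classifier two.
    Duplicates are dropped and the original order is preserved.
    """
    seen = set()
    negatives = []
    for path in list(background) + list(other_class_positives):
        if path not in seen:
            seen.add(path)
            negatives.append(path)
    return negatives


# Hypothetical file names, for illustration only.
negatives_for_two = build_negative_set(
    ["bg_001.png", "bg_002.png"],
    ["case_one_001.png", "case_one_002.png"],
)
```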

However, it might already be solvable by looking at the certainty of the detections. I am quite sure that the confidence of model one's response on cases of model two is much lower than on detections of case one. How about putting a hard threshold there?
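A rough sketch of such a hard threshold, assuming your detector gives you a confidence score per detection (how you obtain that score depends on your library). The names `pick_threshold` and `filter_by_score` and all numbers are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A detection is ((x, y, w, h), score); the score comes from your detector.
Detection = Tuple[Tuple[int, int, int, int], float]


def pick_threshold(scores_case_one: Sequence[float],
                   scores_case_two: Sequence[float]) -> Optional[float]:
    """Return the midpoint between the two score groups, or None if they overlap.

    scores_case_one: model-one scores on true case-one objects.
    scores_case_two: model-one scores on case-two objects (unwanted hits).
    """
    lowest_wanted = min(scores_case_one)
    highest_unwanted = max(scores_case_two)
    if highest_unwanted >= lowest_wanted:
        return None  # not separable by a single hard threshold
    return (lowest_wanted + highest_unwanted) / 2.0


def filter_by_score(detections: List[Detection],
                    min_score: float) -> List[Detection]:
    """Keep only the detections whose score reaches the threshold."""
    return [d for d in detections if d[1] >= min_score]


# Made-up validation scores, for illustration only.
threshold = pick_threshold([4.0, 5.0], [0.5, 1.0])
if threshold is not None:
    kept = filter_by_score(
        [((10, 10, 40, 40), 4.5), ((80, 20, 40, 40), 0.75)], threshold
    )
```

If `pick_threshold` returns `None`, the scores overlap on your validation data and a hard threshold alone will not separate the two cases.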

So basically:

Your first approach might fail, because your inner features might be stronger than the outer rectangle compared to the background, still leading to false positive detections on the rectangular shapes.
Bootstrapping, your second approach, is reasonable but double work: it requires you to train the initial model again.
The same goes for your third suggestion.

In my opinion you just need to use the data you have available.