I have an idea. Try using the edge map: inside each bounding box, calculate a histogram of oriented gradients (HOG). The trees have lots of edges in lots of directions, so their histograms should come out spread across most of the orientation bins, which makes them the easy case. I expect you could separate the pedestrians, cars, and buildings in a similar way.

Normally for identification you do this for small blocks, like THIS. I think you might be able to get away with just one block per bounding box. Worth a shot anyway.
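
Here is a rough sketch of the one-block-per-box idea, using only NumPy. The function name `box_hog`, the `(x0, y0, x1, y1)` box convention, and the choice of 9 unsigned orientation bins are my own assumptions for illustration, not anything from your code, so adjust them to whatever your detector produces.

```python
import numpy as np

def box_hog(gray, box, n_bins=9):
    """Single-block histogram of oriented gradients for one bounding box.

    gray : 2-D array, grayscale image.
    box  : (x0, y0, x1, y1) in pixels, x1/y1 exclusive (assumed convention).
           The box must be at least 2 pixels wide and tall.
    """
    x0, y0, x1, y1 = box
    patch = np.asarray(gray, dtype=float)[y0:y1, x0:x1]

    # Derivatives along rows (y) and columns (x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)

    # Unsigned gradient orientation folded into [0, 180] degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0

    # Each pixel votes for its orientation bin, weighted by edge strength.
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)

    # Normalize so boxes of different sizes are comparable.
    total = hist.sum()
    return hist / total if total > 0 else hist
```

You would call this once per bounding box and feed the resulting vectors to whatever classifier you like. A box whose histogram is nearly flat has edges in every direction (tree-like), while one dominated by a couple of bins has strongly aligned edges. If one block per box turns out to be too coarse, the usual fallback is to split the box into a grid of cells and concatenate the per-cell histograms.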