Weird words in my bag

Hello, I'm using BOWKMeansTrainer to cluster FREAK descriptors computed at ORB keypoints. This combination of detector and descriptor gives good results for the drawing-style images I am trying to analyze; however, BOWKMeansTrainer is causing some problems...

After training, when I process an image with BOWImgDescriptorExtractor and get its image descriptor, I often see that a word (cluster) consists of keypoints that are far from each other in the image.

[image: keypoints of one word highlighted]

The image above highlights the keypoints of one such word. Of course, there are other words that are well localized, but I'm curious why some words are not localized at all.
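
To make the setup concrete, here is a minimal sketch of the word assignment step, assuming each keypoint is assigned to the vocabulary word whose center is nearest in descriptor space. It is plain Python, not the OpenCV API: `assign_words`, `sq_dist`, and the toy data are hypothetical names and values chosen only for illustration. Under that assumption the pixel positions never enter the computation, so keypoints in one word can be anywhere in the image.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_words(descriptors, vocabulary):
    """Group descriptor indices by the index of their nearest vocabulary word."""
    words = defaultdict(list)
    for i, d in enumerate(descriptors):
        nearest = min(range(len(vocabulary)),
                      key=lambda w: sq_dist(d, vocabulary[w]))
        words[nearest].append(i)
    return dict(words)


# Toy data (made up): pixel positions are kept only to show they are unused.
positions = [(10, 10), (400, 300), (12, 11)]
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
vocabulary = [[1.0, 0.0], [0.0, 1.0]]  # two cluster centers ("words")

groups = assign_words(descriptors, vocabulary)
for word, idxs in sorted(groups.items()):
    print(word, [positions[i] for i in idxs])
```

In this toy case, keypoints 0 and 1 share a word although they are far apart in the image, while the neighbouring keypoints 0 and 2 fall into different words.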