The typical bag of words (or bag of features) approach uses a single visual vocabulary: you cluster once, over the features of all your training images (or over a random selection of them if they don't all fit into memory). Afterwards you compute a histogram for each training image over that vocabulary and then train your classifier on these histograms; see also for some additional information and links.
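To make the order of the steps concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not in the text above: the clustering is plain Lloyd's k-means (any clustering method would fit the description), the descriptors are toy 2-D points rather than real local features such as SIFT, and all names (`build_vocabulary`, `bow_histogram`, `max_features`) are made up for illustration. In practice you would probably use a library implementation of k-means instead.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, f):
    # Index of the visual word (cluster center) closest to feature f.
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], f))

def kmeans(features, k, iters=20, seed=0):
    # Plain Lloyd's k-means; assumed here, the clustering method is your choice.
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(features, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(centers, f)].append(f)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def build_vocabulary(images, k, max_features=10000, seed=0):
    # Pool the features of ALL training images and cluster once.
    pooled = [f for feats in images for f in feats]
    if len(pooled) > max_features:
        # Random selection when the full set is too large to cluster.
        pooled = random.Random(seed).sample(pooled, max_features)
    return kmeans(pooled, k, seed=seed)

def bow_histogram(centers, image_features):
    # Count how often each visual word occurs in one image, then normalize.
    hist = [0.0] * len(centers)
    for f in image_features:
        hist[nearest(centers, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy "images": each is a list of 2-D descriptors drawn around one or two blobs.
rng = random.Random(1)

def fake_image(cx, cy, n=30):
    return [(rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)) for _ in range(n)]

train_images = [
    fake_image(0, 0),
    fake_image(5, 5),
    fake_image(0, 0) + fake_image(5, 5),
]

vocabulary = build_vocabulary(train_images, k=2)
histograms = [bow_histogram(vocabulary, img) for img in train_images]
```

Each entry of `histograms` is a fixed-length vector, one per training image, and these vectors are what you would pass to the classifier. A test image would be encoded with the same `vocabulary`, without clustering again.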