One of the two images should suffice: their local descriptors won't differ much, so the global (BoW) descriptor won't differ much either. Instead, choose images with larger differences in distance and viewing angle, and with different lighting conditions. See, for example, the Oxford5k dataset (although it also contains several pictures with only subtle changes).
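To illustrate why a near-duplicate image adds little, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the descriptors are random vectors standing in for real local descriptors (e.g. SIFT), the vocabulary is random rather than learned with k-means, and `bow_histogram` and `cosine_similarity` are hypothetical helper names. It only shows that slightly perturbed local descriptors map to almost the same BoW histogram, while an unrelated descriptor set does not.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """L2-normalised BoW histogram using hard assignment to the nearest visual word."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / np.linalg.norm(hist)

def cosine_similarity(a, b):
    # Both inputs are already L2-normalised, so the dot product is the cosine.
    return float(a @ b)

rng = np.random.default_rng(0)

# Toy vocabulary: 50 "visual words" in a 32-dimensional descriptor space (assumed sizes).
vocabulary = rng.normal(size=(50, 32))

# Image A, a near-duplicate of A (tiny perturbation), and an unrelated image B.
desc_a = rng.normal(size=(300, 32))
desc_a_near = desc_a + rng.normal(scale=0.01, size=desc_a.shape)
desc_b = rng.normal(size=(300, 32))

h_a = bow_histogram(desc_a, vocabulary)
h_near = bow_histogram(desc_a_near, vocabulary)
h_b = bow_histogram(desc_b, vocabulary)

sim_near = cosine_similarity(h_a, h_near)
sim_far = cosine_similarity(h_a, h_b)

print(f"A vs near-duplicate: {sim_near:.3f}")
print(f"A vs unrelated:      {sim_far:.3f}")
```

On real images the gap depends on the detector, the descriptor, and the vocabulary size, so treat the numbers as qualitative only.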