I cannot find the exact source where the two variations are implemented, but usually, the Hamming distance algorithm relies on two steps: an XOR operation between two bit vectors, and a population count on the resulting bit vector.
The XOR is pretty straight-forward on any type of architecture, but the popcount algorithm can sometimes be speeded up by using processor-specific instruction sets (such as SSE2/SSSE3/...). When these instruction sets are not available/supported, it is also possible to use a look-up table in an attempt to speed up the 'naive' algorithm.
If, in your case, the regular Hamming variation is faster, I would assume your OpenCV bins were built with SSE2/SSSE2/... support enabled, and the algorithm defaults to the accelerated version. On low-end processors, the LUT version is usually faster (although it always depends on the LUT size and the bit vector length).
Info on Hamming dist: http://en.wikipedia.org/wiki/Hamming_distance
Info on 'popcount' algorithms and accelerated implementations: http://wm.ite.pl/articles/sse-popcount.html
LUT typically stands for "look-up table" which may explain the speed gain. Don't know if there's a difference...