I know that SIFT is considered (according to many papers) the most accurate detector and descriptor available. However, it is also one of the slowest solutions.
So, is there any parallel implementation (using MPI or something similar) or any GPU implementation?
Consider that the algorithm will be used on one image only. Now, I'm not an expert on GPUs, but from what I've been told, the time needed to initialize the GPU and perform the other preparation steps is longer than running the serial SIFT version, so the GPU version could be useless! That's why I'm considering a parallel version (exploiting CPUs instead of the GPU).
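To make the CPU-parallel idea concrete, here is a rough sketch of what I have in mind (my own assumption of how it could be done, not an existing library feature): split the image into overlapping tiles and run OpenCV's SIFT on each tile in separate threads. The tile size, overlap and naive keypoint merging are placeholders; OpenCV releases the GIL during `detectAndCompute`, so threads should actually run in parallel.

```python
# Sketch: tile-based CPU-parallel SIFT using OpenCV (>= 4.4, where SIFT is in the main module).
# Tile size, overlap and the naive merging below are placeholders, not a tuned implementation.
from concurrent.futures import ThreadPoolExecutor

import cv2
import numpy as np


def sift_on_tile(patch, x0, y0):
    """Run SIFT on one tile and shift keypoints back to full-image coordinates."""
    sift = cv2.SIFT_create()  # one instance per tile to avoid sharing state across threads
    kps, descs = sift.detectAndCompute(patch, None)
    shifted = [cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size,
                            kp.angle, kp.response, kp.octave, kp.class_id)
               for kp in kps]
    return shifted, descs


def parallel_sift(gray, tile=512, overlap=32, workers=4):
    h, w = gray.shape
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for y0 in range(0, h, tile):
            for x0 in range(0, w, tile):
                # Overlapping tiles so features near tile borders are not lost.
                patch = gray[y0:min(y0 + tile + overlap, h),
                             x0:min(x0 + tile + overlap, w)]
                jobs.append(pool.submit(sift_on_tile, patch, x0, y0))
    keypoints, descriptors = [], []
    for job in jobs:
        kps, descs = job.result()
        keypoints.extend(kps)
        if descs is not None:
            descriptors.append(descs)
    all_descs = np.vstack(descriptors) if descriptors else None
    return keypoints, all_descs


if __name__ == "__main__":
    img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
    kps, descs = parallel_sift(img)
    print(len(kps), "keypoints")
```

Of course this will not give exactly the same keypoints as a single global pass (the scale space is built per tile), but it illustrates the kind of CPU parallelism I mean.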
Any other parallel implementation of feature detectors/descriptors is also welcome.