Hi, I'm training a FabMap algorithm for loop-closure detection in my project. The training consists of creating the descriptors, the vocabulary and the Chow-Liu tree. I have a database of more than 10,000 images. I am working on a fairly powerful desktop (12 cores with two threads each, 32 GB of RAM and a 6 GB Nvidia graphics card), and I'd like to make the most of it when training my system. I am using OpenCV 3.0 with TBB enabled on a 64-bit Windows 7 system.

To see how long the training will take, I down-sampled the database. I set the clusterSize parameter to 0.3 and started with just 10 images (~20,000 descriptors), which took about 40 minutes. With a sample of 100 images (~300,000 descriptors) the whole thing took about 30 hours. I am afraid that 1000 images (which would give a decent vocabulary) may take 2 months, and I don't want to imagine how long the whole database would take. In case it isn't obvious already: only the extraction of the descriptors is multi-threaded. The clustering and the building of the Chow-Liu tree are performed in a single thread.

The problem is that the cluster() method of the BOWMSCTrainer class has 3 nested for loops, each of which depends on the previous one, and even the iteration counts of the nested loops are determined at run time.
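
To make the structure concrete, here is a minimal sketch of how I understand the loops. It is not the actual OpenCV code: it uses plain std::vector instead of cv::Mat and plain Euclidean distance, and the names Descriptor, pickCentres and assignToCentres are made up for illustration. The first pass looks inherently sequential to me, because whether a descriptor becomes a centre depends on the centres found so far. The second pass seems to treat each descriptor independently, so that is where I imagine an OpenMP pragma could go (if OpenMP is not enabled, the pragma is ignored and the loop runs serially).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for one row of the descriptor matrix.
typedef std::vector<float> Descriptor;

// Euclidean distance between two descriptors of equal length.
static double euclidean(const Descriptor& a, const Descriptor& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = static_cast<double>(a[k]) - static_cast<double>(b[k]);
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Pass 1 (sequential): a descriptor becomes a new centre when it is farther
// than clusterSize from every centre chosen so far. The inner loop bound
// grows as centres are added, which is the "dynamic" part.
std::vector<Descriptor> pickCentres(const std::vector<Descriptor>& descriptors,
                                    double clusterSize) {
    std::vector<Descriptor> centres;
    for (std::size_t i = 0; i < descriptors.size(); ++i) {
        double minDist = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < centres.size(); ++j) {
            minDist = std::min(minDist, euclidean(descriptors[i], centres[j]));
        }
        if (minDist > clusterSize) {
            centres.push_back(descriptors[i]);
        }
    }
    return centres;
}

// Pass 2 (parallel candidate): assign every descriptor to its nearest centre.
// Iterations are independent and each one writes only labels[i].
std::vector<int> assignToCentres(const std::vector<Descriptor>& descriptors,
                                 const std::vector<Descriptor>& centres) {
    assert(!centres.empty());
    std::vector<int> labels(descriptors.size(), -1);
    const int n = static_cast<int>(descriptors.size());
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double minDist = std::numeric_limits<double>::max();
        int best = 0;
        for (std::size_t j = 0; j < centres.size(); ++j) {
            const double dist = euclidean(descriptors[i], centres[j]);
            if (dist < minDist) {
                minDist = dist;
                best = static_cast<int>(j);
            }
        }
        labels[i] = best;
    }
    return labels;
}
```

The loop index in the second pass is a signed int on purpose, since older OpenMP versions (such as the 2.0 level supported by Visual C++) require a signed loop variable.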

So, my question is: is it possible to parallelise the execution of the cluster() method somehow, so that training the system doesn't take a ridiculous amount of time? I've thought of applying OpenMP pragmas, but I am not sure they would work given the dynamic nature of the for loops. Although I am familiar with parallel programming and multi-threading, I am by no means an expert in this field.

Many thanks in advance!

