Hi, I'm training a FAB-MAP algorithm for loop-closure detection in my project. The training consists of creating the descriptors, the vocabulary and the Chow-Liu tree. I have a database of more than 10,000 images. I am working on a fairly powerful desktop (12 cores with two hardware threads each, 32 GB of RAM and an NVIDIA graphics card with 6 GB), and I'd like to make the most of it when training my system. I am using OpenCV 3.0 with TBB enabled on Windows 7, 64-bit.
To estimate how long the training will take, I subsampled the database. I set the clusterSize parameter to 0.3 and started with just 10 images (~20,000 descriptors), which took about 40 minutes. With a sample of 100 images (~300,000 descriptors) the whole pipeline took about 30 hours, and I am afraid that 1,000 images (which should yield a decent vocabulary) may take 2 months. I don't want to imagine how long the whole database would take. In case you haven't figured it out already: only the descriptor extraction is multi-threaded. The clustering and the construction of the Chow-Liu tree run in a single thread.
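A rough check of that extrapolation, assuming total time grows as a power law in the number of descriptors, t ∝ n^α, and assuming ~3,000 descriptors per image (as in the 100-image sample), i.e. ~3,000,000 descriptors for 1,000 images. This is only back-of-the-envelope: the 10-image run probably includes fixed overhead, and a purely quadratic cost would give the larger figure in parentheses.

```latex
\alpha \approx \frac{\ln(1800\,\text{min} / 40\,\text{min})}{\ln(300{,}000 / 20{,}000)}
       = \frac{\ln 45}{\ln 15} \approx 1.41

t_{1000} \approx 30\,\text{h} \times 10^{1.41} \approx 30\,\text{h} \times 25.7
         \approx 770\,\text{h} \approx 32\,\text{days}
\qquad (\alpha = 2:\ 30\,\text{h} \times 100 = 3000\,\text{h} \approx 125\,\text{days})
```

So somewhere between roughly one and four months, which is consistent with the two-month estimate.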
The problem is that the cluster() method of the BOWMSCTrainer class contains three nested for loops, each depending on the previous one, and even the bounds of the inner loops are set dynamically.
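A minimal sketch, assuming cluster() follows the leader-style sequential clustering I believe the contrib openfabmap module implements (pass 1 picks centres, pass 2 assigns each descriptor to its nearest centre, pass 3 averages each cluster) and that the inverse covariance is the identity, so cv::Mahalanobis reduces to the Euclidean distance. Plain std::vector<float> rows stand in for cv::Mat rows, and the names Descriptor, squaredDistance and selectCentres are hypothetical. It shows pass 1, the one loop that really is sequential: whether descriptor i becomes a centre depends on every centre created before it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one descriptor row (the real code uses cv::Mat rows).
using Descriptor = std::vector<float>;

// Squared Euclidean distance. With an identity inverse covariance,
// cv::Mahalanobis reduces to the (non-squared) Euclidean distance, so comparing
// squared distances against clusterSize^2 gives the same decisions without sqrt.
static double squaredDistance(const Descriptor& a, const Descriptor& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = static_cast<double>(a[k]) - b[k];
        sum += d * d;
    }
    return sum;
}

// Pass 1 (hypothetical name): a descriptor becomes a new centre only if no
// existing centre lies within clusterSize. The outer loop carries a true
// dependency (the centre set grows as it runs), so it must stay sequential.
std::vector<Descriptor> selectCentres(const std::vector<Descriptor>& descriptors,
                                      double clusterSize) {
    std::vector<Descriptor> centres;
    const double threshold = clusterSize * clusterSize;
    for (const Descriptor& d : descriptors) {
        bool covered = false;
        // Only "is any centre close enough?" matters here, so the scan can
        // stop at the first hit instead of computing the full minimum.
        for (const Descriptor& c : centres) {
            if (squaredDistance(d, c) <= threshold) {
                covered = true;
                break;
            }
        }
        if (!covered) centres.push_back(d);  // first descriptor always becomes a centre
    }
    return centres;
}
```

Because pass 1 only asks whether any centre lies within clusterSize, stopping the inner scan at the first hit produces the same centres as computing the full minimum, and may save a lot of work when most descriptors fall inside an existing cluster.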
So, my question is: is it possible to parallelise the cluster() method in some way, so that training the system doesn't take a ridiculous amount of time? I've thought of applying OpenMP pragmas, but I am not sure they would work given how dynamic the for loops are. Although I am familiar with parallel programming and multi-threading, I am by no means an expert in this field.
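Passes 2 and 3 are a different story. A sketch under the same assumptions (hypothetical names, std::vector rows, identity covariance): once the centre set is fixed, each descriptor's nearest centre is independent of all the others, so the outer loop is a plain parallel for. The pragma sticks to OpenMP 2.0 features (signed int loop index, no reduction clause), since that is the version classic Visual C++ supports. Compile with /openmp (MSVC) or -fopenmp (GCC/Clang); otherwise the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for one descriptor row (the real code uses cv::Mat rows).
using Descriptor = std::vector<float>;

// Squared Euclidean distance (identity-covariance assumption, as in pass 1).
static double squaredDistance(const Descriptor& a, const Descriptor& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = static_cast<double>(a[k]) - b[k];
        sum += d * d;
    }
    return sum;
}

// Pass 2 (hypothetical name): label each descriptor with its nearest centre.
// Iterations are independent, so the outer loop is split across threads.
std::vector<int> assignToCentres(const std::vector<Descriptor>& descriptors,
                                 const std::vector<Descriptor>& centres) {
    const int n = static_cast<int>(descriptors.size());
    std::vector<int> labels(descriptors.size(), 0);
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        int bestIndex = 0;
        for (std::size_t j = 0; j < centres.size(); ++j) {
            const double dist = squaredDistance(descriptors[i], centres[j]);
            if (dist < best) {  // strict '<' keeps the first minimum on ties
                best = dist;
                bestIndex = static_cast<int>(j);
            }
        }
        labels[i] = bestIndex;  // each iteration writes its own slot: no race
    }
    return labels;
}

// Pass 3 (hypothetical name): average the members of each cluster. This is one
// O(N * dims) pass, far cheaper than pass 2's O(N * K * dims), so it stays serial.
std::vector<Descriptor> averageClusters(const std::vector<Descriptor>& descriptors,
                                        const std::vector<int>& labels,
                                        std::size_t numCentres) {
    const std::size_t dims = descriptors.empty() ? 0 : descriptors[0].size();
    std::vector<std::vector<double>> sums(numCentres, std::vector<double>(dims, 0.0));
    std::vector<std::size_t> counts(numCentres, 0);
    for (std::size_t i = 0; i < descriptors.size(); ++i) {
        const std::size_t c = static_cast<std::size_t>(labels[i]);
        for (std::size_t k = 0; k < dims; ++k) sums[c][k] += descriptors[i][k];
        ++counts[c];
    }
    std::vector<Descriptor> vocabulary(numCentres, Descriptor(dims, 0.0f));
    for (std::size_t c = 0; c < numCentres; ++c) {
        if (counts[c] == 0) continue;  // guard against empty clusters
        for (std::size_t k = 0; k < dims; ++k)
            vocabulary[c][k] = static_cast<float>(sums[c][k] / counts[c]);
    }
    return vocabulary;
}
```

Since OpenCV is already built with TBB, cv::parallel_for_ with a cv::ParallelLoopBody subclass is an alternative to the pragma for pass 2. The inner scan of pass 1 could also be split across threads, but every descriptor would then pay a fork/join, so that is likely to pay off only once there are many centres.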
Many thanks in advance!