Why is building OpenCV so slow with CUDA?

asked 2012-12-12 02:49:46 -0500

rics

When I completely rebuild OpenCV, it takes hours if I use the WITH_CUDA option. A block like this takes many minutes to compile:

My configuration:

  • Core i7 laptop
  • Nvidia Geforce 630M
  • Windows 7 32 bit
  • OpenCV 2.4.3
  • CUDA 5.0
  • MS Visual C++ 2010 Express

I had assumed it would be much faster given my fast CPU. During compilation only 1 of the 8 CPU cores is busy; the others are almost idle.

What is going on inside the CUDA compilation?

When I tried to stop the build, it took several minutes before it actually stopped. Compiling a hello world program with nvcc was normal. Similarly, compiling the activity trace example from the Nvidia GPU Computing Toolkit with mingw took only a few seconds.

So is it normal that GPU compilation of OpenCV takes so long?

answered 2012-12-12 04:57:55 -0500

Vladislav Vinogradov

updated 2012-12-12 05:01:06 -0500

The reasons are the following:

  • Slow compiler
  • Necessity to compile the same code many times for all GPU architectures
  • A lot of template instantiations in the module to support all possible types, flags, border extrapolation modes, interpolations, kernel sizes, etc.

Compiling for only one architecture is 6x faster. If you don’t need to compile for all architectures, clear CUDA_ARCH_PTX in CMake and set CUDA_ARCH_BIN accordingly (“3.0” for Kepler, “2.0” for Fermi, etc.). You can find information about your GPU in

OMG why is this information not available anywhere else?

awesomenesspanda ( 2016-08-24 05:26:52 -0500 )

Running on a 2.3 GHz Xeon, and the CUDA compiles are killing me. This really saves my employer tons of billable hours. :)

elchan ( 2016-09-22 09:58:20 -0500 )

How do I clear CUDA_ARCH_PTX and set CUDA_ARCH_BIN? Do I just do -DCUDA_ARCH_PTX='' and -DCUDA_ARCH_BIN=30?

Lawb ( 2018-02-11 03:14:23 -0500 )
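
On the command-line question above: a minimal sketch of how the two settings from the answer might be passed to CMake, assuming a Unix-like shell and a command-line configure rather than the CMake GUI. The source path `../opencv` and the variable names `OPENCV_SRC`, `ARCH_BIN` and `CONFIGURE_CMD` are placeholders for illustration, and "3.0" is just the Kepler value quoted in the answer. The script only prints the command so it can be checked before running it.

```shell
#!/bin/sh
# Dry run: build the configure command as a string and print it for inspection.
# OPENCV_SRC is a hypothetical path to the OpenCV source tree; adjust it.
OPENCV_SRC="${OPENCV_SRC:-../opencv}"
# "3.0" is the Kepler value quoted in the answer; use the one matching your GPU.
ARCH_BIN="${ARCH_BIN:-3.0}"

# An empty value after '=' clears CUDA_ARCH_PTX. In a POSIX shell,
# -DCUDA_ARCH_PTX='' and -DCUDA_ARCH_PTX= pass the same argument to cmake.
CONFIGURE_CMD="cmake -DWITH_CUDA=ON -DCUDA_ARCH_BIN=${ARCH_BIN} -DCUDA_ARCH_PTX= ${OPENCV_SRC}"

echo "${CONFIGURE_CMD}"
```

Run the printed command from an empty build directory once the path and architecture look right. The dotted form ("3.0") follows the answer; whether other spellings of the value are accepted is not confirmed here.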

answered 2012-12-14 13:45:37 -0500

solvingPuzzles

I get pretty long compilation times for OpenCV even on an Intel Sandy Bridge server with 64 GB of RAM. My speculation is that it's a combination of:

  • Lots of C++ templates (see this thread for some insight into why C++ templates can be slow to build).
  • Lots of CUDA kernels -- remember, the kernels themselves are not polymorphic (unless you do some really advanced tricks), so OpenCV routines often have one kernel for each data type (CV_8UC1, CV_32FC3, etc.). This adds up quickly.
  • As Vladislav said, building for several architectures (Compute 1.0, 1.1, 1.2, 2.0, etc) increases build time, but you can avoid this by just selecting your architecture in the CUDA_ARCH_BIN flag.
  • I think there's also some code generation going on at compile-time. I don't remember the details, but I remember seeing a bunch of printouts about code generation during the OpenCV GPU compilation.

You may have already tried this, but building in multithread mode (e.g. use the flag -j8 for 8 threads, -j16 for 16 threads, pick your favorite number) can help. I've noticed that builds sometimes fail in multithreaded mode, but this may just be coincidence. Anyway, it's worth a try.
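
A minimal sketch of the multithreaded build mentioned above, assuming a Makefile-based build directory on a Unix-like system (Visual Studio builds use a different mechanism, which is not shown). The variable names `JOBS` and `BUILD_CMD` are placeholders for illustration. The script only prints the build command rather than running it.

```shell
#!/bin/sh
# Dry run: work out a job count and print the build command.
# getconf _NPROCESSORS_ONLN reports the number of online CPUs on Linux and macOS;
# fall back to 8 (the -j8 example from the answer) if it is unavailable.
JOBS="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8)"

BUILD_CMD="make -j${JOBS}"
echo "${BUILD_CMD}"
```

If a parallel build fails, rerunning with a smaller job count, or with no -j flag at all, is a quick way to check whether the failure is related to parallelism.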

answered 2012-12-14 07:02:25 -0500

ubehagelig

updated 2012-12-14 07:03:53 -0500

A couple of somewhat related comments:

I believe TBB (Threading Building Blocks) would utilise all cores, but maybe that only applies at run time and not when compiling. I haven't fiddled with TBB myself yet.

Another thing: make sure the charger is plugged in. My i7 laptop runs at 1.0 GHz on battery, but at a full 2.4 GHz on the charger.

The charger was plugged in so that could not be the cause.

rics ( 2012-12-21 07:03:04 -0500 )
