Revision history [back]

I get pretty long compilation times for OpenCV even on an Intel Sandy Bridge server with 64gb of ram. My speculation is that it's a combination of:

Lots of C++ templates (see this thread for some insight into why C++ templates can be slow to build).
Lots of cuda kernels -- remember, the kernels themselves are not polymorphic (unless you do some really advanced tricks), so OpenCV routines often have one kernel for each data type (CV_8UC1, CV_32FC3, etc). This adds up quickly.
As Vladislav said, building for several architectures (Compute 1.0, 1.1, 1.2, 2.0, etc) increases build time, but you can avoid this by just selecting your architecture in the CUDA_ARCH_BIN flag.
I think there's also some code generation going on at compile-time. I don't remember the details, but I remember seeing a bunch of printouts about code generation during the OpenCV GPU compilation.

You may have already tried this, but building in multithread mode (e.g. use the flag -j8 for 8 threads, -j16 for 16 threads, pick your favorite number) can help. I've noticed that builds sometimes fail in multithreaded mode, but this may just be coincidence. Anyway, it's worth a try.