The reasons are the following:

  • The CUDA compiler is slow
  • The same code has to be compiled once for every target GPU architecture
  • The module contains a large number of template instantiations to support all possible types, flags, border extrapolation modes, interpolations, kernel sizes, etc.

Compiling for only one architecture is 6x faster. If you don’t need to build for all architectures, clear CUDA_ARCH_PTX in CMake and set CUDA_ARCH_BIN to match your hardware (“3.0” for Kepler, “3.5” for Super-Kepler with 2880 cores, “2.0” or “2.1” for Fermi, etc.). You can find information about your GPU at https://developer.nvidia.com/cuda-gpus
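
For example, a minimal CMake invocation for a single Kepler-class GPU might look like the sketch below (the ../opencv source path is an assumption; adjust CUDA_ARCH_BIN to the compute capability of your card):

    # Configure an OpenCV build for one GPU architecture only (sketch)
    cmake -D WITH_CUDA=ON \
          -D CUDA_ARCH_BIN="3.0" \
          -D CUDA_ARCH_PTX="" \
          ../opencv

Clearing CUDA_ARCH_PTX drops the PTX (JIT) target, and restricting CUDA_ARCH_BIN to a single value builds binary code for just that architecture, which is where the speed-up comes from.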