Unable to build OpenCV 3.3.0 with CUDA 9.0 on Linux
I was unable to build OpenCV 3.3.0 with CUDA 9.0 on Linux until I followed the steps in the "solution" I posted.
The problems were:
1. CUDA_nppi_LIBRARY not being set correctly when running cmake.
2. Compiling fails due to: nvcc fatal : Unsupported gpu architecture 'compute_20'
3. saturate_cast.hpp(277): error: identifier "__half2float" is undefined.
The problem in my original question, cmake not detecting CUDA 8.0, was caused by an incomplete install of CUDA 8.0 on my system.
The original question is below.
Hello,
I have written and profiled some software, and now it's time for me to move some of the hot spots to the GPU. My previous installation of OpenCV was not built with CUDA, so I uninstalled it and obtained the 3.3.0 source. This is on Linux.
I have an NVIDIA Quadro M1200 with Cuda 8.0.
$ nvidia-smi
Wed Sep 27 08:54:18 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8    N/A /  N/A |   1145MiB /  4042MiB |     18%      Default |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
I have cuda-8.0 installed at /usr/local/cuda-8.0 with a symbolic link to it from /usr/local/cuda
I invoked cmake with the following arguments:
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_CUDA=ON -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=ON -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON ..
And the output of cmake shows that CUDA is unavailable:
-- Use Cuda: NO
What can I do to get opencv 3.3.0 building with gpu support? Thank you.
Full output of cmake below, although I don't see anything telling.
-- Detected version of GNU GCC: 54 (504)
-- Performing Test HAVE_CXX11 (check file: cmake/checks/cxx11.cpp)
-- Performing Test HAVE_CXX11 - Failed
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.12", minimum required is "2.7")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython2.7.so (found suitable exact version "2.7.12")
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.5.2", minimum required is "3.4")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.5m.so (found suitable exact version "3.5.2")
-- Looking for ccache - not found
-- Performing Test HAVE_CXX_FSIGNED_CHAR
-- Performing Test HAVE_CXX_FSIGNED_CHAR - Success
-- Performing Test HAVE_C_FSIGNED_CHAR
-- Performing Test HAVE_C_FSIGNED_CHAR - Success
-- Performing Test HAVE_CXX_W
-- Performing Test HAVE_CXX_W - Success
-- Performing Test HAVE_C_W
-- Performing Test HAVE_C_W - Success
-- Performing Test HAVE_CXX_WALL
-- Performing Test HAVE_CXX_WALL - Success
-- Performing Test HAVE_C_WALL
-- Performing Test HAVE_C_WALL - Success
-- Performing Test HAVE_CXX_WERROR_RETURN_TYPE
-- Performing Test HAVE_CXX_WERROR_RETURN_TYPE - Success
-- Performing Test HAVE_C_WERROR_RETURN_TYPE
-- Performing Test HAVE_C_WERROR_RETURN_TYPE - Success
-- Performing Test HAVE_CXX_WERROR_NON_VIRTUAL_DTOR
-- Performing Test HAVE_CXX_WERROR_NON_VIRTUAL_DTOR - Success
-- Performing Test HAVE_C_WERROR_NON_VIRTUAL_DTOR
-- Performing Test HAVE_C_WERROR_NON_VIRTUAL_DTOR - Success
-- Performing Test HAVE_CXX_WERROR_ADDRESS
-- Performing Test HAVE_CXX_WERROR_ADDRESS - Success
-- Performing Test HAVE_C_WERROR_ADDRESS
-- Performing Test HAVE_C_WERROR_ADDRESS - Success
-- Performing Test HAVE_CXX_WERROR_SEQUENCE_POINT
...
Have you deleted your CMakeCache.txt?
Ok. I deleted my CMakeCache.txt file and edited my post to show the full output. The output still shows Cuda as unavailable.
I dug into the FindCUDA.cmake script and found that the CUDA_npp_LIBRARY variable was not being set. Since the npp* libraries could not be found, I suspected an incomplete install of CUDA, so I uninstalled it completely and installed CUDA 9.0. It still failed. It seems that in CUDA 9.0 the nppi library has been split into many separate libraries: nppial nppicc nppicom etc .... After hacking the FindCUDA.cmake script so that the CUDA_nppi_LIBRARY variable is set to a semicolon-separated string of the nppi* libraries, cmake runs successfully with HAVE_CUDA = TRUE.
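The semicolon-separated string described above can be generated rather than typed by hand, which avoids leaving one of the split libraries out. This is only a sketch: join_nppi_libs is a hypothetical helper, and /usr/local/cuda/lib64 is an assumed install location taken from the paths elsewhere in this thread.

```shell
# Hypothetical helper: join every libnppi*.so found in a directory into one
# semicolon-separated string (semicolons are CMake's list separator).
join_nppi_libs() {
    dir="$1"
    result=""
    for lib in "$dir"/libnppi*.so; do
        [ -e "$lib" ] || continue        # skip the unexpanded glob when nothing matches
        result="${result:+$result;}$lib" # prepend ";" only after the first entry
    done
    printf '%s\n' "$result"
}

# Assumed CUDA 9.0 library directory; override CUDA_LIB_DIR for your install.
join_nppi_libs "${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"
```

The printed string is the value the hack above assigns to CUDA_nppi_LIBRARY. Whether passing it to cmake with -D instead of editing the script has the same effect was not tried in this thread.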
Compiling fails due to: nvcc fatal : Unsupported gpu architecture 'compute_20'
This architecture seems to be no longer supported in CUDA 9.0, but my video card is the sm_50 arch. How do I modify the Makefile / cmake scripts to build with CUDA 9.0?
I dug into the OpenCVDetectCUDA.cmake script and found that I should pass the CUDA_GENERATION variable to cmake. With that set, the build gets further but then fails with:
/home/mfisher/opencv-3.3.0/modules/cudev/include/opencv2/cudev/util/saturate_cast.hpp(277): error: identifier "__half2float" is undefined.
This function is defined in cuda_fp16.h. Not seeing where cuda_fp16.h is included, I included it from opencv2/cudev/common.h.
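Both observations above (where __half2float lives, and whether anything in cudev already includes cuda_fp16.h) can be checked with a recursive grep. A minimal sketch follows; files_mentioning is a hypothetical helper, and the two default paths are assumptions based on the paths shown in this thread.

```shell
# Hypothetical helper: list files under a directory tree that mention a fixed string.
files_mentioning() {
    pattern="$1"
    root="$2"
    [ -d "$root" ] || return 0               # nothing to search
    grep -rlF -- "$pattern" "$root" | sort   # -F: treat the pattern literally
}

# Assumed locations; adjust to your CUDA install and OpenCV checkout.
CUDA_INC="${CUDA_INC:-/usr/local/cuda/include}"
OPENCV_SRC="${OPENCV_SRC:-$HOME/opencv-3.3.0}"

# Which CUDA headers mention the missing function?
files_mentioning "__half2float" "$CUDA_INC"
# Which cudev headers already include cuda_fp16.h?
files_mentioning "cuda_fp16.h" "$OPENCV_SRC/modules/cudev/include"
```

If the second search prints nothing, no header in that tree pulls in cuda_fp16.h, which matches the missing-identifier error.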
Including the header gets past that error, but linking then fails, so something is still wrong with my generated Makefile:
[ 42%] Linking CXX executable ../../bin/opencv_test_cudafilters
../../lib/libopencv_cudafilters.so.3.3.0: undefined reference to `nppiErode_8u_C1R'
...
I have:
-- CUDA_npp_LIBRARY
/usr/local/cuda/lib64/libnppc.so
/usr/local/cuda/lib64/libnppial.so
/usr/local/cuda/lib64/libnppicc.so
/usr/local/cuda/lib64/libnppicom.so
/usr/local/cuda/lib64/libnppidei.so
/usr/local/cuda/lib64/libnppif.so
/usr/local/cuda/lib64/libnppig.so
/usr/local/cuda/lib64/libnppist.so
/usr/local/cuda/lib64/libnppisu.so
/usr/local/cuda/lib64/libnppitc.so
/usr/local/cuda/lib64/libnpps.so
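An undefined reference like the one above can be traced by asking each candidate library whether it exports the symbol. The sketch below assumes nm from binutils is installed and that the NPP libraries live in /usr/local/cuda/lib64; find_symbol_provider is a hypothetical helper name.

```shell
# Hypothetical helper: print each shared library that defines the given symbol.
find_symbol_provider() {
    symbol="$1"
    shift
    for lib in "$@"; do
        [ -e "$lib" ] || continue   # skip missing files and unexpanded globs
        # nm -D lists the dynamic symbol table; --defined-only drops imports.
        if nm -D --defined-only "$lib" 2>/dev/null | grep -qw -- "$symbol"; then
            printf '%s\n' "$lib"
        fi
    done
}

# Assumed CUDA library directory; override CUDA_LIB_DIR for your install.
find_symbol_provider nppiErode_8u_C1R "${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"/libnpp*.so
```

Any library this prints that is absent from the CUDA_npp_LIBRARY list is a candidate for the missing link dependency.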
I was missing the nppim library when I set the CUDA_npp_LIBRARY variable. It builds now. I'll test it tomorrow, but I am worried, because I get a lot of warnings of the form:
/home/mfisher/opencv-3.3.0/modules/cudev/include/opencv2/cudev/warp/detail/../../warp/shuffle.hpp(374): warning: function "__shfl_xor(int, int, int)"
/usr/local/cuda/include/sm_30_intrinsics.hpp(221): here was declared deprecated ("__shfl_xor() is deprecated in favor of __shfl_xor_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
Thanks a lot, mfisher. Now, I can finally build OpenCV with CUDA.
@Amaro @mfisher could you please share the solution? I'm having the same issue with nvcc fatal "compute_20".
CUDA 9.0 does not support the sm_20 architecture anymore. When you build OpenCV, run cmake with CUDA_GENERATION set to your target architecture.
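As a rough illustration of that advice, the snippet below assembles such a cmake command and prints it for review instead of running it. The value "Maxwell" is an assumption chosen to match the sm_50 Quadro M1200 discussed in this thread; check cmake/OpenCVDetectCUDA.cmake in your own source tree for the generation names it accepts.

```shell
# Sketch only. "Maxwell" is an assumed value for an sm_50 card; the accepted
# names are listed in cmake/OpenCVDetectCUDA.cmake of your OpenCV tree.
CUDA_GENERATION="${CUDA_GENERATION:-Maxwell}"

# The other flags are copied from the original cmake invocation in this thread.
CMAKE_CMD="cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_CUDA=ON -D CUDA_GENERATION=$CUDA_GENERATION .."

# Dry run: print the command. Run it from the build directory, after deleting
# CMakeCache.txt, once it looks right.
echo "$CMAKE_CMD"
```

Restricting the build to one generation keeps nvcc from being asked for compute_20 code, which is what CUDA 9.0 rejects.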
Yes, you're right: CUDA 9 doesn't support Fermi anymore.