Can I use a YOLOv4 model with the OpenCV DNN module on OpenCL?
I'm running some experiments to benchmark the speed of different backends for YOLOv4.
My GPU is a GeForce GTX 1070 and my CPU is an Intel Core i9-9900KF.
I copied the code from somewhere, then changed the model to the YOLOv4 model from darknet and changed the DNN settings:
net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
The CUDA backend works fine (about 15 FPS).
Now I want to test the OpenCV backend on the CPU and on OpenCL.
On the CPU I use:
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
The FPS is about 3~4.
For OpenCL I use:
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);
The YOLO results are correct, but inference is really slow: the FPS is only 1~2, and the program prints this error message when it runs:
OpenCV(ocl4dnn): consider to specify kernel configuration cache directory via OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-no-subgroup-ifp
Error in processing command line: Don't understand command line argument "-cl-no-subgroup-ifp"!
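The first sentence of that message only asks for a kernel configuration cache directory. A minimal sketch of setting it from inside the program is below. It assumes a POSIX system (`setenv`), that the directory already exists and is writable, and that OpenCV reads the variable when the OpenCL layers are first set up, so the call has to happen before the network is loaded. `setOcl4dnnCachePath` is a hypothetical helper name, not an OpenCV function, and I have not verified whether this changes the speed or the `-cl-no-subgroup-ifp` build log.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of OpenCV): exports OPENCV_OCL4DNN_CONFIG_PATH
// for the current process. Assumes POSIX setenv() and an existing, writable
// directory. Call it before cv::dnn::readNet*() so the value is visible when
// the OpenCL layers are initialized (an assumption about when OpenCV reads it).
bool setOcl4dnnCachePath(const std::string& dir)
{
    if (dir.empty()) {
        return false;  // nothing sensible to export
    }
    // Third argument 1 = overwrite any value that is already set.
    return setenv("OPENCV_OCL4DNN_CONFIG_PATH", dir.c_str(), 1) == 0;
}
```

Exporting the same variable in the shell before starting the program should have the same effect.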
When I check nvidia-smi, Volatile GPU-Util is about 97% and GPU memory usage is 531MiB, whereas with the CPU target the utilization is 0%. So I think the GPU is actually being used, just in the wrong way.
When I run clinfo it shows the following, so I think OpenCL is installed:
Number of platforms                  1
Platform Name                        NVIDIA CUDA
Platform Vendor                      NVIDIA Corporation
Platform Version                     OpenCL 1.2 CUDA 11.1.102
Platform Profile                     FULL_PROFILE
Platform Extensions                  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid
Platform Extensions function suffix  NV
Platform Name                        NVIDIA CUDA
Number of devices                    1
Device Name                          GeForce GTX 1070
Device Vendor                        NVIDIA Corporation
Device Vendor ID                     0x10de
Device Version                       OpenCL 1.2 CUDA
Driver Version                       455.32.00
Device OpenCL C Version              OpenCL C 1.2
Device Type                          GPU
I expected that even if the NVIDIA card does not support OpenCL well, it would at least be faster than the CPU for a neural network, but the experiment shows that OpenCL is very slow.
Is this how OpenCL is supposed to behave?
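For anyone who wants to reproduce the comparison: the first forward passes on an OpenCL target can include one-time costs such as kernel compilation, so FPS figures are only comparable if those passes are excluded. A minimal sketch of a timing helper that does this is below. `measureFps` is a hypothetical name, the warm-up and iteration counts are arbitrary, and this is not necessarily how the numbers above were measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): runs `infer` `warmup` times
// without timing it, then times `iters` further calls and returns the
// average frames per second over the timed calls.
double measureFps(const std::function<void()>& infer, int warmup, int iters)
{
    assert(warmup >= 0 && iters > 0);

    // Untimed warm-up calls absorb one-time setup costs.
    for (int i = 0; i < warmup; ++i) {
        infer();
    }

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        infer();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return iters / elapsed.count();
}
```

With an already configured `net`, an input `blob` and an output vector `outs` (all assumed to exist in the surrounding program), it could be called as `measureFps([&] { net.setInput(blob); net.forward(outs, net.getUnconnectedOutLayersNames()); }, 10, 100);` once per backend/target pair.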