Can I use a YOLOv4 model in OpenCV DNN with OpenCL?

asked 2020-11-06 02:07:00 -0600

gino0717

I'm running some experiments to benchmark the speed of the different DNN backends with YOLOv4.

My GPU is a GeForce GTX 1070 and my CPU is an Intel Core i9-9900KF.

I copied some sample code, swapped in the YOLOv4 model from Darknet, and changed the DNN settings:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

The CUDA backend works fine (about 15 FPS).

Now I want to test the OpenCV backend on the CPU and with OpenCL.

For the CPU I use:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);

This runs at about 3 to 4 FPS.

For OpenCL I use:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);

The YOLO detections are correct, but it is really slow, only 1 to 2 FPS, and the program prints this error message when it runs:

OpenCV(ocl4dnn): consider to specify kernel configuration cache directory via OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-no-subgroup-ifp
Error in processing command line: Don't understand command line argument "-cl-no-subgroup-ifp"!
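OpenCL kernels are typically compiled the first time they are used, so counting FPS from the very first frames can understate steady-state speed for the OpenCL target (CUDA has its own first-run initialization cost too). Below is a minimal, library-independent sketch of a timing harness that runs a few untimed warm-up iterations before measuring. `benchmark` and `BenchResult` are hypothetical names, not OpenCV API.

```cpp
#include <chrono>
#include <functional>

// Outcome of one timing run: measured iterations, wall time, frames per second.
struct BenchResult {
    int iterations;
    double seconds;
    double fps;
};

// Calls `step` `warmup` times without timing (to absorb one-time costs such as
// kernel compilation), then times `iterations` further calls with a steady clock.
BenchResult benchmark(const std::function<void()>& step, int warmup, int iterations) {
    for (int i = 0; i < warmup; ++i) {
        step();
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        step();
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    // Guard against division by zero when nothing was timed.
    const double fps = seconds > 0.0 ? iterations / seconds : 0.0;
    return {iterations, seconds, fps};
}
```

In the OpenCV program, `step` would wrap the per-frame work, for example `net.setInput(blob); net.forward(outs, net.getUnconnectedOutLayersNames());`, and the same call can be repeated for each backend/target pair so that the CUDA, CPU, and OpenCL numbers are all measured the same way.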

When I check nvidia-smi, Volatile GPU-Util is about 97% and GPU memory usage is 531 MiB, compared with 0% GPU-Util for the CPU target, so I think the GPU really is doing the work, just in the wrong way.

clinfo reports an NVIDIA platform and device, so I think OpenCL is installed:

Number of platforms
Platform Name
Platform Vendor                        NVIDIA Corporation
Platform Version                       OpenCL 1.2 CUDA 11.1.102
Platform Profile
Platform Extensions                    cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid
Platform Extensions function suffix    NV

Platform Name
Number of devices
Device Name                            GeForce GTX 1070
Device Vendor                          NVIDIA Corporation
Device Vendor ID
Device Version                         OpenCL 1.2 CUDA
Driver Version
Device OpenCL C Version                OpenCL C 1.2
Device Type
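To test from code whether a given extension is in a space-separated list like the Platform Extensions line above, matching whole tokens avoids false positives from shared prefixes (for example, `cl_khr_int64` is not in the list even though `cl_khr_int64_base_atomics` is). A minimal plain C++ sketch follows. `hasExtension` is a hypothetical helper; in OpenCV, the list for the current device could presumably be obtained from `cv::ocl::Device::getDefault().extensions()`.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole, whitespace-separated token in an
// OpenCL extension list such as clinfo's "Platform Extensions" line.
// Token comparison (not substring search) avoids prefix false positives.
bool hasExtension(const std::string& extensionList, const std::string& name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

For the list above, `hasExtension(list, "cl_khr_fp64")` is true and `hasExtension(list, "cl_khr_fp16")` is false.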

I expected that even if the NVIDIA card doesn't support OpenCL well, it would at least be faster than the CPU for a neural network, but the experiment shows that OpenCL is very slow.

Is this just how OpenCL performs?
