I was recently hired by a startup, and my task is to make an OpenCV program as computationally efficient as possible using CUDA and OpenCL, and to record the performance of the code in different scenarios.
I have worked with OpenCV for 3-4 months, but I have never used an OpenCV build with CUDA or OpenCL support, and I have no prior knowledge of CUDA or OpenCL.
Although there are a few examples of the CUDA and OCL functions in OpenCV, I don't want to just swap simple functions for their CUDA equivalents. I want to acquire a deep understanding of where to use CUDA or OpenCL and where to stay on the CPU for the best performance.
Can anyone suggest how to get started in this field and how to go further? Also, if we use standalone CUDA or OpenCL libraries instead of the CUDA/OpenCL support built into OpenCV, is there any case where the standalone approach gives better results?

P.S. I am using OpenCV 3.2 with C++.
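For the "record the performance in different scenarios" part, a timing harness along these lines could be a starting point. This is only a sketch: `median_ms` is a hypothetical helper name, not an OpenCV function, and the `work` callable stands in for whatever is being compared (a CPU call on `cv::Mat`, a `cv::cuda` call, or the same call on `cv::UMat`).

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not part of OpenCV): calls `work` `warmup` times
// without timing it, then `runs` more times with timing, and returns the
// middle sample of the sorted wall-clock times in milliseconds.
double median_ms(const std::function<void()>& work, int warmup = 2, int runs = 10) {
    // Warm-up calls absorb one-time costs such as lazy initialisation.
    for (int i = 0; i < warmup; ++i) {
        work();
    }

    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Sort and take the middle element (the upper one when `runs` is even).
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

The idea would be to wrap each variant in a lambda and compare the numbers for the same input sizes. For the GPU variants, the lambda should probably include the host-to-device upload and the download of the result, since those transfers are part of the real cost.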