I want to use the OpenCL function atan2(y, x), which can take vectors for x and y and performs a per-element operation.
Is there a way I can pass in pointers to two UMats (2-dimensional, float, same size) and get back a per-element atan2 result?
This would save having to copy the data from GPU to CPU and back.
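
To make the desired operation concrete, here is a minimal CPU-side sketch of the semantics only. It assumes plain row-major `std::vector<float>` buffers standing in for the two UMats, and the helper name `atan2_per_element` is made up for illustration. What I am after is the same result computed on the device, without the download and upload.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV or OpenCL API): CPU reference for the
// per-element operation. y and x are row-major float buffers of identical
// size (rows * cols), standing in for the two UMats.
std::vector<float> atan2_per_element(const std::vector<float>& y,
                                     const std::vector<float>& x) {
    assert(y.size() == x.size());
    std::vector<float> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        // Same argument order as OpenCL's atan2(y, x); result is in radians.
        out[i] = std::atan2(y[i], x[i]);
    }
    return out;
}
```

Each output element depends only on the matching pair of input elements, so the operation should map directly onto one work-item per element.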