# Performance GPU implementation on NVIDIA Jetson TK1

Hello,

I recently implemented a simple per-pixel image-processing algorithm on the GPU. It performs operations such as addition, subtraction, multiplication, division, max, split, merge, etc. I am running the experiments on an NVIDIA Jetson TK1, comparing the performance of the same algorithm on the GPU and on the CPU (with the GPU code adapted for the CPU, of course). The image is RGB, 720x576.

I found that the algorithm runs much slower on the GPU than on the CPU.

The majority of the time is spent allocating memory. For example, this operation

cv::gpu::GpuMat img_gpu(host_img.size(), CV_8UC3);


takes 2 seconds on average. If I don't allocate the memory first and instead do the upload directly, letting upload allocate the memory (I guess), i.e.

cv::gpu::GpuMat img_gpu;