StereoBM CPU vs CUDA vs OpenCL Performance

Hi -

I'm wondering if anyone has had similar experience running StereoBM on a mobile GPU. I ran StereoBM on a 640x512 image pair with a 9x9 window and 128 disparities, testing the CPU, CUDA, and OpenCL versions on a Core i7, a GTX780, and a K4000M. All times include memory transfers; GPU warm-up time is excluded.
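For reference, the measurement approach described above (warm-up runs discarded, everything else inside the timed region) can be sketched roughly like this. It is a minimal illustration, not the exact harness used for the numbers in this post; the helper name `time_ms` and its `warmup`/`iters` defaults are made up for the example.

```python
import statistics
import time


def time_ms(fn, warmup=5, iters=50):
    """Return the median wall-clock time of fn() in milliseconds.

    The first `warmup` calls are executed but not timed, so one-off
    costs (kernel compilation, context creation, cold caches) do not
    end up in the result.
    """
    # Untimed warm-up calls.
    for _ in range(warmup):
        fn()

    # Timed calls; each sample covers one full call to fn().
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples)
```

With the OpenCV Python bindings, `fn` could for example wrap `cv2.StereoBM_create(numDisparities=128, blockSize=9).compute(left, right)` on two 8-bit grayscale images. For the GPU paths, `fn` would need to include the upload, the computation, and the download back to the host, so that transfers are counted and any asynchronous work has finished before the timer stops.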

  • Core i7 (CPU): 6.368 msec
  • GTX780, OpenCL: 3.2 msec
  • GTX780, CUDA: 5.3 msec
  • K4000M, OpenCL: 10.3 msec
  • K4000M, CUDA: 12.4 msec
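To put these numbers on a common scale, here is a small sketch that expresses each timing as a speedup relative to the Core i7 result (values above 1 mean faster than the CPU, below 1 mean slower). The dictionary keys and the helper `speedup_vs_cpu` are just labels chosen for this example; the timings are the ones listed above.

```python
# Timings in milliseconds, copied from the list above.
timings_ms = {
    "Core i7": 6.368,
    "GTX780 OpenCL": 3.2,
    "GTX780 CUDA": 5.3,
    "K4000M OpenCL": 10.3,
    "K4000M CUDA": 12.4,
}

cpu_ms = timings_ms["Core i7"]


def speedup_vs_cpu(ms):
    """Speedup relative to the CPU run: CPU time divided by this time."""
    return cpu_ms / ms


# Speedup factor per configuration, rounded to two decimals.
speedups = {name: round(speedup_vs_cpu(ms), 2) for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

On these figures the GTX780 OpenCL path is roughly twice as fast as the CPU, while both K4000M paths come out below 1x.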

Two trends that were a bit surprising:

  1. The notebook GPU (K4000M) ran slower than the CPU
  2. OpenCL ran faster than CUDA on both GPUs

Just wondering if anyone has some insights. For #1, is it SSE and highly optimized CPU code that makes the difference? For #2, the basic code for the CUDA and OCL implementations looks equivalent, so is it just the OCL compiler doing something different? Or does this mean that the CUDA implementation could be further optimized?

Thanks