To complement @dandelion1124's answer, here is the content of the relevant slide (slide 19):

GPU acceleration: Transparent API

  • same code can run on CPU or GPU – no specialized cv::Canny, ocl::Canny, etc; no recompilation is needed
  • minimal or no changes in the existing code
    • CPU-only processing – no changes required
  • includes the following key components:
    • new data structure UMat
    • simple and robust mechanism for async processing
    • open to extensions: convenient OpenCL wrappers for accelerating user algorithms