According to the documentation, the function does support multi-channel images, not just CV_32FC1 as you stated.

Looking at their documentation, it does not seem like they have a GPU version of this method yet. My advice:

Read their code (or the actual paper) to understand what is happening, then maybe leverage the GPU for some of the operations
Search online for an optimised (or GPU-accelerated) version
If your task allows, consider resizing your images to a smaller resolution
If all else fails, consider looking into multi-threading
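
To make the last two suggestions concrete, here is a rough sketch in plain Python, using only the standard library. Everything in it is an assumption for illustration: `slow_filter` is a hypothetical stand-in for the expensive function (it is not the library call), images are plain nested lists, and `downscale` just keeps every n-th pixel. In real code you would swap in your actual image type, a proper resize, and the real function.

```python
from concurrent.futures import ThreadPoolExecutor


def downscale(image, factor):
    """Keep every `factor`-th row and column (crude nearest-neighbour subsampling)."""
    return [row[::factor] for row in image[::factor]]


def slow_filter(image):
    """Hypothetical stand-in for the expensive call; it only inverts 8-bit pixels."""
    return [[255 - px for px in row] for row in image]


def process_batch(images, factor=2, workers=4):
    """Downscale every image, then run the filter on a thread pool.

    pool.map returns results in the same order as the inputs.
    """
    small = [downscale(img, factor) for img in images]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_filter, small))


if __name__ == "__main__":
    # Three synthetic 4x4 single-channel images.
    images = [[[10 * (r + c) for c in range(4)] for r in range(4)] for _ in range(3)]
    results = process_batch(images)
    print(len(results), len(results[0]), len(results[0][0]))
```

Note that threads only give a real speed-up when the heavy call releases Python's GIL, which native library calls commonly do. If yours does not, `concurrent.futures.ProcessPoolExecutor` is the usual alternative. Also check that the reduced resolution is still good enough for your task before relying on it.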