According to the documentation, the function does support multi-channel images, not just CV_32FC1
as you stated.
Looking at the documentation, there does not seem to be a GPU version of this method yet. My advice:
- Read their code (or the actual paper) to understand what is happening, then see whether some of the operations can be moved to the GPU
- Search online for an optimised (or GPU-accelerated) version
- If your task allows, consider resizing your images
- If all else fails, consider looking into multi-threading
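
For the multi-threading option, here is a minimal sketch of spreading a batch of images over a thread pool. It assumes the images are independent of each other. `process_image` and `process_batch` are hypothetical names, and `process_image` is only a stand-in for the slow call in question (here it just sums the pixel values of a nested-list "image" so the example runs without OpenCV).

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(image):
    # Hypothetical placeholder for the slow per-image call.
    # Here it only sums the pixel values of a nested-list "image".
    return sum(sum(row) for row in image)


def process_batch(images, func=process_image, max_workers=4):
    # Run func on every image using a pool of worker threads.
    # executor.map returns the results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, images))


if __name__ == "__main__":
    batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(process_batch(batch))  # [10, 26]
```

Threads only help if the real call releases the GIL while it runs, which I believe most OpenCV Python functions do, but measure it on your own workload. If it does not, a process pool is the usual alternative.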