### convertMaps not improving remap performance (in fact reducing it with bilinear and cubic)

I am working with multiple live video streams, and I need to use the remap function to change their projection. I read that convertMaps can improve remap performance, which is very important for my application. I believe my implementation is correct, because the output is still right after switching to convertMaps.

Here is a brief version of the implementation:

    // mappingMat contains the mapping to be used from input to output
    Mat mappingMat(H, W, CV_32FC2);
    Mat dstMap1, dstMap2;  // outputs of convertMaps

    // true only for nearest-neighbor interpolation
    bool nninterpolation = (interpolation == NEARESTNEIGHBOR);

    convertMaps(mappingMat, cv::Mat(),
                dstMap1, dstMap2,
                0, nninterpolation);

    remap(in_mat, out_mat,
          dstMap1, dstMap2,
          interpolation,
          BORDER_WRAP);

**With this, I see about a 50% performance improvement when using nearest-neighbor interpolation for remap.
But for linear and cubic interpolation I see a huge performance hit with convertMaps: performance drops by roughly 70% and CPU utilization almost doubles.**
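To make a comparison like this reproducible, here is a minimal sketch of a timing helper. It is not my actual measurement code: `medianMillis` is a hypothetical name, and the run and warm-up counts are arbitrary assumptions. It takes the median of several runs so that a single slow frame does not skew the result.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: calls `fn` `warmup` times untimed, then `runs` times
// timed, and returns the median wall-clock duration in milliseconds.
double medianMillis(const std::function<void()>& fn, int runs = 50, int warmup = 5) {
    for (int i = 0; i < warmup; ++i) {
        fn();  // warm caches and lazy initialization before measuring
    }
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    // Partial sort so that the middle element is the median sample.
    const auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

The callable would wrap a single remap call, once with the original floating-point map and once with the maps produced by convertMaps, using the same input frame and interpolation mode in both cases.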

Does anyone have an idea what could cause this behavior?

My guess is that OpenCV uses AVX/SSE for bilinear or cubic interpolation when remap is called without convertMaps, but does not use AVX with the converted maps, so the fixed-point arithmetic on its own doesn't really help.
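For context, this is a rough sketch of the fixed-point representation as I understand it: each floating-point coordinate is split into an integer pixel position (stored in the first map) and a 5-bit fractional part per axis, packed into one index into an interpolation table (stored in the second map). The struct and function names below are hypothetical, the constants are meant to mirror OpenCV's `INTER_BITS` and `INTER_TAB_SIZE`, and the rounding may differ slightly from what the library does.

```cpp
#include <cmath>
#include <cstdint>

// Assumed to mirror OpenCV's INTER_BITS (5) and INTER_TAB_SIZE (32).
constexpr int kInterBits = 5;
constexpr int kInterTabSize = 1 << kInterBits;

// Hypothetical container for one converted map entry.
struct FixedPointCoord {
    std::int16_t x;      // integer source column (first map, channel 0)
    std::int16_t y;      // integer source row (first map, channel 1)
    std::uint16_t frac;  // table index fy * 32 + fx (second map)
};

// Quantizes a floating-point coordinate to 1/32 pixel and splits it.
FixedPointCoord encodeCoord(float x, float y) {
    const int ix = static_cast<int>(std::lround(x * kInterTabSize));
    const int iy = static_cast<int>(std::lround(y * kInterTabSize));
    FixedPointCoord c;
    c.x = static_cast<std::int16_t>(ix >> kInterBits);
    c.y = static_cast<std::int16_t>(iy >> kInterBits);
    c.frac = static_cast<std::uint16_t>(
        (iy & (kInterTabSize - 1)) * kInterTabSize + (ix & (kInterTabSize - 1)));
    return c;
}

// Recovers the quantized coordinates, to show what precision survives.
float decodeX(const FixedPointCoord& c) {
    return c.x + static_cast<float>(c.frac & (kInterTabSize - 1)) / kInterTabSize;
}

float decodeY(const FixedPointCoord& c) {
    return c.y + static_cast<float>(c.frac >> kInterBits) / kInterTabSize;
}
```

With this layout, remap only has to look up precomputed weights for one of 32 x 32 sub-pixel positions instead of computing them per pixel, which is why I expected the converted maps to be faster for linear and cubic as well.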

Any insights into this would be highly appreciated as I am very keen on improving remap performance.