Hi,
I don't have results for the CV_INTER_AREA interpolation type, because I only have results from the standard performance tests, which don't include it. I can, however, highlight a few things to be aware of:
- The CPU resize operations are fast because they are highly optimized with Intel IPP (Integrated Performance Primitives).
- This may be obvious, but the performance gain will depend on which CPU and GPU you are using and how their performance compares.
- The speedup from CPU to GPU depends heavily on the image type, the interpolation method and the scale, and to a lesser extent on the image size. Below is an image showing the speedup of the resize operation when going from an i5-6500 to a 1060 (less than half as powerful as a single P100). The original image size is 1080p and the scale is 0.5. As you can see, resizing a 1080p CV_32F BGR image by 0.5 with INTER_LINEAR gives a speedup of 6.27x, whereas there is no speedup at all for CV_8U grayscale images.
- You are using streams without any synchronization, so I don't think your timing is measuring what you expect. I would start by timing without streams, or add cudaDeviceSynchronize() after d_src.upload(src, stream); (before starting the timer) and after cuda::resize(d_src, d_dst, Size(400, 400), 0, 0, CV_INTER_AREA, stream); (before stopping it), so that you are only timing the resize operation.
I hope this points you in the right direction.