Asynchronous performance lacking

When using CUDA to implement a pipelined execution model, timing an asynchronous function call gives the same measured time as the synchronous call:

cv::cuda::Stream stream1;
cv::cuda::GpuMat img1, descR, KeyP;
img1.upload(im_gray);  // blocking upload, not included in the timing
clock_t t2 = clock();
gpuOrb->detectAndComputeAsync(img1, cv::noArray(), KeyP, descR, false, stream1);
double r3 = (clock() - t2) / (double)CLOCKS_PER_SEC;  // time spent in the call itself
std::cout << "Asynchronous call : " << r3 << std::endl;
stream1.waitForCompletion();  // synchronize only after the measurement

Could anyone explain why this is the case? In theory, the time measured for the asynchronous call should be insignificant compared to the synchronous call, since the device is only synchronized after the measurement has been taken.
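To make the expectation concrete, here is a minimal sketch of the timing pattern I have in mind. It does not use OpenCV or CUDA at all: `std::async` with a sleeping task stands in for the device work, and `AsyncTiming` and `measure_async_call` are hypothetical names made up for this illustration. It also uses `std::chrono::steady_clock` for wall-clock time rather than `clock()`, which is an assumption about what should be measured and not something taken from my original code.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper type (not part of OpenCV): the two durations to compare.
struct AsyncTiming {
    double launch_ms;  // time spent inside the call that starts the work
    double total_ms;   // time until the work has actually finished
};

// Stand-in for a device-side job: it simply sleeps for `work` on another thread.
inline AsyncTiming measure_async_call(std::chrono::milliseconds work) {
    using steady = std::chrono::steady_clock;  // monotonic wall-clock time
    const auto start = steady::now();

    // Start the work without waiting for it (analogous to an *Async call on a stream).
    std::future<void> job = std::async(std::launch::async, [work] {
        std::this_thread::sleep_for(work);
    });
    const auto launched = steady::now();

    // Block until the work is done (analogous to stream1.waitForCompletion()).
    job.wait();
    const auto finished = steady::now();

    const std::chrono::duration<double, std::milli> launch = launched - start;
    const std::chrono::duration<double, std::milli> total = finished - start;
    return {launch.count(), total.count()};
}
```

Calling `measure_async_call(std::chrono::milliseconds(200))` and printing both fields should show a `launch_ms` far smaller than `total_ms` if the launch really returns before the work completes. That is the behaviour I expected from `detectAndComputeAsync`, but it is not what I measure.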
