SURF with CUDA is very slow
I have implemented SURF with CUDA, but when I run it, it is extremely slow and my mouse freezes intermittently while it is running. I cannot figure out whether this is a hardware issue or whether something is wrong in my code.
void surf_detect::start_detect()
{
    // detecting keypoints & computing descriptors
    vector<Point2f> scene_corners(4);
    vector<Point2f> obj_corners(4);
    surf(img1, GpuMat(), keypoints1GPU, descriptors1GPU);
    surf(img2, GpuMat(), keypoints2GPU, descriptors2GPU);
    cout << "FOUND " << keypoints1GPU.cols << " keypoints on first image" << endl;
    cout << "FOUND " << keypoints2GPU.cols << " keypoints on second image" << endl;

    // matching descriptors
    Ptr<cv::cuda::DescriptorMatcher> matcher = cv::cuda::DescriptorMatcher::createBFMatcher(surf.defaultNorm());
    vector<DMatch> matches;
    matcher->match(descriptors1GPU, descriptors2GPU, matches);

    // downloading results
    vector<KeyPoint> keypoints1, keypoints2;
    vector<float> descriptors1, descriptors2;
    surf.downloadKeypoints(keypoints1GPU, keypoints1);
    surf.downloadKeypoints(keypoints2GPU, keypoints2);
    surf.downloadDescriptors(descriptors1GPU, descriptors1);
    surf.downloadDescriptors(descriptors2GPU, descriptors2);

    // drawing the results
    vector<DMatch> emptyvec;
    drawMatches(Mat(img1), keypoints1, Mat(img2), keypoints2, emptyvec, img_matches,
                Scalar::all(-1), Scalar::all(-1),
                vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS);
    line(img_matches, scene_corners[0] + Point2f(img1.cols, 0), scene_corners[1] + Point2f(img1.cols, 0), Scalar(0, 255, 0), 4);
    line(img_matches, scene_corners[1] + Point2f(img1.cols, 0), scene_corners[2] + Point2f(img1.cols, 0), Scalar(0, 255, 0), 4);
    line(img_matches, scene_corners[2] + Point2f(img1.cols, 0), scene_corners[3] + Point2f(img1.cols, 0), Scalar(0, 255, 0), 4);
    line(img_matches, scene_corners[3] + Point2f(img1.cols, 0), scene_corners[0] + Point2f(img1.cols, 0), Scalar(0, 255, 0), 4);
    namedWindow("matches", 0);
    imshow("matches", img_matches);
    waitKey(30);
}
I have an NVIDIA 520M graphics card with CUDA support. The sample CUDA programs run absolutely fine on it. Why is my code so slow?
What is the size of your input image? Did you time CUDA SURF, and how does the result compare against CPU SURF?
The image size is 320x640. Compared to the CPU version, the GPU version slows to a crawl.
CPU: 7-8 fps, GPU: 1 fps
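To see which stage accounts for those frame rates, it may help to time each stage (detection, matching, download, drawing) separately and to leave the first few calls out of the measurement, since the first CUDA call in a process typically pays a one-time initialization cost. Below is a minimal sketch of such a timer. The helper name average_ms and its parameters are hypothetical and not part of OpenCV; it uses only the C++ standard library and takes the work to be measured as a callable.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not an OpenCV function): returns the average
// wall-clock time in milliseconds of one call to `work`.
// The first `warmup` calls are executed but not timed, so one-time
// start-up costs do not distort the average.
double average_ms(const std::function<void()>& work, int warmup, int iterations)
{
    if (iterations <= 0) {
        return 0.0;  // nothing to measure
    }

    // Untimed warm-up calls.
    for (int i = 0; i < warmup; ++i) {
        work();
    }

    // Timed calls.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

In the code from the question, work could be a lambda wrapping only the two surf(...) calls, then only matcher->match(...), and so on. This assumes each wrapped call has finished its GPU work by the time it returns; if it has not, the lambda would need to synchronize before returning for the numbers to be meaningful.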