OpenCV 3.1 CUDA FAST algorithm

Hello, I'd like to use the FAST algorithm on my GPU. The picture is stored in a Mat frame. What I've got so far:

cuda::GpuMat gpuImg0;
gpuImg0.upload(frame); // copy the Mat frame from the CPU to the GPU
cuda::GpuMat gpuInput_gray; // allocated by cvtColor
cv::cuda::cvtColor(gpuImg0, gpuInput_gray, COLOR_BGR2GRAY);
cv::Ptr<cv::cuda::FastFeatureDetector> d_fast = cv::cuda::FastFeatureDetector::create(20, true, cv::FastFeatureDetector::TYPE_9_16);
std::vector<cv::KeyPoint> d_keypoints;
d_fast->detect(gpuInput_gray, d_keypoints);

But now I have the keypoints stored in d_keypoints. I do not know how to get them back to the CPU or write them into a GpuMat. In OpenCV 2.x I could use the function drawKeypoints, but it's disabled in 3.x. How do I do it now?

FastFeatureDetector::detect takes a GpuMat and a vector of keypoints, which is the std::vector<cv::KeyPoint> d_keypoints in your code. These keypoints have already been downloaded from the GPU to the CPU, so you can use them directly. If you look at the OpenCV source, you can see that the keypoints found on the GPU are converted to CPU keypoints before detect returns, so what you get back is already stored in host memory.
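To make this concrete, here is a minimal sketch of the whole round trip. It assumes OpenCV 3.x built with the CUDA modules and an 8-bit BGR input. The helper names detectFastOnGpu, detectFastKeepOnGpu and drawOnCpu are made up for illustration and are not part of OpenCV. The second helper covers the "write them in a GpuMat" part of the question by using detectAsync and convert, which should be available on the CUDA detector, so treat it as something to check against your OpenCV version.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudafeatures2d.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Hypothetical helper: run FAST on the GPU, return keypoints in host memory.
// Assumes `frame` is an 8-bit, 3-channel BGR image.
std::vector<cv::KeyPoint> detectFastOnGpu(const cv::Mat& frame)
{
    cv::cuda::GpuMat d_frame, d_gray;
    d_frame.upload(frame);                                    // CPU -> GPU
    cv::cuda::cvtColor(d_frame, d_gray, cv::COLOR_BGR2GRAY);  // FAST needs a grayscale image

    cv::Ptr<cv::cuda::FastFeatureDetector> d_fast =
        cv::cuda::FastFeatureDetector::create(20, true, cv::FastFeatureDetector::TYPE_9_16);

    std::vector<cv::KeyPoint> keypoints;
    d_fast->detect(d_gray, keypoints);  // the std::vector overload fills host memory
    return keypoints;
}

// Hypothetical helper: keep the raw keypoints on the GPU first, convert later.
std::vector<cv::KeyPoint> detectFastKeepOnGpu(const cv::cuda::GpuMat& d_gray,
                                              cv::cuda::GpuMat& d_keypoints)
{
    cv::Ptr<cv::cuda::FastFeatureDetector> d_fast =
        cv::cuda::FastFeatureDetector::create(20, true, cv::FastFeatureDetector::TYPE_9_16);

    d_fast->detectAsync(d_gray, d_keypoints);  // result stays in device memory

    std::vector<cv::KeyPoint> keypoints;
    d_fast->convert(d_keypoints, keypoints);   // explicit GPU -> CPU conversion
    return keypoints;
}

// Hypothetical helper: draw on the CPU with the regular features2d function.
cv::Mat drawOnCpu(const cv::Mat& frame, const std::vector<cv::KeyPoint>& keypoints)
{
    cv::Mat out;
    cv::drawKeypoints(frame, keypoints, out);
    return out;
}
```

Usage would be something like drawOnCpu(frame, detectFastOnGpu(frame)), followed by imshow or imwrite on the returned Mat. Drawing happens on the CPU here because the keypoints are already in host memory at that point.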

