I have been successfully using BruteForceMatcher_GPU to match SIFT descriptors from two images. The descriptors are 128-dimensional float vectors stored in GpuMat form. I can send approximately 35,000 descriptors from each set, for a total of around 70,000, with no problems. Above that, the GPU crashes (the screen goes black and it resets).
I get this error: OpenCV Error: Gpu API call (unknown error) in unknown function, file c:/opencv/modules/gpu/src/cuda/bf_match.cu, line 190.
Line 190 is in this function:
template <int BLOCK_SIZE, typename Dist, typename T, typename Mask>
void match(const DevMem2D_<T>& query, const DevMem2D_<T>* trains, int n, const Mask& mask,
           const DevMem2Di& trainIdx, const DevMem2Di& imgIdx, const DevMem2Df& distance,
           cudaStream_t stream)
{
    const dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    const dim3 grid(divUp(query.rows, BLOCK_SIZE));
    const size_t smemSize = (3 * BLOCK_SIZE * BLOCK_SIZE) * sizeof(int);

    match<BLOCK_SIZE, Dist><<<grid, block, smemSize, stream>>>(query, trains, n, mask, trainIdx.data, imgIdx.data, distance.data);
    cudaSafeCall( cudaGetLastError() );

    if (stream == 0)
        cudaSafeCall( cudaDeviceSynchronize() );
}
Line 190 is the cudaSafeCall( cudaDeviceSynchronize() ); at the end.
I can get around this by breaking my image into smaller slices to keep the descriptor count below that threshold.
I have a GeForce GTX 670 with 4 GB of memory. Sending 400,000 descriptors uses about 1.2 GB of that memory.
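As a rough cross-check of the raw descriptor payload, here is a small sketch that assumes each descriptor is 128 floats and ignores any row padding GpuMat may add. The helper name descriptorBytes is hypothetical and only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to hold `count` descriptors with `dims`
// float components each. Row padding and the matcher's own result buffers
// are deliberately ignored, so this is a lower bound, not a measurement.
std::size_t descriptorBytes(std::size_t count, std::size_t dims = 128)
{
    return count * dims * sizeof(float);
}
```

With 4-byte floats, 400,000 descriptors come to 204,800,000 bytes (roughly 205 MB) per set, so the descriptor data by itself is far below the 4 GB on the card.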
The code in my program looks like this:
cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher;
vector<cv::DMatch> matches;
matcher.match(descriptors1GPU, descriptors2GPU, matches);
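The slicing workaround mentioned above could in principle be applied to the descriptor matrix instead of the image. The sketch below only computes the row ranges. It assumes the limit depends on how many query descriptors go into a single match() call, which is not confirmed here. makeRowChunks is a hypothetical helper, not part of OpenCV, and the 35,000 figure is simply the count that worked for me.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: split the row indices [0, total) into consecutive
// half-open ranges [first, last) holding at most maxRows rows each.
std::vector<std::pair<int, int> > makeRowChunks(int total, int maxRows)
{
    std::vector<std::pair<int, int> > chunks;
    if (total <= 0 || maxRows <= 0)
        return chunks; // nothing to split

    for (int first = 0; first < total; first += maxRows)
        chunks.push_back(std::make_pair(first, std::min(first + maxRows, total)));

    return chunks;
}

// Intended use (not compiled here, it needs OpenCV's gpu module):
//   for each (first, last) in makeRowChunks(descriptors1GPU.rows, 35000):
//       matcher.match(descriptors1GPU.rowRange(first, last), descriptors2GPU, partial);
//       add `first` to every partial[i].queryIdx, then append partial to matches
```

Because each slice is matched as its own query matrix, the queryIdx values come back relative to the slice, which is why the offset has to be added before the partial results are merged.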
Any suggestions?