BruteForceMatcher_GPU error on large set of descriptors

I have been successfully using BruteForceMatcher_GPU to match SIFT descriptors from two images. The descriptors are 128-dimensional vectors stored in GpuMat form. I can send approximately 35000 descriptors from each set, for a total of around 70000, with no problems. Above that, the GPU crashes (black screen, then it resets).

I get this error: OpenCV Error: Gpu API call (unknown error) in unknown function, file c:/opencv/modules/gpu/src/cuda/bf_match.cu, line 190.

Line 190 is in this function:

    template <int BLOCK_SIZE, typename Dist, typename T, typename Mask>
    void match(const DevMem2D_<T>& query, const DevMem2D_<T>* trains, int n, const Mask& mask,
               const DevMem2Di& trainIdx, const DevMem2Di& imgIdx, const DevMem2Df& distance,
               cudaStream_t stream)
    {
        const dim3 block(BLOCK_SIZE, BLOCK_SIZE);
        const dim3 grid(divUp(query.rows, BLOCK_SIZE));

        const size_t smemSize = (3 * BLOCK_SIZE * BLOCK_SIZE) * sizeof(int);

        match<BLOCK_SIZE, Dist><<<grid, block, smemSize, stream>>>(query, trains, n, mask, trainIdx.data, imgIdx.data, distance.data);
        cudaSafeCall( cudaGetLastError() );

        if (stream == 0)
            cudaSafeCall( cudaDeviceSynchronize() );
    }

Line 190 is the cudaSafeCall( cudaDeviceSynchronize() ); at the end.

Occasionally it will point to an error here instead, in gpumat.cpp:

    void copy(const GpuMat& src, Mat& dst) const
    {
        cudaSafeCall( cudaMemcpy2D(dst.data, dst.step, src.data, src.step, src.cols * src.elemSize(), src.rows, cudaMemcpyDeviceToHost) );
    }

I can get around this by breaking my image into smaller slices to keep the descriptor count below that threshold.
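
A related variant of this workaround is to slice the query descriptor rows rather than the image. The sketch below only computes the row ranges; `sliceRows`, `RowRange` and `maxRowsPerSlice` are hypothetical names, and the 35000 limit is simply the threshold observed above. Whether slicing only the query side avoids the crash has not been verified.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Half-open row range [first, second) of one slice of the query descriptors.
typedef std::pair<int, int> RowRange;

// Split `totalRows` descriptors into consecutive slices of at most
// `maxRowsPerSlice` rows each (hypothetical helper, not part of OpenCV).
std::vector<RowRange> sliceRows(int totalRows, int maxRowsPerSlice)
{
    assert(totalRows >= 0 && maxRowsPerSlice > 0);
    std::vector<RowRange> slices;
    for (int start = 0; start < totalRows; start += maxRowsPerSlice)
    {
        const int end = std::min(start + maxRowsPerSlice, totalRows);
        slices.push_back(RowRange(start, end));
    }
    return slices;
}

// Assumed usage for each slice s (not tested on the GPU):
//   matcher.match(descriptors1GPU.rowRange(s.first, s.second),
//                 descriptors2GPU, sliceMatches);
// then add s.first to every sliceMatches[i].queryIdx before appending
// the slice results to the full `matches` vector.
```

Each slice keeps the full second descriptor set, so the match for a given query descriptor should be the same as in a single large call; only the query indices need to be shifted back by the slice offset.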

I have a GeForce GTX 670 with 4 GB of memory. When sending 400000 descriptors, I use about 1.2 GB of that memory.
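
As a rough check of these figures, the raw size of the descriptor data can be estimated, assuming each descriptor is 128 `float` values of 4 bytes each (consistent with the `cv::L2<float>` matcher). `descriptorBytes` is a hypothetical helper used only for this arithmetic.

```cpp
#include <cstddef>

// Bytes needed to store `count` descriptors with `dims` float components each.
// Counts only the raw descriptor data, not any other GPU allocations.
std::size_t descriptorBytes(std::size_t count, std::size_t dims)
{
    return count * dims * sizeof(float);
}
```

Under that assumption, 400000 descriptors take 400000 * 128 * 4 = 204800000 bytes (about 205 MB), and 35000 descriptors take about 18 MB, so the raw descriptor storage is a small fraction of the 4 GB on the card.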

The code in my program looks like this:

cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher; 

vector<cv::DMatch> matches;
matcher.match(descriptors1GPU, descriptors2GPU, matches);

Any suggestions?