profile - recent activity - OpenCV Q&A Forum

2013-09-04 08:05:02 -0600

asked a question

cv::gpu::sepFilter2D maximum kernel size

Hi,

I just ran into the hard-coded maximum kernel size of 32 when running cv::gpu::sepFilter2D:

OpenCV Error: Assertion failed (ksize > 0 && ksize <= 32) in getLinearRowFilter_GPU

Is there a reason for that? I tried using two convolutions instead, but that seems to be much slower (I guess because two DFT-based convolutions are more expensive than a separable filter). What's the appropriate way to work around that?

Cheers, Andreas
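
One possible workaround, assuming the large kernel can be factored into two shorter ones (this holds for Gaussians, where the variances add under convolution, but not for arbitrary kernels): run two separable passes whose 1-D kernels each stay within the 32-tap limit. The plain C++ sketch below has no OpenCV dependency; `convolveKernels` is a hypothetical helper that only shows which single kernel two cascaded passes are equivalent to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): full discrete convolution of
// two 1-D kernels. Filtering with `a` and then with `b` is equivalent to
// filtering once with the returned kernel of length a.size() + b.size() - 1.
std::vector<double> convolveKernels(const std::vector<double>& a,
                                    const std::vector<double>& b)
{
    assert(!a.empty() && !b.empty());
    std::vector<double> out(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i + j] += a[i] * b[j];
    return out;
}
```

Two 17-tap passes, for example, act like a single 33-tap kernel. Results near the image border may differ from a true single pass, because the border extrapolation is applied once per pass.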

2013-07-09 11:14:26 -0600

asked a question

OpenCV 2.4.6 with Qt4 instead of Qt5

Hi,

I was wondering whether Qt5 is mandatory with OpenCV 2.4.6. I ran ccmake -DWITH_QT=4 .. but that doesn't seem to work.

Cheers, Andreas

2013-01-24 09:17:01 -0600

asked a question

all CUDA-capable devices are busy or unavailable

Hi all, I'm getting the error

init done 
opengl support available 
OpenCV Error: Gpu API call (all CUDA-capable devices are busy or unavailable) in mallocPitch, file /home/andreas/Downloads/OpenCV-2.4.3/modules/core/src/gpumat.cpp, line 1283
/home/andreas/Downloads/OpenCV-2.4.3/modules/core/src/gpumat.cpp:1283: error: (-217) all CUDA-capable devices are busy or unavailable in function mallocPitch

when I try to run the highgui_gpu_gpu example (with everything except the GpuMat window, openGlGpuMatWnd, commented out). It basically looks like this:

 if (haveCuda)
     namedWindow(openGlGpuMatWnd, WINDOW_OPENGL | WINDOW_AUTOSIZE);

 Mat img = imread(argv[1]);

 if (haveCuda)
     setGlDevice(0);

 GpuMat d_img;
 if (haveCuda)
     d_img.upload(img);

 if (haveCuda)
 {
     Timer t("OpenGL GpuMat   ");
     imshow(openGlGpuMatWnd, d_img);
 }

I'm using CUDA 5.0; the CUDA samples as well as the OpenCV+CUDA examples work.

I built OpenCV with the following CMake flags:

 WITH_1394                        ON
 WITH_CUBLAS                      ON
 WITH_CUDA                        ON
 WITH_CUFFT                       ON
 WITH_EIGEN                       ON
 WITH_FFMPEG                      ON
 WITH_GIGEAPI                     ON
 WITH_GSTREAMER                   ON
 WITH_GTK                         ON
 WITH_IPP                         ON
 WITH_JASPER                      ON
 WITH_JPEG                        ON
 WITH_OPENCL                      OFF
 WITH_OPENCLAMDBLAS               OFF
 WITH_OPENCLAMDFFT                OFF
 WITH_OPENEXR                     ON
 WITH_OPENGL                      ON
 WITH_OPENNI                      ON
 WITH_PNG                         ON
 WITH_PVAPI                       ON
 WITH_QT                          ON
 WITH_TBB                         ON
 WITH_TIFF                        ON
 WITH_UNICAP                      OFF
 WITH_V4L                         ON
 WITH_XIMEA                       OFF
 WITH_XINE                        OFF

2012-11-30 05:34:25 -0600

asked a question

Wrong GpuMat matrix elements filled by cuda kernel

Hi all,

My problem: I create a GpuMat, pass its pointer, step, etc. to a CUDA kernel that fills the elements of the matrix (called gpumatdiffsqr), but when I download it back to the CPU, the matrix elements are wrong.

My cpp file:

cv::gpu::GpuMat gpumatdiffsqr(gpumatconcat.size(), CV_32FC1, 100);

simple3cpp(gpumato.ptr<uchar>(), gpumato.step, gpumato.cols, gpumato.rows,
    gpumatconcat.ptr<uchar>(), gpumatconcat.step, gpumatconcat.cols, gpumatconcat.rows,
    gpumatdiffsqr.ptr<float>(), gpumatdiffsqr.step, gpumatdiffsqr.elemSize());

cv::Mat tmp;
gpumatdiffsqr.download(tmp);
std::cout << tmp << std::endl;

My cu file:

__global__ void simple3(unsigned char* data, size_t step, const int cols, const int rows,
    unsigned char* data2, size_t step2, const int cols2, const int rows2,
    float* diffsqr_matrix, size_t diffaqr_step, size_t diffsqr_elemSize){
  //thread.x = row thread.y = col

  //calculate difference and square of patch "data" to all blocks in "data2"
  float diff = data[(threadIdx.x*step)+(threadIdx.y*sizeof(unsigned char))] - data2[(threadIdx.x*step)+((blockIdx.x*cols*sizeof(unsigned char))+(threadIdx.y*sizeof(unsigned char)))];
  float diffsqr = diff * diff;

  diffsqr_matrix[(threadIdx.x*diffaqr_step)+((blockIdx.x*cols*sizeof(float))+(threadIdx.y*sizeof(float)))] = (float) diffsqr;
  float test =   diffsqr_matrix[(threadIdx.x*diffaqr_step)+((blockIdx.x*cols*diffsqr_elemSize)+(threadIdx.y*diffsqr_elemSize))];
  __syncthreads();

  printf("%d %d %d: %f %f %f\n", blockIdx.x, threadIdx.x, threadIdx.y, diff, diffsqr, test);
}

The input is:

gpumatconcat:

[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3;
  4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7;
  8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11;
  12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15]

gpumato:

[20, 20, 20, 20;
  20, 20, 20, 20;
  20, 20, 20, 20;
  20, 20, 20, 20]

The calculation is similar to gpumatdiffsqr = gpumato - gpumatconcat, squared element-wise (gpumato is applied to each block of gpumatconcat). The output of the printf inside the kernel is:

3 0 0: 20.000000 400.000000 400.000000
3 1 0: 16.000000 256.000000 256.000000
3 2 0: 12.000000 144.000000 144.000000
3 3 0: 8.000000 64.000000 64.000000
3 0 1: 19.000000 361.000000 361.000000
3 1 1: 15.000000 225.000000 225.000000
3 2 1: 11.000000 121.000000 121.000000
3 3 1: 7.000000 49.000000 49.000000

...

So that works fine. However, the output of

gpumatdiffsqr.download(tmp);
std::cout << tmp << std::endl;

is something like:

[400, 1.4751525e-39, 1.4751525e-39, 1.4755323e-39, 361, 1.9176691e-38, 1.9176691e-38, 1.917668e-38, 324, 3.4969683e-39, 1.4751525e-39, 1.4766281e-39, 289, 1.9174466e-38, 6.8062748e-39, 1.4751525e-39; ...

I can't figure out my error. The pointer and step of gpumatdiffsqr should be fine.
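
The downloaded values hint at a likely cause: the correct results (400, 361, 324, 289) land at every fourth float, which is what happens when byte offsets (`step`, `sizeof(float)`) are used as indices into a `float*`. Below is a minimal host-side C++ sketch of pitched addressing, assuming the step is the row pitch in bytes, as with `GpuMat::step`. `rowPtr` and `fillPitched` are hypothetical helpers, not OpenCV functions; the same pointer arithmetic would apply inside the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return a typed pointer to row `y` of a pitched buffer.
// `stepBytes` is the row pitch in BYTES, so the offset is applied to a byte
// pointer first and only then is the result cast to the element type.
template <typename T>
T* rowPtr(void* base, std::size_t stepBytes, int y)
{
    return reinterpret_cast<T*>(static_cast<unsigned char*>(base) + y * stepBytes);
}

// Fill a rows x cols float image stored with a row pitch of `stepBytes`.
// Element (y, x) receives the value y * cols + x.
void fillPitched(void* base, std::size_t stepBytes, int rows, int cols)
{
    assert(stepBytes >= cols * sizeof(float));
    for (int y = 0; y < rows; ++y) {
        float* row = rowPtr<float>(base, stepBytes, y);
        for (int x = 0; x < cols; ++x)
            row[x] = static_cast<float>(y * cols + x);  // element index, no sizeof
    }
}
```

In the kernel above, the analogous change would be to compute `float* row = (float*)((char*)diffsqr_matrix + threadIdx.x * diffaqr_step);` and then write `row[blockIdx.x * cols + threadIdx.y]`. For the `unsigned char` inputs the byte offsets happen to be valid indices, although `data2` is indexed with `step` rather than `step2`, which only works while both pitches are equal.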

2012-11-05 15:19:52 -0600

asked a question

cv::gpu::norm speed up

Hi all,

I use the function cv::gpu::norm in a program. This function gets called a lot. With the CPU I get about 5 Hz; with the GPU it's not usable (a couple of seconds). I suppose the problem is that the matrices are very small (4x4 to 16x16), so I can't really make use of the GPU's performance.

Just some background information: I use the norm function to calculate the radial basis function:

double calculateRBFresponse(boost::shared_ptr<cv::gpu::GpuMat> input, boost::shared_ptr<cv::gpu::GpuMat> neuron, double beta){

      double response = cv::gpu::norm(*input, *neuron, cv::NORM_L2);
      return cv::exp( -beta * cv::pow ( response, 2.0 ));
}

Is there any way to speed this up? Or is it, as I suppose, the wrong task for a GPU? Or is there a way to parallelize the task, so that multiple norm calls run in parallel?

Cheers, Andreas
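
For matrices this small, the fixed cost of each GPU call (kernel launch plus copying the scalar result back to the host) likely outweighs the arithmetic, so a plain CPU loop may be the better fit. Here is a minimal sketch in standard C++ without OpenCV, assuming the patches are flattened into `std::vector<float>`; `rbfResponse` is a hypothetical name. It accumulates the squared distance directly, which avoids taking the square root in the L2 norm only to square it again.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU helper: Gaussian radial basis response
// exp(-beta * ||input - neuron||^2) for two equally sized, flattened patches.
double rbfResponse(const std::vector<float>& input,
                   const std::vector<float>& neuron,
                   double beta)
{
    assert(input.size() == neuron.size());
    double sumSq = 0.0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const double d = static_cast<double>(input[i]) - neuron[i];
        sumSq += d * d;  // squared L2 distance, no sqrt needed
    }
    return std::exp(-beta * sumSq);
}
```

If the GPU is still wanted, one option under the same assumption would be to stack many patches into one large matrix and process them in a single call, so the launch overhead is paid once rather than per patch.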

2012-10-22 16:46:54 -0600

asked a question

GpuMat and std::vector

Hi all,

What is the appropriate way to store GpuMats in a container? So far I'm using std::vector<cv::gpu::GpuMat>, but I'm wondering how the GpuMats are stored in the vector. Does the vector create copies of the matrices in GPU memory? If so, it's probably not very efficient. What would be the best solution? Would it make more sense to use std::vector<cv::gpu::GpuMat*>, or better, std::vector< boost::shared_ptr<cv::gpu::GpuMat> >?

Cheers, Andreas
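
The OpenCV 2.4 documentation describes `GpuMat` as a storage class with reference counting, so copying one into a vector should copy only the small header and share the device buffer. The sketch below illustrates that copy behaviour in plain C++ with a stand-in type; `SharedImage` is hypothetical and is not `cv::gpu::GpuMat`, and it keeps its data in host memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a reference-counted matrix header (NOT cv::gpu::GpuMat).
// Copying it copies the small header and shares the pixel buffer.
struct SharedImage {
    int rows;
    int cols;
    std::shared_ptr<std::vector<float> > data;

    SharedImage(int r, int c)
        : rows(r), cols(c),
          data(std::make_shared<std::vector<float> >(
              static_cast<std::size_t>(r) * c, 0.0f))
    {
        assert(r > 0 && c > 0);
    }
};
```

After `images.push_back(img)` on a `std::vector<SharedImage>`, both `img` and `images[0]` point at the same buffer, so a write through one is visible through the other. If the same holds for `GpuMat`, a vector of values is enough and neither raw pointers nor `boost::shared_ptr` are needed to avoid deep copies.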

2021-03-08 21:55:23 -0600	received badge	● Popular Question (source)
2019-02-08 07:04:58 -0600	received badge	● Notable Question (source)
2019-02-08 07:04:58 -0600	received badge	● Popular Question (source)
2013-09-04 08:05:02 -0600	asked a question	cv::gpu::sepFilter2D maximum kernel size
2013-07-10 05:56:56 -0600	commented question	OpenCV 2.4.6 with Qt4 instead of Qt5 Thanks! I'll try that.
2013-07-09 17:09:18 -0600	received badge	● Nice Question (source)
2013-07-09 11:14:26 -0600	asked a question	OpenCV 2.4.6 with Qt4 instead of Qt5
2013-07-05 17:02:07 -0600	received badge	● Nice Question (source)
2013-01-25 05:22:52 -0600	commented question	all CUDA-capable devices are busy or unavailable btw, if I try to run the unmodified highgui_gpu_gpu example, I also get an error: OpenCV Error: Assertion failed (buffer_ != 0) in Impl, file /home/andreas/Downloads/OpenCV-2.4.2/modules/core/src/opengl_interop.cpp, line 371 /home/andreas/Downloads/OpenCV-2.4.2/modules/core/src/opengl_interop.cpp:371: error: (-215) buffer_ != 0 in function Impl
2013-01-24 10:59:30 -0600	commented question	all CUDA-capable devices are busy or unavailable Yes, I've already read that post, and any other I found on that topic, without any luck. Maybe it has something to do with Ubuntu running compiz. But that's just a guess.
2013-01-24 09:17:01 -0600	asked a question	all CUDA-capable devices are busy or unavailable
2012-11-30 06:36:50 -0600	commented answer	Wrong GpuMat matrix elements filled by cuda kernel Great, thanks!
2012-11-30 05:34:25 -0600	asked a question	Wrong GpuMat matrix elements filled by cuda kernel
2012-11-05 15:19:52 -0600	asked a question	cv::gpu::norm speed up
2012-10-24 02:34:29 -0600	received badge	● Student (source)
2012-10-23 08:55:29 -0600	commented answer	GpuMat and std::vector Thanks! Good to know :)
2012-10-23 08:55:07 -0600	received badge	● Supporter (source)
2012-10-23 08:55:05 -0600	received badge	● Scholar (source)
2012-10-22 16:48:29 -0600	received badge	● Editor (source)
2012-10-22 16:46:54 -0600	asked a question	GpuMat and std::vector

andreas_'s profile - activity