
Pseudocode for custom GPU computation

asked 2012-09-04 17:41:53 -0600 by bharath422

updated 2012-09-05 03:16:18 -0600 by Andrey Pavlenko


I would like to process a video stream, doing some element-by-element computation on each frame, and display the output in real time. I am just starting with OpenCV and am a newbie. In this case I guess I won't be able to use one of the built-in GPU functions in OpenCV, and will write my own CUDA kernel instead. In addition, the output image will be smaller than the input image due to the nature of my computations. What would the pseudocode for this look like? Here is what I have come up with so far:

#include "cv.h"
#include "highgui.h"
#include "CUDABOF.hpp"
#include "Array.hpp"
#include "LinearMemory.hpp"
#include "MemoryUtils.hpp"

using namespace std;
using namespace cv;
using namespace CUDABOF;

int main(int, char**)
{
    VideoCapture cap(0);                 // open the default camera
    if(!cap.isOpened()) return -1;

    Mat custom_output;
    for(;;)
    {
        Mat frame;
        cap >> frame;                    // grab the next frame
        if(frame.empty()) break;         // camera stopped delivering frames
        cvtColor(frame, custom_output, CV_BGR2GRAY);
//        memory Copy from Host to Device;
//        Call CUDA kernel;
//        memory Copy from Device to Host;

        imshow("custom_output", custom_output);
        if(waitKey(30) >= 0) break;      // any key quits
    }
    return 0;
}

Can someone help me with the APIs to fill in the commented-out parts of my code? Your help is much appreciated!
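A plain C++ reference of the per-pixel computation is handy for checking a kernel's output later. The sketch below assumes a hypothetical operation, a 2×2 mean that halves each dimension, purely as a stand-in for "output smaller than input"; `downsample2x` is an illustrative name, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU reference for an operation whose output is smaller than
// its input: each output pixel is the mean of a 2x2 block of input pixels,
// so a cols x rows grayscale image becomes (cols/2) x (rows/2).
// srcStep and dstStep are row strides in bytes, like Mat::step.
void downsample2x(const unsigned char* src, std::size_t srcStep, int cols, int rows,
                  unsigned char* dst, std::size_t dstStep) {
    assert(cols >= 2 && rows >= 2);
    const int outCols = cols / 2;
    const int outRows = rows / 2;
    for (int y = 0; y < outRows; ++y) {
        const unsigned char* r0 = src + static_cast<std::size_t>(2 * y) * srcStep;
        const unsigned char* r1 = r0 + srcStep;  // the input row below r0
        unsigned char* out = dst + static_cast<std::size_t>(y) * dstStep;
        for (int x = 0; x < outCols; ++x) {
            const int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            out[x] = static_cast<unsigned char>(sum / 4);
        }
    }
}
```

In a CUDA version, one thread would typically compute one output pixel, so the grid would be sized from the output dimensions rather than the input's.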




Do you want to construct a Mat from raw memory data so that you can pass it to imshow? Or something else?

Daniil Osokin (2012-09-05 01:03:06 -0600)

I don't understand what you mean by memory data. But yes, I want to pass it to imshow in the end, so that the results of my computations can be viewed in real time.

bharath422 (2012-09-17 15:07:21 -0600)

1 answer


answered 2012-09-05 01:45:20 -0600 by Vladislav Vinogradov

Use this pattern:

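#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector_types.h>   // uchar3 (from the CUDA toolkit)

using namespace cv;
using namespace cv::gpu;

// Host wrapper that launches your CUDA kernel; define it in a .cu file.
void func(const uchar3* src, size_t srcStep, uchar* dst, size_t dstStep, int cols, int rows);

void callKernel(const GpuMat& src, GpuMat& dst)
{
    // you can use
    // src.cols - width
    // src.rows - height
    // src.step - step in bytes between image rows
    // src.ptr<T>() - device pointer to data, T - element type
    func(src.ptr<uchar3>(), src.step, dst.ptr<uchar>(), dst.step, src.cols, src.rows);
}

int main()
{
    VideoCapture cap(0);
    if (!cap.isOpened()) return -1;

    Mat frame;
    Mat output;
    GpuMat d_frame;
    GpuMat d_output;
    for (;;)
    {
        cap >> frame;
        if (frame.empty())
            break;

        // memory Copy from Host to Device
        d_frame.upload(frame);

        // Call CUDA kernel
        Size size = d_frame.size();   // output size; adjust (and the kernel's indexing) for a smaller result
        int type = CV_8UC1;           // output type; single-channel assumed here
        d_output.create(size, type);
        callKernel(d_frame, d_output);

        // memory Copy from Device to Host
        d_output.download(output);

        imshow("output", output);
        if (waitKey(30) >= 0)
            break;
    }
    return 0;
}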
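`ptr<T>()` and `step` in the comments above boil down to pointer arithmetic: row y starts y * step bytes after the base address. The host-only sketch below mimics that with a hypothetical `PitchedView` struct and a `Pixel3` type shaped like `uchar3` (neither is an OpenCV or CUDA type), so the arithmetic can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the (data, step, cols, rows) fields of a
// GpuMat or Mat; step is the distance in bytes between row starts.
struct PitchedView {
    unsigned char* data;
    std::size_t step;
    int cols;
    int rows;

    // Same arithmetic as ptr<T>(y): address of row y, viewed as T elements.
    template <typename T>
    T* ptr(int y = 0) const {
        assert(y >= 0 && y < rows);
        return reinterpret_cast<T*>(data + static_cast<std::size_t>(y) * step);
    }
};

// Three 8-bit channels, laid out like CUDA's uchar3.
struct Pixel3 { unsigned char x, y, z; };
```

Because step is in bytes, the base pointer is advanced as a byte pointer first and only then reinterpreted as the element type; adding y * step to a `Pixel3*` directly would skip step * sizeof(Pixel3) bytes per row instead.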


Hi Vladislav, thanks for the answer. But I still don't understand how to plug in the CUDA kernel. Say I write the kernel like this:

__global__ void func(uchar* srcptr, int step, int cols, int rows) { ....


Now I have to launch it like this:

func<<< GRID_DIM, BLOCK_DIM... >>>(...)

I don't understand where to make this call. The call you have written seems to be func(src.ptr<uchar3>(), src.step, src.cols, src.rows);

which does not look like a CUDA call.

Also, I am working only on a grayscale image. In that case, I believe I won't need the src.step value, and I can just do:

uchar* p = ...; func(p, src.cols, src.rows);

Am I right?

Thanks again for your help! Bharath

bharath422 (2012-09-17 16:49:49 -0600)

You should launch the kernel from a .cu file:

__global__ void funcKernel(uchar* srcptr, int step, int cols, int rows)
{
    int rowInd = ...;
    int colInd = ...;

    // threads that fall outside the image have nothing to do
    if (rowInd >= rows || colInd >= cols)
        return;

    uchar* rowPtr = srcptr + rowInd*step;
    uchar pixVal = rowPtr[colInd];
}

void func(uchar* srcptr, int step, int cols, int rows)
{
    funcKernel<<< GRID_DIM, BLOCK_DIM... >>>(...);
}


In the .cpp file, call func, not funcKernel.

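You should always pass the step value, even for a grayscale image. GpuMat memory is allocated with padded rows (a pitched allocation, as with cudaMallocPitch) so that the start of every row stays aligned for coalesced memory access. As a result, step can be larger than cols * sizeof(element), and row y starts at srcptr + y * step, not at srcptr + y * cols.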
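A small host-side illustration of why the step matters even for a one-channel 8-bit image: the sketch stores a 5-pixel-wide image in rows padded to 8 bytes (an arbitrary padding chosen for the example; real pitches are typically larger). The helper names are hypothetical. Addressing with `cols` in place of `step` lands in the padding or in the wrong row from the second row on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads pixel (row, col) from a grayscale image whose rows are 'step' bytes apart.
unsigned char pixelAt(const unsigned char* base, std::size_t step, int row, int col) {
    return base[static_cast<std::size_t>(row) * step + col];
}

// Builds a padded buffer: 'cols' meaningful bytes per row, followed by
// (step - cols) padding bytes filled with 0xEE. Pixel value = row * 10 + col.
std::vector<unsigned char> makePadded(int cols, int rows, std::size_t step) {
    assert(step >= static_cast<std::size_t>(cols));
    std::vector<unsigned char> buf(step * rows, 0xEE);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            buf[static_cast<std::size_t>(r) * step + c] = static_cast<unsigned char>(r * 10 + c);
    return buf;
}
```

With a 5 × 3 image and an 8-byte step, pixel (2, 4) holds 24; reading it as if step were 5 fetches byte 14 instead, which is padding at the end of row 1.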

Vladislav Vinogradov (2012-09-18 03:02:22 -0600)

Hi Vladislav, thanks for the help.

I followed your instructions, but when I try to compile, I get these errors:

Bharath-Pattabiramans-MacBook-Pro:funcam bharath650$ make
==== Compiling funcam.o ===
g++ -c funcam.cpp -I. -I/usr/local/cuda/include -I/Developer/GPU\ Computing/C//common/inc/ -I/usr/local/cuda/include -I/usr/local/include/opencv -I/usr/local/include
funcam.cpp: In function ‘void callKernel(const cv::gpu::GpuMat&)’:
funcam.cpp:22: error: invalid use of incomplete type ‘const struct cv::gpu::GpuMat’
/usr/local/include/opencv2/core/core.hpp:99: error: forward declaration of ‘const struct cv::gpu::GpuMat’
funcam.cpp:22: error: ‘uchar3’ was not declared in this scope
funcam.cpp:22: error: expected primary-expression before ‘)’ token
funcam.cpp:22: error: invalid use of incomplete

bharath422 (2012-09-19 15:41:02 -0600)

I had the wrong header files included. It now compiles, but the output I get is just noise. Here are my routines:

void callKernel(const GpuMat& src, const GpuMat& dst)
{
    uchar* p = ...;
    uchar* p2 = ...;
    func(p, p2, src.step, src.cols, src.rows);
}

bharath422 (2012-09-19 17:52:13 -0600)

extern "C"
void func(uchar* srcptr, uchar* dstptr, int step, int cols, int rows)
{
    dim3 grDim (16, 12);
    dim3 blDim (40, 40);
    funcKernel<<< grDim, blDim >>>(srcptr, dstptr, step, cols, rows);
}

bharath422 (2012-09-19 17:53:13 -0600)
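A likely culprit in the launch above is the block size: 40 × 40 = 1,600 threads per block, while CUDA allows at most 1,024 threads per block (512 on compute capability 1.x devices). A launch that exceeds the limit fails with an invalid-configuration error and the kernel does not run at all, so d_output would keep whatever bytes were already in that device memory; calling `cudaGetLastError()` right after the launch would report this. The plain C++ sketch below (hypothetical names, with `Dim3` standing in for `dim3` so it compiles without nvcc) checks a block against the limit and sizes the grid by ceiling division instead of hard-coding 16 × 12.

```cpp
#include <cassert>

// Hypothetical mirror of CUDA's dim3 so this compiles as plain C++.
struct Dim3 { unsigned x, y, z; };

// Per-block thread limit: 1024 on compute capability 2.0 and later,
// 512 on 1.x devices. Real code can query cudaDeviceProp::maxThreadsPerBlock.
constexpr unsigned kMaxThreadsPerBlock = 1024;

bool blockFits(Dim3 block, unsigned maxThreads = kMaxThreadsPerBlock) {
    return block.x * block.y * block.z <= maxThreads;
}

// Ceiling division: number of blocks of size 'grain' needed to cover 'total'.
unsigned divUp(unsigned total, unsigned grain) {
    return (total + grain - 1) / grain;
}

// Grid large enough to cover cols x rows pixels with the given block.
Dim3 gridFor(unsigned cols, unsigned rows, Dim3 block) {
    assert(blockFits(block));
    return Dim3{divUp(cols, block.x), divUp(rows, block.y), 1};
}
```

A 16 × 16 block (256 threads) fits on every device generation; for a 640 × 480 frame it gives a 40 × 30 grid, and odd sizes such as 641 × 481 round up to 41 × 31 blocks, which is why the kernel also needs a bounds check.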

__global__ void funcKernel(uchar* srcptr, uchar* dstptr, int step, int cols, int rows)
{
    int rowInd = blockIdx.y * blockDim.y + threadIdx.y;
    int colInd = blockIdx.x * blockDim.x + threadIdx.x;

    uchar* rowsrcPtr = srcptr + rowInd*step;
    uchar* rowdstPtr = dstptr + rowInd*step;

    uchar pixVal = rowsrcPtr[colInd];

    // rowdstPtr[colInd] = (pixVal > 100 ? 255 : 0);

    rowdstPtr[colInd] = 0;
}

bharath422 (2012-09-19 17:54:41 -0600)
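The index computation in this kernel is the standard one for a 2D launch. The host-only sketch below (hypothetical `Dim2`, `emulateLaunch`, `coverage`) runs the same formulas in nested loops, sequentially rather than in parallel, to check that every pixel is visited exactly once. It also includes the bounds check the kernel above lacks, which matters whenever the image size is not a multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Dim2 { int x, y; };  // hypothetical 2D stand-in for CUDA's dim3

// Calls body(rowInd, colInd) once per emulated thread of a grid-of-blocks
// launch, computing the indices the same way the kernel above does.
void emulateLaunch(Dim2 grid, Dim2 block, int cols, int rows,
                   const std::function<void(int, int)>& body) {
    assert(block.x > 0 && block.y > 0);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    const int rowInd = by * block.y + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    const int colInd = bx * block.x + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    if (rowInd >= rows || colInd >= cols)
                        continue;  // in the kernel: if (...) return;
                    body(rowInd, colInd);
                }
}

// Counts how often each pixel is visited; with a large enough grid,
// every entry should be exactly 1.
std::vector<int> coverage(Dim2 grid, Dim2 block, int cols, int rows) {
    std::vector<int> hits(static_cast<std::size_t>(cols) * rows, 0);
    emulateLaunch(grid, block, cols, rows,
                  [&](int r, int c) { ++hits[static_cast<std::size_t>(r) * cols + c]; });
    return hits;
}
```

A grid that is too small leaves pixels untouched (they keep whatever the buffer held), while a grid that is too large without the bounds check would write past the end of each row.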

I tried setting all the values of d_output to 0, but what I get is an image with salt-and-pepper noise. I also tried converting the input image to black and white using a threshold of 100, but that gave a similarly meaningless output. I also noticed that the steps of the source and destination images were different, so I passed both into the kernel and computed rowdstPtr from the destination step. Then I got something partially resembling what my camera captures, but the image doesn't make sense; it's mostly random. Since the CUDA code sets every pixel to 0, how does the output GpuMat end up containing the camera image at all?

It's all rather confusing at this point. Any further hints that would help me fix the problem and get this code running would be great.

bharath422 (2012-09-19 18:02:37 -0600)
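The step mismatch described above can be sketched on the CPU: when source and destination have different steps, each row pointer has to be computed from its own buffer's step. Below is a hypothetical CPU version of the kernel body with the `pixVal > 100` threshold from the commented-out line and a separate `dstStep` parameter; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// CPU equivalent of the kernel body for one pixel, with separate row
// steps for the source and destination buffers.
void thresholdPixel(const unsigned char* srcptr, std::size_t srcStep,
                    unsigned char* dstptr, std::size_t dstStep,
                    int rowInd, int colInd) {
    const unsigned char* rowSrc = srcptr + static_cast<std::size_t>(rowInd) * srcStep;
    unsigned char* rowDst = dstptr + static_cast<std::size_t>(rowInd) * dstStep;
    rowDst[colInd] = rowSrc[colInd] > 100 ? 255 : 0;
}

// Visits every pixel, as a grid covering cols x rows threads would.
void thresholdImage(const unsigned char* src, std::size_t srcStep,
                    unsigned char* dst, std::size_t dstStep, int cols, int rows) {
    assert(srcStep >= static_cast<std::size_t>(cols));
    assert(dstStep >= static_cast<std::size_t>(cols));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            thresholdPixel(src, srcStep, dst, dstStep, r, c);
}
```

Comparing such a CPU result with the downloaded GPU output is a quick way to tell indexing bugs apart from launch failures.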

When I say "I tried setting all the values of d_output to 0", I mean this line in the CUDA kernel:

rowdstPtr[colInd] = 0;

I just wanted to make sure what I said wasn't confusing.

bharath422 (2012-09-19 18:08:39 -0600)

Could you provide the full code, including the CPU part (reading frames from the camera, memory transfers between CPU and GPU, and the imshow part)?

Vladislav Vinogradov (2012-09-20 01:18:49 -0600)

Hi Vladislav, I put my code in a new post "GPU Code Not Working Question". Thanks for your help so far!

bharath422 (2012-09-20 11:53:30 -0600)
