GpuMat's step is always given in bytes, not in elements, so you should access diffsqr_matrix elements this way:

float* diffsqr_row = (float*)((char*)diffsqr_matrix + threadIdx.x * diffsqr_step);
diffsqr_row[blockIdx.x*cols + threadIdx.y] = (float) diffsqr;
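
To see why the cast to char* matters, here is a minimal host-side C++ sketch of the same byte-step arithmetic. It is only an illustration: row_ptr, fill_pitched and pitched_demo are hypothetical helper names, and the 8-float row stride is an assumed padding, not something GpuMat guarantees. In a kernel the same expression would be evaluated per thread instead of in loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pointer to the start of row `row` in a padded (pitched)
// buffer whose row stride `step_bytes` is expressed in bytes, as GpuMat::step is.
inline float* row_ptr(float* base, std::size_t step_bytes, int row)
{
    // Offset in bytes first, then reinterpret as float*.
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * step_bytes);
}

// Writes r * 100 + c into element (r, c) of a rows x cols matrix with padded rows.
// The two loops stand in for the per-thread row and column indices of a kernel.
inline void fill_pitched(float* base, std::size_t step_bytes, int rows, int cols)
{
    for (int r = 0; r < rows; ++r) {
        float* row = row_ptr(base, step_bytes, r);
        for (int c = 0; c < cols; ++c)
            row[c] = static_cast<float>(r * 100 + c);
    }
}

// Small self-check: 3 x 5 matrix stored with an assumed stride of 8 floats per row.
inline bool pitched_demo()
{
    const int rows = 3, cols = 5, stride_elems = 8;
    const std::size_t step_bytes = stride_elems * sizeof(float);
    std::vector<float> buf(rows * stride_elems, -1.0f);

    fill_pitched(buf.data(), step_bytes, rows, cols);

    bool ok = true;
    ok = ok && buf[1 * stride_elems + 2] == 102.0f;  // element (1, 2)
    ok = ok && buf[2 * stride_elems + 4] == 204.0f;  // element (2, 4)
    ok = ok && buf[1 * stride_elems + 5] == -1.0f;   // padding stays untouched
    return ok;
}
```

If the row offset were added to the float* directly, it would be scaled by sizeof(float) a second time and the write would land in the wrong row or outside the allocation.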

I also recommend swapping the roles of threadIdx.x and threadIdx.y (threadIdx.y for the row, threadIdx.x for the column). Consecutive threads in a warp then access consecutive addresses within a row, which gives you coalesced memory access.