Strategy to build asynchronous subpixel registration analysis

Hi, I am analysing a set of images for subpixel image shifts. I have code which essentially loops through:

loop(){

  • read a binary image and upload it to a GpuMat (CUDA)

// the next 2 steps are based on dft, mulSpectrums and magnitude (all CUDA, all "Streamable")

  • convolve with smoothing/gradient kernels (CUDA)
  • cross-correlate (phase-correlate) with the base image (CUDA)

// the next steps locate the maximum of the correlation pattern with subpixel precision

  • find maxLoc (CUDA, but the value is sent to Point.x/Point.y on the CPU)
  • copy the 3x3 neighbourhood of maxLoc into a Mat (CPU)
  • subpixel registration by a quadratic fit of the 3x3 neighbours of the maximum (CPU)
  • the resulting (x,y) pixel shifts are placed in shift maps (CPU)

}
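
To make the CPU-side refinement step concrete, here is a minimal sketch of one common variant: a separable parabola fit through the centre row and centre column of the 3x3 neighbourhood. This is an assumption for illustration, not necessarily the exact fit used in my code (a full 2D quadratic least-squares fit is another option). The names `SubpixelOffset`, `parabolaVertex` and `refinePeak3x3` are hypothetical, and the sketch uses plain C++ arrays rather than cv::Mat so it stands alone.

```cpp
#include <array>
#include <cmath>

// Hypothetical result type: offset of the true peak from the integer maxLoc,
// in pixels, each component expected to lie roughly in [-0.5, 0.5].
struct SubpixelOffset {
    double dx;
    double dy;
};

// Vertex of the parabola through three samples taken at positions -1, 0, +1.
// Returns 0 when the samples are (numerically) collinear, i.e. no curvature.
static double parabolaVertex(double left, double centre, double right) {
    const double denom = left - 2.0 * centre + right;
    if (std::fabs(denom) < 1e-12) {
        return 0.0;  // flat neighbourhood: keep the integer position
    }
    return 0.5 * (left - right) / denom;
}

// n[row][col] holds the 3x3 neighbourhood, with n[1][1] the integer maximum.
// Assumption: the peak is well approximated by independent parabolas in x and y.
SubpixelOffset refinePeak3x3(const std::array<std::array<double, 3>, 3>& n) {
    SubpixelOffset offset;
    offset.dx = parabolaVertex(n[1][0], n[1][1], n[1][2]);  // along the centre row
    offset.dy = parabolaVertex(n[0][1], n[1][1], n[2][1]);  // along the centre column
    return offset;
}
```

The final shift would then be maxLoc.x + dx and maxLoc.y + dy. Since this only needs nine values per image, it is cheap on the CPU; the cost in the loop is more likely the synchronisation needed to get maxLoc and the neighbourhood back from the GPU.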

All of this is computed ~65000 times and takes about 8 minutes in total (256x256 base, 16-bit B&W images). The CUDA card is not even heating up (nvidia-smi shows 6% GPU-Util).
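
For scale, a rough per-image budget implied by those figures, taking "about 8 minutes" as 480 s (an approximation):

```latex
\frac{480\ \text{s}}{65000\ \text{images}} \approx 7.4\ \text{ms per image}
\qquad\left(\approx 135\ \text{images per second}\right)
```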

Any suggestions on how to parallelize this (the faster the better)?

(Also, thanks to L.Berger, who got me this far.)