
Is this code correct to allocate two aligned cv::Mat?

asked 2017-05-01 15:10:19 -0600

lovaj

Disclaimer: I am a SIMD newbie, but I have good knowledge of OpenCV.

Let's suppose I have basic OpenCV code.

cv::Mat1f a(n, n);
cv::Mat1f b(n, n);
cv::Mat1f x;
//fill a and b somehow
x = a+b;

Now suppose I want to run the code above on an AVX2 or even AVX-512 machine. Having the data 32-byte aligned could be a great benefit for performance. I suspect the data above is not aligned: the optimization reports generated by the Intel compiler say that the data is unaligned, but I could be wrong.

So what if I allocate 32-aligned pointers and use them for the operations above? Something like:

float *apt = static_cast<float*>(_mm_malloc(sizeof(float)*n*m, 32));
float *bpt = static_cast<float*>(_mm_malloc(sizeof(float)*n*m, 32));
float *xpt = static_cast<float*>(_mm_malloc(sizeof(float)*n*m, 32));
//each Mat below only wraps the pointer; it does not take ownership
cv::Mat1f a(n, m, apt);
cv::Mat1f b(n, m, bpt);
cv::Mat1f x(n, m, xpt);
x = a+b;

I see three problems that concern me in the code above:

  1. Matrix operations on cv::Mat return a new cv::Mat object which, unfortunately, would not be aligned.
  2. Am I reinventing the wheel? I don't know whether OpenCV already includes this kind of optimization; I didn't find anything useful on Google.
  3. From what I have read about Intel intrinsics, memory obtained with _mm_malloc should be released with _mm_free. However, cv::Mat objects are "kind of" smart pointers, so they are allocated and deallocated under the hood, where probably some free or delete function is called.
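Regarding problem 3, one possible way to keep allocation and release paired is to let a smart pointer own the aligned buffer. The following is only a minimal sketch under stated assumptions: it swaps in C++17 `std::aligned_alloc` / `std::free` in place of `_mm_malloc` / `_mm_free` so that it stays portable, and the names `AlignedFree`, `AlignedFloats` and `make_aligned_floats` are hypothetical helpers, not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter that pairs std::aligned_alloc with std::free.
struct AlignedFree {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle for an aligned float buffer (hypothetical alias).
using AlignedFloats = std::unique_ptr<float[], AlignedFree>;

// Hypothetical helper: allocate `count` floats aligned to `alignment` bytes.
inline AlignedFloats make_aligned_floats(std::size_t count,
                                         std::size_t alignment = 32) {
    // The alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // std::aligned_alloc expects the size to be a multiple of the alignment,
    // so round the byte count up (and never request zero bytes).
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) bytes = alignment;

    void* raw = std::aligned_alloc(alignment, bytes);
    if (raw == nullptr) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % alignment == 0);

    return AlignedFloats(static_cast<float*>(raw));
}
```

A cv::Mat header constructed over `buf.get()` does not free user-provided data, so in this sketch the buffer has to outlive every Mat that wraps it; the memory is released when `buf` goes out of scope.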

To overcome problem 1, in very critical sections of code where data alignment could improve the performance, I could get the pointer back and do the sum operation using pointers. Something like:

//do the code above
float *aptr = a.ptr<float>(0);
float *bptr = b.ptr<float>(0);
float *xptr = x.ptr<float>(0);
for(int i=0; i<n ; i++)
  for(int j=0; j<m; j++)
    xptr[i*m+j] = aptr[i*m+j] + bptr[i*m+j];
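If the raw-pointer route is taken, the alignment can also be stated explicitly so the vectorizer does not have to guess. Below is a rough sketch, not a tuned kernel: `add_aligned32` and `HINT_ALIGNED_32` are hypothetical names, the hint relies on the `__builtin_assume_aligned` builtin of GCC-compatible compilers (it degrades to a no-op elsewhere), and the flat index assumes the matrix rows are stored contiguously, which can be checked with `isContinuous()`.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: compilers that define __GNUC__ provide __builtin_assume_aligned.
// On other compilers the macro simply returns the pointer unchanged.
#if defined(__GNUC__)
#define HINT_ALIGNED_32(p) \
    static_cast<decltype(p)>(__builtin_assume_aligned((p), 32))
#else
#define HINT_ALIGNED_32(p) (p)
#endif

// Hypothetical helper: x[i] = a[i] + b[i] for i in [0, count).
// Precondition: a, b and x are all 32-byte aligned. Passing an unaligned
// pointer makes the hint false and the behavior undefined.
inline void add_aligned32(const float* a, const float* b, float* x,
                          std::size_t count) {
    assert(a != nullptr && b != nullptr && x != nullptr);

    const float* pa = HINT_ALIGNED_32(a);
    const float* pb = HINT_ALIGNED_32(b);
    float* px = HINT_ALIGNED_32(x);

    // Single flat loop over all elements; requires contiguous storage.
    for (std::size_t i = 0; i < count; ++i) {
        px[i] = pa[i] + pb[i];
    }
}
```

With the matrices from the question this would be called as `add_aligned32(aptr, bptr, xptr, static_cast<std::size_t>(n) * m)`. Whether it actually beats `x = a+b` is something only a measurement on the target machine can tell.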

But again, I'm afraid that I'm reinventing the wheel. Note that this is a toy example to make it an MCVE; my actual code is much more complicated than this, and compiler optimizations may not be obvious.

I compile my code with icpc and the following flags:

INTEL_OPT=-O3 -ipo -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2 -fma -align -finline-functions
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl

1 answer


answered 2017-05-01 16:07:56 -0600

matman

cv::Mat data is already aligned. See here and here. CV_MALLOC_ALIGN should be 32 if you build with AVX; otherwise it is 16.
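A quick way to check this on a particular build is to test the address of the matrix data directly. The helper below is just an illustrative sketch; `is_aligned` is a hypothetical name, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address `p` is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For the matrices in the question, `is_aligned(a.data, 32)` would report whether the first element sits on a 32-byte boundary. Note that this only checks the start of the buffer; the start of each later row also depends on the row step.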


Comments

@matman, speaking of which, how do I build OpenCV with AVX support?

lovaj (2017-05-01 16:37:28 -0600)

@matman, if you could take a look at this question, I would really appreciate it.

lovaj (2017-05-01 16:37:55 -0600)

You have to specify it in CMake. The easiest way to do this is to use the CMake GUI, I think.

matman (2017-05-01 17:08:13 -0600)
