16-byte memory alignment for Mat images to be used with SSE instructions

Hi everyone,

I want to use SSE instructions with Mat images in OpenCV, but the problem is that the image memory needs to be aligned to 16 bytes (I am working with single-channel images).
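
To make the requirement concrete, here is a minimal sketch of what "aligned to 16 bytes" means for a row-based image: the base pointer must sit on a 16-byte boundary and the row stride must be a multiple of 16, otherwise only the first row is aligned. The helper names (`is_aligned16`, `padded_bpl`, `row_ptr_checked`) are hypothetical and not part of OpenCV or the SSE intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: round a row width in bytes up to the next
// multiple of 16. This is the role i_bpl plays in the question.
inline std::size_t padded_bpl(std::size_t width_bytes) {
    return (width_bytes + 15u) & ~static_cast<std::size_t>(15);
}

// Hypothetical helper: returns the start of row v and checks, in debug
// builds, that it is safe for aligned SSE loads and stores.
inline const std::uint8_t* row_ptr_checked(const std::uint8_t* data,
                                           std::size_t step,
                                           std::size_t v) {
    const std::uint8_t* p = data + v * step;
    assert(is_aligned16(p));
    return p;
}
```

For a Mat, `data` and `step` in this sketch would correspond to the image's data pointer and its step in bytes. If the check fails for some row, aligned loads cannot be used on that row without changing the layout.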

I already have a version that works, but it uses malloc and memcpy. My goal is to remove both, since I am targeting embedded systems.

Here is my version:

    // read the img_input (Mat)
    // i_step : image step
    // i_bpl  : image bytes per line we need to get after alignment
    #ifdef __SSE__
    uint8_t *img = (uint8_t*)_mm_malloc(i_bpl*i_height*sizeof(uint8_t),16);
    #else
    uint8_t *img = (uint8_t*)malloc(i_bpl*i_height*sizeof(uint8_t));
    #endif
    memset(img,0,i_bpl*i_height*sizeof(uint8_t));
    if (i_bpl==i_step) {
      // rows are already laid out with the target stride: copy in one go
      memcpy(img,img_input.data,i_bpl*i_height*sizeof(uint8_t));
    } else {
      // copy row by row from the source stride to the padded stride
      for (int32_t v=0; v<i_height; v++) {
        memcpy(img+v*i_bpl,img_input.data+v*i_step,i_width*sizeof(uint8_t));
      }
    }

I want to:

1. remove the dynamic allocation
2. align the image to 16 bytes
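
One possible direction for the first goal, shown only as a sketch: reserve a statically allocated, `alignas(16)` buffer sized for the largest expected image, and use a row stride rounded up to a multiple of 16. The bounds `kMaxWidth` and `kMaxHeight`, the buffer `g_img` and the function `copy_to_aligned` are assumptions made for illustration. The sketch still copies row by row, so it removes the malloc but not the memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed compile-time upper bounds; choose values that fit the target.
constexpr std::size_t kMaxWidth  = 640;
constexpr std::size_t kMaxHeight = 480;
constexpr std::size_t kMaxBpl =
    (kMaxWidth + 15) & ~static_cast<std::size_t>(15);

// Static storage, so no heap allocation; alignas(16) aligns the base address.
alignas(16) static std::uint8_t g_img[kMaxBpl * kMaxHeight];

// Copies a single-channel 8-bit image with an arbitrary source step into
// g_img using a stride padded to 16 bytes. Returns that stride (i_bpl).
inline std::size_t copy_to_aligned(const std::uint8_t* src,
                                   std::size_t src_step,
                                   std::size_t width,
                                   std::size_t height) {
    assert(width <= kMaxWidth && height <= kMaxHeight);
    const std::size_t bpl = (width + 15) & ~static_cast<std::size_t>(15);
    for (std::size_t v = 0; v < height; ++v) {
        std::uint8_t* dst = g_img + v * bpl;
        std::memcpy(dst, src + v * src_step, width);  // pixel data
        std::memset(dst + width, 0, bpl - width);     // zero the padding
    }
    return bpl;
}
```

To avoid the copy as well, the image would have to be produced directly in the aligned buffer. OpenCV's `Mat` has a constructor that takes a user-provided data pointer and a step in bytes and does not copy or own the data, so a header created over `g_img` with the padded stride could serve as the destination for whatever produces the image. Whether that fits depends on where `img_input` comes from.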

Thanks in advance
