16-byte memory alignment for Mat images to be used with SSE instructions
Hi everyone,
I want to use SSE instructions with Mat images in OpenCV, but the problem is that I need the image memory aligned to 16 bytes (I am working with single-channel images).
I have a version that already works, but it uses malloc and memcpy, and my goal is to remove both since I am targeting embedded systems.
Here is my version.
It reads the image from img_input (a Mat), where:
i_step: the image step (bytes per row in the Mat)
i_bpl: the bytes per line we need after alignment
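// Needs <xmmintrin.h> (_mm_malloc), <cstdlib> (malloc) and <cstring> (memset, memcpy).
#ifdef __SSE__
    // 16-byte aligned allocation; must be released with _mm_free()
    uint8_t *img = (uint8_t*)_mm_malloc(i_bpl*i_height*sizeof(uint8_t), 16);
#else
    uint8_t *img = (uint8_t*)malloc(i_bpl*i_height*sizeof(uint8_t));
#endif
    // zero everything so the padding bytes at the end of each row are defined
    memset(img, 0, i_bpl*i_height*sizeof(uint8_t));
    if (i_bpl == i_step) {
        // source rows already have the padded length: copy in one go
        memcpy(img, img_input.data, i_bpl*i_height*sizeof(uint8_t));
    }
    else {
        // copy row by row, skipping the source padding
        for (int32_t v = 0; v < i_height; v++) {
            memcpy(img + v*i_bpl, img_input.data + v*i_step, i_width*sizeof(uint8_t));
        }
    }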
I want to: 1. remove the dynamic allocation; 2. get 16-byte alignment for the image.
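For reference, here is a minimal sketch of what those two goals could look like without any heap allocation, assuming the maximum image size is known at compile time. The names kMaxWidth, kMaxHeight, align_up16, g_img and copy_to_aligned are made up for illustration and are not OpenCV API. It still does a row-by-row memcpy, so it only addresses the malloc part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical compile-time upper bounds; adjust to the real image size.
constexpr int kMaxWidth  = 640;
constexpr int kMaxHeight = 480;

// Round a byte count up to the next multiple of 16.
constexpr int align_up16(int n) { return (n + 15) & ~15; }

// Statically allocated, 16-byte aligned storage: no malloc at run time.
alignas(16) static std::uint8_t g_img[align_up16(kMaxWidth) * kMaxHeight];

// Copies a width x height single-channel image whose rows are src_step bytes
// apart into g_img and returns the padded row length (i_bpl in the question).
// Because g_img is 16-byte aligned and the returned row length is a multiple
// of 16, every row start in g_img is 16-byte aligned.
int copy_to_aligned(const std::uint8_t* src, int width, int height, int src_step) {
    assert(width > 0 && width <= kMaxWidth);
    assert(height > 0 && height <= kMaxHeight);
    const int bpl = align_up16(width);
    for (int v = 0; v < height; ++v) {
        std::uint8_t* dst_row = g_img + static_cast<std::size_t>(v) * bpl;
        std::memcpy(dst_row, src + static_cast<std::size_t>(v) * src_step, width);
        std::memset(dst_row + width, 0, bpl - width);  // zero the padding bytes
    }
    return bpl;
}
```

With a real Mat, the call would presumably look like copy_to_aligned(img_input.data, img_input.cols, img_input.rows, (int)img_input.step) for a CV_8UC1 image.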
Thanks in advance
I don't think you can tell cv::imread to return a non-continuous Mat with the appropriate padding value.
What you could do is copy the first Mat into another Mat created with the appropriate padding (see the cv::Mat() constructors). You could then use the bitmap address directly for the SSE processing.
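To illustrate the last point, here is a rough sketch of SSE processing done directly on a padded buffer. It assumes the buffer address is 16-byte aligned and that the bytes per line (bpl) is a multiple of 16. The function name add_saturate and the operation itself are only an example. The integer intrinsics used here are SSE2, so the guard is __SSE2__ rather than __SSE__, with a scalar fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// Adds `value` to every byte of the buffer, saturating at 255.
// Assumes `img` is 16-byte aligned and `bpl` is a multiple of 16, so every
// row start is aligned too. Padding bytes are processed like pixels.
void add_saturate(std::uint8_t* img, int height, int bpl, std::uint8_t value) {
    assert(reinterpret_cast<std::uintptr_t>(img) % 16 == 0);
    assert(bpl % 16 == 0);
    for (int v = 0; v < height; ++v) {
        std::uint8_t* row = img + static_cast<std::size_t>(v) * bpl;
#if defined(__SSE2__)
        const __m128i add = _mm_set1_epi8(static_cast<char>(value));
        for (int u = 0; u < bpl; u += 16) {
            // aligned load and store: valid only because of the assumptions above
            __m128i px = _mm_load_si128(reinterpret_cast<const __m128i*>(row + u));
            _mm_store_si128(reinterpret_cast<__m128i*>(row + u), _mm_adds_epu8(px, add));
        }
#else
        for (int u = 0; u < bpl; ++u) {  // scalar fallback without SSE2
            const int s = row[u] + value;
            row[u] = static_cast<std::uint8_t>(s > 255 ? 255 : s);
        }
#endif
    }
}
```

If the extra copy is the real concern, _mm_loadu_si128 and _mm_storeu_si128 accept unaligned addresses, so the original Mat data could in principle be processed in place. Whether that is fast enough depends on the target CPU.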
A question: why does targeting embedded systems require you to get rid of malloc and memcpy? At some point, these functions have to be used somewhere, don't they?
If you need per-row alignment, have a look at MatAllocator.