copyTo is oddly slow (at least for planes of 3D Mats)

I know this isn't really a question, though I guess I would like to know why it happens. I think it's important to share, and I didn't know where else to post this.

In short, copyTo runs about 7x slower than it should. I know this because I wrote a simple equivalent copy in Matlab (code below) and it was that much faster. It's just a memory copy. This isn't any "fancy" Matlab function.

Basic setup: reading frames from a file and inserting them into a preallocated plane (aka slice) of a 3D matrix. After taking some timings I realized that all the time was being spent in copyTo rather than in the read. So I hacked my code down to just a copyTo speed test and compared that to the equivalent Matlab test.

This was done on Win7 x64. The OpenCV test was compiled as x64 Release (FYI: 2x faster than Debug).

Note that Matlab's memory is contiguous with respect to its first index, whereas OpenCV's is contiguous with respect to its last. To make the comparison fair, I reversed the order of the indices for my Matlab test, so the non-contiguous memory spacing is the same for Matlab's first-dimension slice and OpenCV's third-dimension slice. Since the Matlab test is so much shorter, I will give it first, with its times.
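To make the layout point concrete, here is a minimal sketch in plain C++ (no OpenCV) of what "a plane along the last dimension" means in memory. The names offsetRowMajor and copyFrameToPlane are made up for illustration, and the layout is assumed to be a dense rows x cols x planes array of 8-bit values with the last index contiguous, as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of element (r, c, p) in a dense rows x cols x planes array
// whose LAST index is contiguous (the layout assumed in this sketch).
inline std::size_t offsetRowMajor(std::size_t r, std::size_t c, std::size_t p,
                                  std::size_t cols, std::size_t planes) {
    return (r * cols + c) * planes + p;
}

// Copy a contiguous rows x cols frame into plane p of the 3D buffer.
// Consecutive pixels of the frame land `planes` bytes apart in the target,
// so every write is a strided, non-contiguous access.
void copyFrameToPlane(const std::vector<std::uint8_t>& frame,
                      std::vector<std::uint8_t>& volume,
                      std::size_t rows, std::size_t cols,
                      std::size_t planes, std::size_t p) {
    for (std::size_t i = 0; i < rows * cols; ++i) {
        volume[i * planes + p] = frame[i];
    }
}
```

With 1000 frames stored this way, neighbouring pixels of one plane sit 1000 bytes apart, which is the non-contiguous spacing both tests are meant to share.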

1) Matlab test. This ran in about 0.19 ms/frame (i.e. the time to copy F into A(i,:,:)).

F=ones(1,448,48,'uint8');
A = zeros(1000,448,48,'uint8');
tic;
for i=1:1000
    A(i,:,:) = F;
end;
T=toc  % = 0.19 sec total; with 1000 frames that is also 0.19 ms/frame


2) OpenCV equivalent. This takes about 1.4 ms/frame. Note that copyTo doesn't treat 2D matrices and the equivalent 3D slices the same, so extra variables are needed to give both the 2D source and the 3D slice target the same 3D slice view and make copyTo "happy". The views point to the same matrix data, or a portion thereof, as their original variables.

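int nReadFrames = (int)cap.get(CV_CAP_PROP_FRAME_COUNT);
int nFrameCols = (int)cap.get(CV_CAP_PROP_FRAME_WIDTH); // 448
int nFrameRows = (int)cap.get(CV_CAP_PROP_FRAME_HEIGHT); // 48

// NOTE: a3DMsizes is used below but its declaration is missing from the post; this definition is assumed.
int a3DMsizes[3] = {nFrameRows,nFrameCols,nReadFrames};
int aslicesizes[3] = {nFrameRows,nFrameCols,1};
Mat MovieFrames3D = Mat::zeros(3,a3DMsizes,CV_8UC1); // target 3D array
// NOTE: the setup of the source frame and its 3D view is missing from the post; the next two lines are assumed.
Mat Frame = Mat::ones(nFrameRows,nFrameCols,CV_8UC1); // stand-in 2D source frame
Mat FramePlane(3,aslicesizes,CV_8UC1,Frame.data); // 3D slice (plane) view of the 2D frame, shares its data
Range MovieFrames3DRanges[] = {Range::all(),Range::all(),Range(0,1)}; // first slice range
Mat MovieFrames3DPlane; // slice of the 3D target array

MovieFrames3DPlane = MovieFrames3D(MovieFrames3DRanges);    // slice, aka plane, 0
int iFrame;
double t = (double)getTickCount();
for (iFrame = 0; iFrame < nReadFrames; iFrame++) { // NOTE: loop header missing from the post; assumed
    FramePlane.copyTo(MovieFrames3DPlane); // copy Frame into MovieFrames3D(:,:,0), for test only
}
t = ((double)getTickCount() - t)/((double)nReadFrames * getTickFrequency());
cout << "Time passed in ms per frame: " << t*1000.0 << endl;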


EDIT - since I first posted this, in order ...

could you explain why you want/need separate planes? it's kinda an unnatural format in the opencv world. also, one would more probably use split() and merge(), and operate on an array of Mats, or a vector, not necessarily a 3d Mat

( 2015-02-01 15:53:43 -0500 )

I have several operations that I want to perform along the 3rd dimension (e.g. median), but I also want to use the planes as images (i.e. the first 2 dimensions).

I find it bizarre that the concept of a 3D matrix seems to be such an anathema to OpenCV. As a person who uses them all the time, albeit in Matlab, it seems unnaturally limiting not to have them. Why would software that has multidimensional matrices fall apart after 2D, requiring people to pick a pseudo-2D format that is likely fine for one use and bad for another (hence your question)? It is logical that any vector function be available across any dimension of a matrix of any dimensionality. After all, it's just a question of the stride when accessing memory.

Anyway, sorry for the whine/rant. I will look at merge/split. Thanks

( 2015-02-01 20:47:39 -0500 )

I replaced the copyTo with a simple C loop (nothing generic)

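unsigned char * pSource = Frame.data; // 2D source frame, assumed continuous
unsigned char * pDst = MovieFrames3DPlane.data; // first element of the target plane
int SourceStep = 1; // source pixels are adjacent bytes
int DstStep = (int)MovieFrames3DPlane.step.p[1]; // bytes between consecutive pixels of the plane
for(int ip=0; ip < Frame.rows * Frame.cols; ip++) {
    *pDst = *pSource;
    pSource += SourceStep;
    pDst += DstStep;
}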


and got a time just slightly better than matlab's (which is generic).

After thinking about it, I might do it "your way", with a vector of 2D Mats, because even fast copying still means a few percent of my time budget, and I'll have to do that twice. Unlike a sum, I can't do a median (for each pixel across all planes) totally on the fly, but I can constrain it by using a 256-bin histogram for each pixel instead of sorting each pixel's vector.
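The histogram idea can be sketched roughly as follows, in plain C++ without OpenCV. This is only an illustration under stated assumptions: frames are given as contiguous vectors of 8-bit pixels, the names medianFromHistogram and temporalMedian are hypothetical, and for an even number of frames the lower median is returned.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Median of n 8-bit samples summarized in a 256-bin histogram (n > 0).
// Walks the bins until the cumulative count reaches the median rank;
// returns the lower median when n is even.
std::uint8_t medianFromHistogram(const Histogram& hist, std::size_t n) {
    const std::size_t target = (n + 1) / 2;  // 1-based rank of the lower median
    std::size_t cumulative = 0;
    for (std::size_t v = 0; v < 256; ++v) {
        cumulative += hist[v];
        if (cumulative >= target) return static_cast<std::uint8_t>(v);
    }
    return 255;  // not reached when the histogram really holds n samples
}

// Per-pixel median across frames. Each frame holds nPixels 8-bit values.
// Histograms are updated one frame at a time, so frames need not be kept.
std::vector<std::uint8_t> temporalMedian(
        const std::vector<std::vector<std::uint8_t>>& frames,
        std::size_t nPixels) {
    std::vector<Histogram> hists(nPixels, Histogram{});  // all bins start at zero
    for (const auto& frame : frames) {
        for (std::size_t i = 0; i < nPixels; ++i) {
            ++hists[i][frame[i]];
        }
    }
    std::vector<std::uint8_t> median(nPixels, 0);
    if (frames.empty()) return median;
    for (std::size_t i = 0; i < nPixels; ++i) {
        median[i] = medianFromHistogram(hists[i], frames.size());
    }
    return median;
}
```

The trade-off is memory: with 32-bit counts each pixel needs 1 KB of histogram, so a 448x48 frame needs about 22 MB, in exchange for never sorting a per-pixel vector.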

( 2015-02-01 22:18:21 -0500 )
• you can get the mean (per channel) of a multichannel img without separating anything.
• "After all, it's just a question of the stride" - yes, exactly. you can get single channels using mixChannels without doing a copy
( 2015-02-02 01:23:45 -0500 )