If I initialize a cv::dnn::Net with a Caffe model and select the CUDA backend and target like this
cv::dnn::Net net = dnn::ClassificationModel(prototxtPath, caffemodelPath);
net.setPreferableBackend(dnn::Backend::DNN_BACKEND_CUDA);
net.setPreferableTarget(dnn::Target::DNN_TARGET_CUDA);
and then run inference on a single image img1 (batch size 1)
dnnImgs.push_back(img1);
dnn::blobFromImages(dnnImgs, blob, ....);
net.setInput(blob);
prob = net.forward();
the inference time is substantial (~190 ms) on the first call (I assume because of lazy initialization) and then quick (~6 ms) on subsequent invocations.
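For reference, this is roughly how I measure those times (a minimal sketch using cv::TickMeter, not my exact code, just how the numbers above were obtained):

cv::TickMeter tm;
tm.start();
prob = net.forward();   // ~190 ms on the first call, ~6 ms afterwards
tm.stop();
std::cout << "forward took " << tm.getTimeMilli() << " ms" << std::endl;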
If I then change the batch size, for example by adding a second image img2 (batch size 2)
dnnImgs.push_back(img2);
and rebuild the blob and run inference again, I hit the same large inference time (~190 ms) on the first invocation after the change (full sequence below).
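For clarity, the two-image case is just the same calls as before with the extra image (same elided blobFromImages arguments as above):

dnnImgs.push_back(img2);                  // dnnImgs now holds img1 and img2
dnn::blobFromImages(dnnImgs, blob, ....); // blob now has batch dimension 2
net.setInput(blob);
prob = net.forward();                     // ~190 ms again on this first call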
Is there a way to change the batch size without incurring this large inference time on the first net.forward() call after the change? Essentially, am I doing something wrong here, or is this just the way it is?
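The only workaround I can think of is to pad every batch up to a fixed maximum size so the input shape never changes (sketch below; MAX_BATCH is just a value I would pick for my application, not anything from the OpenCV API), but that wastes computation on the dummy images:

const int MAX_BATCH = 8;  // hypothetical upper bound chosen for my application
while ((int)dnnImgs.size() < MAX_BATCH)
    dnnImgs.push_back(cv::Mat::zeros(img1.size(), img1.type())); // dummy padding images
dnn::blobFromImages(dnnImgs, blob, ....); // blob shape is now always MAX_BATCH x C x H x W
net.setInput(blob);
prob = net.forward();                     // only the very first call pays the setup cost
// outputs for the padded entries are discarded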
The reason I ask is that I was previously using Caffe directly with this same model, and I could pass batches of varying size with only a small (maybe 10 ms) increase in inference time the first time the batch size changed, and no increase after that.
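For comparison, the Caffe code looked roughly like this (a sketch from memory using the standard Caffe C++ API, needs caffe/caffe.hpp; newBatchSize is whatever the current batch happens to be):

caffe::Net<float> caffeNet(prototxtPath, caffe::TEST);
caffeNet.CopyTrainedLayersFrom(caffemodelPath);
caffe::Blob<float>* input = caffeNet.input_blobs()[0];
input->Reshape(newBatchSize, input->channels(), input->height(), input->width());
caffeNet.Reshape();   // propagate the new batch size through the net
// ... copy the images into input->mutable_cpu_data() ...
caffeNet.Forward();   // first call after a reshape was only ~10 ms slower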