If I initialize a cv::dnn::Net with a Caffe model and select the CUDA backend and target like this
cv::dnn::Net net = dnn::ClassificationModel(prototxtPath, caffemodelPath);
net.setPreferableBackend(dnn::Backend::DNN_BACKEND_CUDA);
net.setPreferableTarget(dnn::Target::DNN_TARGET_CUDA);
and then run inference on a single image img1 (batch size 1)
dnnImgs.push_back(img1);
dnn::blobFromImages(dnnImgs, blob, ....);
net.setInput(blob);
prob = net.forward();
the inference time is substantial (~190 ms) on the first call (I assume because of lazy initialization) and then quick (~6 ms) on subsequent invocations.
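For reference, this is roughly how I measure those times (a minimal sketch using cv::TickMeter, not my exact code, just how the numbers above were obtained):

cv::TickMeter tm;
tm.start();
prob = net.forward();   // ~190 ms on the first call, ~6 ms afterwards
tm.stop();
std::cout << "forward took " << tm.getTimeMilli() << " ms" << std::endl;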
If I then change the batch size, for example by adding a second image img2 (batch size 2)
dnnImgs.push_back(img2);
and rebuild the blob and run inference again, I hit the same large inference time (~190 ms) on the first invocation after the change (full sequence below).
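For clarity, the two-image case is just the same calls as before with the extra image (same elided blobFromImages arguments as above):

dnnImgs.push_back(img2);                  // dnnImgs now holds img1 and img2
dnn::blobFromImages(dnnImgs, blob, ....); // blob now has batch dimension 2
net.setInput(blob);
prob = net.forward();                     // ~190 ms again on this first call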
Is there a way to change the batch size without incurring this large inference time on the first net.forward() call after the change? Essentially, am I doing something wrong here, or is this just the way it is?
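The only workaround I can think of is to pad every batch up to a fixed maximum size so the input shape never changes (sketch below; MAX_BATCH is just a value I would pick for my application, not anything from the OpenCV API), but that wastes computation on the dummy images:

const int MAX_BATCH = 8;  // hypothetical upper bound chosen for my application
while ((int)dnnImgs.size() < MAX_BATCH)
    dnnImgs.push_back(cv::Mat::zeros(img1.size(), img1.type())); // dummy padding images
dnn::blobFromImages(dnnImgs, blob, ....); // blob shape is now always MAX_BATCH x C x H x W
net.setInput(blob);
prob = net.forward();                     // only the very first call pays the setup cost
// outputs for the padded entries are discarded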
The reason I ask is that I was previously using Caffe directly with this same model, and I could pass batches of varying size with only a small (maybe 10 ms) increase in inference time the first time the batch size changed, and no increase after that.
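For comparison, the Caffe code looked roughly like this (a sketch from memory using the standard Caffe C++ API, needs caffe/caffe.hpp; newBatchSize is whatever the current batch happens to be):

caffe::Net<float> caffeNet(prototxtPath, caffe::TEST);
caffeNet.CopyTrainedLayersFrom(caffemodelPath);
caffe::Blob<float>* input = caffeNet.input_blobs()[0];
input->Reshape(newBatchSize, input->channels(), input->height(), input->width());
caffeNet.Reshape();   // propagate the new batch size through the net
// ... copy the images into input->mutable_cpu_data() ...
caffeNet.Forward();   // first call after a reshape was only ~10 ms slower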