CUDA DNN initialization when changing the batch size
If I initialize a dnn::Net with a caffe model and set the CUDA backend as
cv::dnn::Net net = dnn::readNetFromCaffe(prototxtPath, caffemodelPath);
net.setPreferableBackend(dnn::Backend::DNN_BACKEND_CUDA);
net.setPreferableTarget(dnn::Target::DNN_TARGET_CUDA);
and then run inference with a single image img1
std::vector<cv::Mat> dnnImgs;
cv::Mat blob;
dnnImgs.push_back(img1);
dnn::blobFromImages(dnnImgs, blob, ....);
net.setInput(blob);
cv::Mat prob = net.forward();
the inference time is substantial (~190ms) on the first call (I guess because of lazy initialization) and then quick (~6ms) on subsequent invocations.
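A warm-up forward pass on a dummy blob of the same shape would hide this one-time cost, for example (sketch only; inputWidth and inputHeight here are placeholders for the model's input dimensions):
// Warm up once so the ~190ms setup happens before any real image arrives.
cv::Mat dummy = cv::Mat::zeros(inputHeight, inputWidth, CV_8UC3);
cv::Mat warmupBlob;
dnn::blobFromImages(std::vector<cv::Mat>{dummy}, warmupBlob);
net.setInput(warmupBlob);
net.forward();   // slow this once; later single-image calls take ~6ms
but that only helps as long as the input shape, including the batch dimension, stays the same.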
If I then change the batch size, for example by adding a second image img2 (batch size 2)
dnnImgs.push_back(img2);
and run inference, I face the same large inference time (~190ms) on the first invocation again.
I would like to know if there is a way to change the batch size without suffering this large inference time the first time net.forward() is called after the change. Essentially, am I doing something wrong here, or is this just the way it is?
The reason for the question is that I was previously using Caffe directly with this model, and I was able to pass varying-sized batches without any noticeable increase in inference time (maybe 10ms) the first time the batch size changed, and no increase after that.
You haven't done anything wrong. That's just the way it is. Changing the input shape causes reinitialization. The only way I can think of is to have multiple cv::dnn::Net instances initialized for different batch sizes (something similar is done by TensorRT). There is some work in progress to reduce the initialization time. Caffe's low initialization time is interesting. I'll have to look into it.
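A rough sketch of that multi-net approach, assuming the set of batch sizes is known up front (the model paths, input dimensions, and batch sizes below are placeholders):
#include <map>
#include <vector>
#include <opencv2/dnn.hpp>

// Build and warm up one net per expected batch size, so the per-shape
// initialization cost is paid once up front instead of during real inference.
std::map<int, cv::dnn::Net> netsByBatchSize;
for (int batchSize : {1, 2, 4})
{
    cv::dnn::Net n = cv::dnn::readNetFromCaffe(prototxtPath, caffemodelPath);
    n.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    n.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

    // Dummy images of the right size trigger the one-time CUDA initialization.
    std::vector<cv::Mat> dummy(batchSize, cv::Mat::zeros(inputHeight, inputWidth, CV_8UC3));
    cv::Mat blob;
    cv::dnn::blobFromImages(dummy, blob);
    n.setInput(blob);
    n.forward();   // slow here, once per batch size

    netsByBatchSize[batchSize] = n;
}
At inference time you would then pick netsByBatchSize[dnnImgs.size()], set the input, and call forward() without the reinitialization hit.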