How to run OpenCV DNN on NVidia GPU

asked 2018-10-19 03:38:07 -0600

gradlaserb


I want to use my Nvidia GTX 1060 GPU to run my DNN code with OpenCV. I tried running it on the CPU, but it is extremely slow, so I changed this line:




After that, I get this warning: [ WARN:0] DNN: OpenCL target is not supported with current OpenCL device (tested with Intel GPUs only), switching to CPU.

How can I solve this problem? I have OpenCV 3.4.3.


2 answers


answered 2018-10-19 04:03:54 -0600

berak

updated 2018-10-19 04:26:54 -0600

using 3.4.3, you can't do much about it. situation has improved a bit on master branch, but imho, it's still WIP.

you could try to set the env var: OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES=1 (maybe that gets you around it, but i'm only guessing)
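As a rough sketch of the environment-variable suggestion: the variable can be set from Python before OpenCV is imported, and the OpenCL target requested afterwards. The helper names `allow_all_opencl_devices` and `load_net_on_opencl` are hypothetical, and whether this actually gets the network running on a non-Intel GPU is not guaranteed.

```python
import os


def allow_all_opencl_devices():
    """Set the env var suggested above; it must be in place before OpenCV reads it."""
    os.environ["OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES"] = "1"


def load_net_on_opencl(model_path, config_path=""):
    """Hypothetical helper: load a network and request the OpenCL target."""
    allow_all_opencl_devices()
    import cv2 as cv  # imported only after the env var is set

    net = cv.dnn.readNet(model_path, config_path)
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_OPENCL)
    return net
```

Setting the variable in the shell before launching the script (for example with `export` on Linux) should have the same effect.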

apart from that, all you can do now is -- try a different model / network / architecture.

if it's a detection one, you can also try to use smaller input windows (e.g. 150x150 instead of 300x300)
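To illustrate why a smaller input window helps: for a mostly convolutional detector, the work per frame scales roughly with the number of input pixels, so halving both sides cuts it to about a quarter. The sketch below only does that arithmetic; `relative_cost` is a hypothetical helper, and the estimate assumes the model accepts the smaller size at all (accuracy on small objects will usually drop).

```python
def relative_cost(new_size, old_size):
    """Approximate compute ratio, assuming cost is proportional to input pixels."""
    new_w, new_h = new_size
    old_w, old_h = old_size
    return (new_w * new_h) / (old_w * old_h)


if __name__ == "__main__":
    ratio = relative_cost((150, 150), (300, 300))
    print(f"150x150 needs roughly {ratio:.0%} of the work of 300x300")
```

In OpenCV the input size is the `size` argument of `cv.dnn.blobFromImage`, e.g. `cv.dnn.blobFromImage(frame, scalefactor=1/255, size=(150, 150), swapRB=True, crop=False)`.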



Ok, does that mean that Yolov3 (which has been added to OpenCV) cannot use cuDNN for maximum speed? If not, are there plans to add this support?

AlexTheGreat ( 2018-10-19 05:00:04 -0600 )

@AlexTheGreat , - no idea about cuDNN, but there is no support for CUDA (with opencv's dnn module), and no plan to add such.

berak ( 2018-10-19 05:03:50 -0600 )

Hm, that's a bit surprising, because OpenCV has advertised CUDA support and already has it ( ), and using CUDA and cuDNN in the OpenCV DNN implementation would be a natural step forward. Or am I missing something?

AlexTheGreat ( 2018-10-19 05:41:57 -0600 )

(don't look at outdated 2.4, which was frozen 5 years ago !)

berak ( 2018-10-19 05:43:41 -0600 )

We've merged a PR which lets you run networks with OpenCL without extra flags:

dkurt ( 2018-10-19 10:53:52 -0600 )

I see, thanks. But to come back to the original question, because I am still not clear about it. Does that mean that we can somehow accelerate the DNN implementation in OpenCV including YOLO with a GPU (Intel, NVidia)?

AlexTheGreat ( 2018-10-20 03:43:05 -0600 )

@AlexTheGreat -- try with latest 3.4 or master branch (NOT any releases !) and

berak ( 2018-10-20 03:58:32 -0600 )

I use OpenCV 4.1.1 on an Nvidia Tegra Nano, compiled with CUDA support. I compiled Darknet with CUDA and cuDNN support as well. Still, running net.setPreferableTarget(DNN_TARGET_OPENCL); net.forward(...); drives all CPU cores to 100% usage, then swap memory fills up, then the system freezes.

Update: the Nvidia Nano does not support OpenCL :-(

YuriiChernyshov ( 2019-07-21 09:21:28 -0600 )
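One way to catch this situation before calling net.forward is to ask OpenCV whether it sees any OpenCL runtime at all. The sketch below is only an assumption about how one might guard the target selection; `target_name` and `set_target_with_fallback` are hypothetical helpers, while `cv.ocl.haveOpenCL()` is the OpenCV call doing the actual check.

```python
def target_name(have_opencl):
    """Name of the cv.dnn target constant to request."""
    return "DNN_TARGET_OPENCL" if have_opencl else "DNN_TARGET_CPU"


def set_target_with_fallback(net):
    """Request OpenCL only if OpenCV reports an OpenCL runtime; else stay on CPU."""
    import cv2 as cv  # imported here so target_name can be used without OpenCV

    name = target_name(cv.ocl.haveOpenCL())
    net.setPreferableTarget(getattr(cv.dnn, name))
    return name
```

On a board without an OpenCL driver this should return "DNN_TARGET_CPU", which at least makes the fallback explicit instead of silent.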

I confirm same behavior as of today (OPENCL + jetson nano)

stiv-yakovenko ( 2019-08-25 16:24:57 -0600 )

@stiv-yakovenko you can perform inference on Jetson using

Yashas ( 2019-09-02 02:22:51 -0600 )

answered 2019-11-28 07:51:24 -0600

Andrew.K

updated 2019-11-28 08:19:49 -0600

Hi! I have an even older GPU than the one you mentioned (a GTX 970), and it works perfectly well for me with OpenCV 4.1.1. I compiled darknet with CUDA 10.0 and cuDNN 7.4 (for CUDA 10.0); for the darknet compilation in particular, I used OpenCV 3.3, as recommended at this link: []. I trained my own "YOLOv3" model based on yolov3-tiny and used it in the following Python code (you can just use the standard YOLO models):

import time

import cv2 as cv
import numpy as np

# Assumed placeholder values; the original post does not define them.
inputFile = "input.mp4"
inpWidth, inpHeight = 416, 416

classFile = "obj.names"  # my own class names, or just use coco.names
with open(classFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')
modelConf = 'yolov3-tiny_obj.cfg'  # or just use yolov3.cfg
modelWeights = 'yolov3-tiny_obj_7000.weights'  # or just use yolov3.weights
net = cv.dnn.readNetFromDarknet(modelConf, modelWeights)
winName = "YOLOv3 + OpenCV"
cv.namedWindow(winName, cv.WINDOW_NORMAL)
cv.resizeWindow(winName, 1280, 720)

cap = cv.VideoCapture(inputFile)
while True:
    _, frame = cap.read()
    if np.shape(frame) != ():  # a frame was read successfully
        blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0, 0, 0], 1, crop=False)
        net.setInput(blob)
        # getOutputsNames and frameExtract are my own helpers; I skipped them to be short.
        outs = net.forward(getOutputsNames(net))  # run the network on its output layers
        frameExtract(frame, outs)  # standard frame extraction (post-processing)
        cv.imshow(winName, frame)
    else:
        # No frame was read: reopen the capture device.
        print("Reinitialize capture device ", time.ctime())
        cap = cv.VideoCapture(inputFile)
    k = cv.waitKey(1) & 0xFF
    if k == 27:  # Esc quits
        break
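The post-processing helper is skipped in the code above, so here is a guess at the kind of per-detection decoding it would need, not the author's actual frameExtract. It assumes the usual YOLOv3 output layout in OpenCV's DNN module, where each row is [center_x, center_y, width, height, objectness, class scores...] with coordinates relative to the frame size. `decode_row` is a hypothetical name.

```python
def decode_row(row, frame_w, frame_h, conf_threshold=0.5):
    """Turn one YOLO output row into (class_id, confidence, (left, top, w, h)).

    Returns None when the best class score is below conf_threshold.
    """
    scores = row[5:]
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = float(scores[class_id])
    if confidence < conf_threshold:
        return None
    # Coordinates are relative, so scale them to pixels.
    center_x, center_y = row[0] * frame_w, row[1] * frame_h
    width, height = row[2] * frame_w, row[3] * frame_h
    left = int(center_x - width / 2)
    top = int(center_y - height / 2)
    return class_id, confidence, (left, top, int(width), int(height))


if __name__ == "__main__":
    sample = [0.5, 0.5, 0.25, 0.5, 0.9, 0.125, 0.75]
    print(decode_row(sample, frame_w=100, frame_h=200))
```

A full version would loop over every row of every output, then suppress overlapping boxes (for instance with `cv.dnn.NMSBoxes`) before drawing.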


question was about using opencv's dnn module, not how to compile darknet

berak ( 2019-11-28 07:53:39 -0600 )

It is exactly about CV's DNN. I mentioned darknet only because I initially installed CUDA for it. Sorry for the confusion.

Andrew.K ( 2019-11-28 08:22:20 -0600 )

ah, sorry, and thanks for the useful edit ;)

berak ( 2019-11-28 08:45:55 -0600 )

There is a CUDA backend in OpenCV DNN module now which is much faster than the OpenCL backend.

Yashas ( 2019-12-02 05:58:59 -0600 )

@Yashas how do I turn on the CUDA backend? I am also trying to use a 1060 TI with OpenCV 4.1.2.

muz ( 2019-12-05 12:25:07 -0600 )

@muz Assuming that you have built master (because the CUDA backend is not yet in a release), you have to set the backend with net.setPreferableBackend(DNN_BACKEND_CUDA) and the target with net.setPreferableTarget(DNN_TARGET_CUDA) or net.setPreferableTarget(DNN_TARGET_CUDA_FP16).

Yashas ( 2019-12-07 09:10:20 -0600 )
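In Python, the two calls from the comment above could be wrapped roughly as follows. This is a sketch under assumptions: it presumes an OpenCV build that includes the CUDA DNN backend, and it falls back to the CPU when OpenCV reports no CUDA device or the constants are missing. `choose_backend` and `configure_net` are hypothetical helper names; `cv.cuda.getCudaEnabledDeviceCount()` and the DNN_* constants are the OpenCV parts.

```python
def choose_backend(cuda_devices, fp16=False):
    """Return (backend, target) constant names for a given CUDA device count."""
    if cuda_devices > 0:
        target = "DNN_TARGET_CUDA_FP16" if fp16 else "DNN_TARGET_CUDA"
        return "DNN_BACKEND_CUDA", target
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure_net(net, fp16=False):
    """Apply the chosen backend and target to a cv.dnn net; returns the names used."""
    import cv2 as cv  # imported here so choose_backend works without OpenCV

    devices = cv.cuda.getCudaEnabledDeviceCount()
    if not hasattr(cv.dnn, "DNN_BACKEND_CUDA"):
        devices = 0  # this build has no CUDA backend constants
    backend, target = choose_backend(devices, fp16)
    net.setPreferableBackend(getattr(cv.dnn, backend))
    net.setPreferableTarget(getattr(cv.dnn, target))
    return backend, target
```

Usage would look like `configure_net(cv.dnn.readNetFromDarknet(cfg, weights), fp16=True)`. The FP16 target may only pay off on GPUs with fast half-precision arithmetic, so it is worth timing both targets.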

@Yashas all the comments I have seen here say that the CPU is faster than the GPU with SSD, and I understood from you that it depends on the hardware, right? Could you please tell me which option is best for getting at least 50 fps? I'd like to use SSD, Python, and OpenCV. Note: I am using an NVIDIA GeForce GTX 1050 Ti (it achieves only around 10 fps with SSD, cuDNN 7.6.5, CUDA 10, and compute capability 6.1), while the CPU reaches around 30 fps.

redhwan ( 2020-06-30 21:04:00 -0600 )


Seen: 24,928 times

Last updated: Nov 28 '19