How to run OpenCV DNN on NVidia GPU

asked 2018-10-19 03:38:07 -0600

gradlaserb
11 ●1 ●1 ●2

Hi,

I want to use my Nvidia GTX 1060 GPU when I run with my DNN code. I am using OpenCV. I tried with CPU, However, It is absolutely slow. So, I change this line,

net.setPreferableTarget(DNN_TARGET_CPU);

to,

net.setPreferableTarget(DNN_TARGET_OPENCL);

And then, I get an error that is [ WARN:0] DNN: OpenCL target is not supported with current OpenCL device (tested with Intel GPUs only), switching to CPU.

How Can I solve this problem? I have OpenCV 3.4.3.

edit retag flag offensive close merge delete

add a comment

2 answers

Sort by » oldest newest most voted

answered 2018-10-19 04:03:54 -0600

berak
32993 ●7 ●81 ●312

updated 2018-10-19 04:26:54 -0600

using 3.4.3, you can't do much about it. situation has improved a bit on master branch, but imho, it's still WIP.

you could try to set the env var: OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES=1 (maybe that gets you around it, but i'm only guessing)

apart from that, all you can do now is -- try a different model / network / architecture.

if it's a detection one, you can also try to use smaller input windows (e.g. 150x150 instead of 300x300)

edit flag offensive delete link

Comments

Ok, does that mean that Yolov3 (which has been added to OpenCV) cannot use cuDNN for maximum speed? If not, are there plans to add this support?

AlexTheGreat ( 2018-10-19 05:00:04 -0600 )edit

@AlexTheGreat , - no idea about cuDNN, but there is no support for CUDA (with opencv's dnn module), and no plan to add such.

berak ( 2018-10-19 05:03:50 -0600 )edit

Hm, that's a bit surprising though, because OpenCV used to inform about and already has CUDA support ( https://docs.opencv.org/2.4/modules/g... ), and using CUDA and cuDNN in the OpenCV DNN implementation would be a natural step forward, or I am missing something?

AlexTheGreat ( 2018-10-19 05:41:57 -0600 )edit

(don't look at outdated 2.4, which was frozen 5 years ago !)

berak ( 2018-10-19 05:43:41 -0600 )edit

We've merged a PR which lets run networks with OpenCL without extra flags: https://github.com/opencv/opencv/pull...

dkurt ( 2018-10-19 10:53:52 -0600 )edit

I see, thanks. But to come back to the original question, because I am still not clear about it. Does that mean that we can somehow accelerate the DNN implementation in OpenCV including YOLO with a GPU (Intel, NVidia)?

AlexTheGreat ( 2018-10-20 03:43:05 -0600 )edit

@AlexTheGreat -- try with latest 3.4 or master branch (NOT any releases !) and

 net.setPreferableTarget(DNN_TARGET_OPENCL);

berak ( 2018-10-20 03:58:32 -0600 )edit

I use OpenCV 4.1.1 on Nvidia Tegra Nano compiled with CUDA support. I compiled Darknet with CUDA and cuDNN support as well. Still, running net.setPreferableTarget(DNN_TARGET_OPENCL); net.forward(...); shows 100% of all CPU core usage, then swap memory occupied, then system frozen.

Update: Nvidia Nano is not support OpenCL :-( https://devtalk.nvidia.com/default/to...

YuriiChernyshov ( 2019-07-21 09:21:28 -0600 )edit

I confirm same behavior as of today (OPENCL + jetson nano)

stiv-yakovenko ( 2019-08-25 16:24:57 -0600 )edit

@stiv-yakovenko you can perform inference on Jetson using https://github.com/opencv/opencv/pull...

Yashas ( 2019-09-02 02:22:51 -0600 )edit

add a comment

answered 2019-11-28 07:51:24 -0600

Andrew.K
1 ●1

updated 2019-11-28 08:19:49 -0600

Hi! I have even older GPU chip than you mentioned (my gpu is GTX 970) and it works perfectly well for me with OpenCV 4.1.1. I have compiled darknet with CUDA 10.0 and cuDNN 7.4 (for CUDA 10.0) and particulary for darknet compilation, I used OpenCV 3.3 according to this link: [https://github.com/AlexeyAB/darknet] recommendation. I trained my own "YOLOv3 " model based on yolov3-tiny and used it within the following Python code (you can just use the standard yolo models):

import cv2 as cv
import numpy as np

classFile = "obj.names" #my own class names or just use coco.names
with open(classFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')
modelConf = 'yolov3-tiny_obj.cfg'   #or just use yolov3.cfg
modelWeights = 'yolov3-tiny_obj_7000.weights' #or just use yolov3.weights
net = cv.dnn.readNetFromDarknet(modelConf, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_OPENCL_FP16)
winName = "YOLOv3 + OpenCV"
cv.namedWindow(winName, cv.WINDOW_NORMAL)
cv.resizeWindow(winName, 1280, 720)

cap = cv.VideoCapture(inputFile)
while(True):
      _,frame = cap.read()
     if np.shape(frame) != ():
             blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0],1,crop=False)

             net.setInput(blob)
             outs = net.forward(getOutputsNames(net)) #reading .name file according to extracted objects
             frameExtract(frame, outs)   #standard frame extraction. I skipped to be short.
             cv.imshow(winName, frame)
             k = cv.waitKey(1) & 0xFF
     else:
             print("Reinitialize capture device ", time.ctime())
             cap = cv.VideoCapture(inputFile)
             time.sleep(1)
             k = cv.waitKey(1) & 0xFF
     if k == 27:
            cv.destroyAllWindows()
            break

edit flag offensive delete link

Comments

question was about using opencv's dnn module, not how to compile darknet

berak ( 2019-11-28 07:53:39 -0600 )edit

It is exactly about CV's DNN. I mentioned darknet only because initially I installed CUDA for it purposes. Sorry for this confusion.

Andrew.K ( 2019-11-28 08:22:20 -0600 )edit

ah, sorry, and thanks for the useful edit ;)

berak ( 2019-11-28 08:45:55 -0600 )edit

There is a CUDA backend in OpenCV DNN module now which is much faster than the OpenCL backend.

Yashas ( 2019-12-02 05:58:59 -0600 )edit

@Yashas how do i turn on the Cuda backend? I am also trying to use 1060 TI with opencv 4.1.2

muz ( 2019-12-05 12:25:07 -0600 )edit

@muz Assuming that you have built the master (because the CUDA backend is not yet in a release), you have to set backend to net.setPreferableBackend(DNN_BACKEND_CUDA) and target to net.setPreferableTarget(DNN_TARGET_CUDA) or setPreferableTarget(DNN_TARGET_CUDA_FP16).

Yashas ( 2019-12-07 09:10:20 -0600 )edit

@Yashas when I see all comments here, all of them said that CPU is faster than GPU with SSD, I understood from you that it depends on Hardware, right??. Please cloud tell me which is the best one so that fps will be more than 50 at least. I'd like to use SSD, python, OpenCV. Note: I am using NVIDIA GeForce GTX 1050 Ti (it achieves only around 10 fps with SDD, cuDNN =7.6.5 , CUDA =10 and compute capability 6.1) while CPU is around 30 fps.

redhwan ( 2020-06-30 21:04:00 -0600 )edit

add a comment

How to run OpenCV DNN on NVidia GPU

2 answers

Comments

Comments

Links

Question Tools

Stats

Related questions

How to run OpenCV DNN on NVidia GPU edit

2 answers

Comments

Comments

Links

Question Tools

Stats

Related questions

How to run OpenCV DNN on NVidia GPU