OpenCV webcam stream slows down alongside caffe prediction

I'm attempting to use caffe and python to do real-time image classification. I'm using OpenCV to stream from my webcam in one process and, in a separate process, using caffe to perform image classification on the frames pulled from the webcam. Then I'm passing the result of the classification back to the main thread to caption the webcam stream.

The problem is that even though I have an NVIDIA GPU and am performing the caffe predictions on the GPU, the main thread gets slowed down. Normally, without doing any predictions, my webcam stream runs at 30 fps; with the predictions, it gets at best 15 fps.

Even when I run the two components as separate python programs (i.e. pull frames from the webcam in one script and run a separate script doing caffe predictions in an infinite loop), I still get a slowdown in OpenCV's ability to grab webcam frames. I've run the code in C++ with multithreading and experienced the exact same result.

I've verified that caffe is indeed using the GPU when performing the predictions, and that neither my GPU nor its memory is maxing out. I've also verified that my CPU cores are not getting maxed out at any point during the program. I'm wondering if I am doing something wrong or if there is no way to keep these 2 processes truly separate. Any advice is appreciated. Here is my code for reference:

import cv2
import caffe
import multiprocessing
import Queue  # Python 2 module name; unused in this excerpt


class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        # The base-class initializer must run, or start() will fail
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        #Load caffe net -- code omitted 
        while True:
            # Blocks until the main process sends a frame
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            # Send the label back so the main process can caption the stream
            self.result_queue.put(text)


tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks, results)
consumer.start()

#Creating window and starting video capturer from camera
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
task_empty = True
while rval:
    if task_empty:
        # Hand the consumer a private copy of the current frame
        tasks.put(frame.copy())
        task_empty = False
    if not results.empty():
        text = results.get()
        #Add text to frame
        task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    #Getting keyboard input 
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

I've tried testing the code skeleton by passing dummy text from the consumer process to the main one, and get no slowdown at all. I'm not sure why running the prediction itself makes OpenCV slow to get webcam frames. Here's that code below:

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        #Load caffe net -- code ...
If you read a group of frames and just do the predictions, how long does that take? I'll bet the GPU predictions are slower than the frame rate of the camera, so it's backing up.

Tetragramm ( 2016-10-17 18:10:18 -0600 )

Not sure that would make sense in this context since I'm trying to do object classification in real time. The GPU predictions are slower than the frame rate, but that's why I'm doing them in a separate python process that shouldn't really be using any CPU at all. I'm ok with lag between the caption on the stream and what the stream is showing, since the predictions are slower. But I don't want the webcam stream itself to lag, which is what seems to be happening. It happens even when I run the two programs separately.

bfc_opencv ( 2016-10-17 18:32:57 -0600 )

Oh, sorry. I misread the indentation.

Hmm. Is it a steady 15 fps, or does it stutter on the transfers? In other words, if you time the loop with read(), is it a consistent time, or does one iteration in however many take extra long?

Tetragramm ( 2016-10-17 19:43:47 -0600 )
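
One way to check whether the loop is steadily slow or stutters is to record the duration of every iteration and compare the median with the worst case. This is only a minimal sketch, assuming Python 3; `time_iterations` and `summarize` are hypothetical helper names, and the sleeping stand-in takes the place of the real `vc.read()` call so the snippet runs without a camera.

```python
import statistics
import time


def time_iterations(step, n_iterations=100):
    """Call step() n_iterations times and return each call's duration in seconds."""
    durations = []
    for _ in range(n_iterations):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, worst) so a few slow outliers stand out from the typical time."""
    return statistics.median(durations), max(durations)


if __name__ == "__main__":
    # Hypothetical stand-in for vc.read(); swap in the real capture call on a camera.
    def fake_read():
        time.sleep(0.001)

    median, worst = summarize(time_iterations(fake_read, n_iterations=50))
    print("median %.4f s, worst %.4f s" % (median, worst))
```

If the worst iteration is many times the median, the slowdown is likely coming from occasional stalls rather than a uniformly slower read.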

It's not consistent. The fps varies between 13-18 ish. I'd say the average is around 15.

bfc_opencv ( 2016-10-17 22:05:59 -0600 )

Can you time and see if most of the effort comes from the get or the put? That will help you figure out which is the bottleneck.

Tetragramm ( 2016-10-17 23:07:09 -0600 )
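
The put/get comparison could be sketched roughly as follows, assuming Python 3. The helper name `time_queue_round_trip` is hypothetical, and the payload is a zero-filled byte string sized like a 640x480 three-channel frame, which is an assumed resolution rather than one stated in this thread. Both calls run in a single process here, so the numbers only approximate a real two-process setup.

```python
import multiprocessing
import time


def time_queue_round_trip(payload, repeats=5):
    """Return (put_times, get_times) in seconds for sending payload through a Queue."""
    q = multiprocessing.Queue()
    put_times, get_times = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        # put() hands the object to a background feeder thread,
        # so it usually returns before the data has been transferred.
        q.put(payload)
        put_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        # get() blocks until the whole payload has come through the pipe.
        received = q.get()
        get_times.append(time.perf_counter() - start)
        if len(received) != len(payload):
            raise RuntimeError("payload was truncated in transit")
    return put_times, get_times


if __name__ == "__main__":
    frame_sized = bytes(640 * 480 * 3)  # assumed frame size, for illustration only
    puts, gets = time_queue_round_trip(frame_sized)
    print("mean put %.6f s, mean get %.6f s"
          % (sum(puts) / len(puts), sum(gets) / len(gets)))
```

Because `put()` returns early, a near-zero put time does not by itself show that the transfer is cheap; the cost tends to show up on the `get()` side or in the feeder thread.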

Well, I passed dummy text from the consumer process back to the main process (in lieu of doing the actual neural net prediction) and experienced no slowdown. I've added that test code to the question.

bfc_opencv ( 2016-10-18 00:56:59 -0600 )

I timed both the get and put calls, and both timings are very close to 0, so I don't think that's the bottleneck. A lot of the time is spent in waitKey(), but I know much of the rendering for imshow() happens there. Any idea what might be going on here? I've been stuck on this one for months.

bfc_opencv ( 2016-10-19 00:44:48 -0600 )
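
To see how the time splits between reading, imshow() and waitKey(), each stage of the loop can be wrapped in a small timer. This is a rough sketch assuming Python 3; `timed`, `stage_totals` and `run_loop` are hypothetical names, and the three callables passed to `run_loop` are placeholders for `vc.read()`, `cv2.imshow(...)` and `cv2.waitKey(1)`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Add the wall-clock time of the enclosed block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def run_loop(read, show, wait, n_iterations):
    """Run a capture-style loop and time each of its three stages separately."""
    for _ in range(n_iterations):
        with timed("read"):
            read()
        with timed("imshow"):
            show()
        with timed("waitKey"):
            wait()
    return dict(stage_totals)
```

Calling `run_loop` once with caffe idle and once with predictions running, then comparing the per-stage totals, would show which stage absorbs the extra time.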

Do you have several CPU cores, or could blocking in one thread hold up the other?

Tetragramm ( 2016-10-19 07:32:24 -0600 )

I have 2 cores, or 4 with hyperthreading (i7 6500U CPU). I don't think they're holding each other up since I'm using multiprocessing instead of multithreading. I also don't believe I'm calling any blocking methods.

bfc_opencv ( 2016-10-19 12:19:34 -0600 )

I'm not sure it's the structure of the code itself, because when I ran a program that just streamed from the webcam without doing anything else and, in a separate window, ran a caffe prediction in a loop over and over, I still got a slowdown in the webcam stream.

bfc_opencv ( 2016-10-19 12:21:19 -0600 )

The .get() function in the queue says it's blocking.

I expect the part of the caffe code that actually sends the data to the GPU takes a lot of resources and backs things up. Not sure how to check it though.

Tetragramm ( 2016-10-19 17:44:51 -0600 )

I want the consumer process to block on get() because it should wait until there's something for it to pass to the caffe net. The main process doesn't block because I use empty() to check before calling get(). But I was wondering why I got the same result when I ran them separately. Why would caffe's data transfer interfere with video capture?

bfc_opencv ( 2016-10-19 20:47:02 -0600 )
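
For the non-blocking check in the main process, an alternative to testing empty() before get() is to call get_nowait() and catch the exception it raises when nothing is ready. A minimal sketch, assuming Python 3 (where the exception lives in the `queue` module); `poll_result` is a hypothetical helper name.

```python
import multiprocessing
import queue


def poll_result(result_queue):
    """Return the next result if one is ready, otherwise None, without blocking."""
    try:
        return result_queue.get_nowait()
    except queue.Empty:
        return None


if __name__ == "__main__":
    results = multiprocessing.Queue()
    # Nothing has been put yet, so the poll returns immediately with None.
    print(poll_result(results))
```

In the capture loop this would replace the `empty()` / `get()` pair: caption the frame only when `poll_result(results)` returns something other than `None`.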