Parallelizing GPU processing of multiple images
For each frame of a video, I apply some transformations and then write the frame out to an image file. I am using OpenCV's CUDA API for this, so it looks something like this, in a loop:
# read frame from video
_, frame = video.read()
# upload frame to GPU
frame = cv2.cuda_GpuMat(frame)
# create a CUDA stream
stream = cv2.cuda_Stream()
# do things to the frame
# ...
# download the frame to CPU memory
frame = frame.download(stream=stream)
# wait for the stream to complete (CPU memory available)
stream.waitForCompletion()
# save frame out to disk
# ...
Since I send a single frame to the GPU, and then wait for its completion at the end of the loop, I can only process one frame at a time.
What I would like to do is send multiple frames (in multiple streams) to the GPU to be processed at the same time, then save them to disk as the work gets finished.
What is the best way to do this?
The best way to do this is using OpenGL 4.3's compute shaders, along with C++.
Can you link to an example please? Where would OpenGL be used? Would you still be using CUDA to access the GPU?
OpenGL controls the GPU directly and does not go through CUDA. CUDA was designed before OpenGL compute shaders were part of the standard.
Keep in mind that you can only bind so many textures at once. This inherent limitation is platform-agnostic; it happens on CUDA and OpenGL. For instance, the Intel GPU would only bind 8 textures at a time, whereas the limit was 64 on an AMD Vega.
For simple compute shader code, see: https://github.com/sjhalayka/qjs_comp...
... that said, I believe that you can use CUDA and OpenGL in the same app.
Thanks for the link. I am still unsure what the advantage of OpenGL would be over CUDA in general, unless you are interacting with a graphics pipeline. My understanding is that it offers a less mature interface to GPU computation than CUDA, with the advantage that it will run on AMD and integrated GPUs. I guess it is also closer to the metal than OpenCL, so if the implementation is good it may be faster on those as well. Since this user already has an NVIDIA GPU, the routines they need have existing OpenCV CUDA implementations, and the functionality they requested is built in, I think writing everything from scratch in OpenGL would be the wrong way to go. Furthermore, without experience of writing OpenGL, I think their implementations would be slower than the existing CUDA ones.
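To illustrate the built-in route mentioned above, here is a minimal sketch (not tested on real hardware) of keeping several frames in flight with one CUDA stream per frame. The names process_in_flight, make_cuda_callbacks and max_in_flight are made up for illustration, the cv2.cuda calls assume a CUDA-enabled OpenCV build, and the transformations are left elided as in the question.

```python
from collections import deque


def process_in_flight(frames, submit, finish, max_in_flight=4):
    """Queue up to max_in_flight frames before waiting on the oldest one.

    submit(frame) starts asynchronous work and returns a handle;
    finish(handle) blocks until that work is done and returns the result.
    Results are yielded in submission order.
    """
    pending = deque()
    for frame in frames:
        pending.append(submit(frame))
        if len(pending) >= max_in_flight:
            # Only block on the oldest frame; newer ones stay queued.
            yield finish(pending.popleft())
    # Drain whatever is still queued once the input is exhausted.
    while pending:
        yield finish(pending.popleft())


def make_cuda_callbacks():
    """Hypothetical callbacks for OpenCV's CUDA API (assumes a CUDA build of cv2)."""
    import cv2  # imported lazily so the scheduler above works without it

    def submit(frame):
        stream = cv2.cuda_Stream()        # one stream per frame
        gpu = cv2.cuda_GpuMat()
        gpu.upload(frame, stream)         # enqueue host -> device copy
        # ... enqueue the transformations here, passing stream=stream ...
        host = gpu.download(stream=stream)  # enqueue device -> host copy
        return stream, host

    def finish(handle):
        stream, host = handle
        stream.waitForCompletion()        # host array is valid only after this
        return host

    return submit, finish
```

Usage would be roughly: call submit, finish = make_cuda_callbacks(), then loop over process_in_flight(frames, submit, finish) and write each returned frame to disk. As far as I know, copies only overlap with other GPU work when the host buffer is page-locked, so the gain with ordinary NumPy arrays may be smaller than expected.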