Unable to create PAGE_LOCKED or SHARED host memory using the Python binding to HostMem
- NVIDIA Jetson Xavier NX
- OpenCV 4.4 built with CUDA support
- Python 3.6.9
I'm having trouble with the Python binding to the HostMem class and can't create PAGE_LOCKED or SHARED host memory. I'm not sure I'm using it correctly and haven't been able to find any examples. I've tried two different ways of creating page-locked host memory so that I can call cv2.cuda image processing methods. Here's what I've tried:
a_mem = cv2.cuda_HostMem(cv2.cuda.HostMem_PAGE_LOCKED)
a_mem.create(num_rows, num_cols, cv2.CV_8UC1)
a_host = a_mem.createMatHeader()
a_dev = cv2.cuda_GpuMat(a_host)
or
a_mem = cv2.cuda_HostMem(num_rows, num_cols, cv2.CV_8UC1, cv2.cuda.HostMem_PAGE_LOCKED)
a_host = a_mem.createMatHeader()
a_dev = cv2.cuda_GpuMat(a_host)
In both cases I get Mat and GpuMat references that I can successfully use to make CUDA calls:
a_dev.upload(a_host)
cv2.cuda.add(a_dev, b_dev, c_dev)
c_dev.download(c_host)
But when I use NVIDIA Visual Profiler to examine the uploads and downloads, it reports that my host memory is Pageable, not Pinned as I would expect for page-locked host memory. I have been able to use cv2.cuda.registerPageLocked() to create (much faster) Pinned host memory, so I trust what Visual Profiler is telling me. I've tried the same test with cv2.cuda.HostMem_SHARED and get the same results.
Can someone please tell me if I'm creating the host memory and the Mat and GpuMat references incorrectly?
Also, when I do succeed in creating SHARED host memory, how do I get a GpuMat reference to it? I feel like I should be using HostMem's createGpuMatHeader() method for this, but it doesn't have a Python binding.
Thanks for any help I can get. I've been stuck on this for three days.
From memory, that functionality had not been implemented in Python (it may have been by now, but from your experience it appears that it has not). As you have found, cv2.cuda.registerPageLocked() works perfectly; is there any reason you can't use it?
Thanks for your suggestion. I've been using registerPageLocked while I try to create shared/mapped memory for zero-copy. registerPageLocked does give me Pinned host memory, and the HtoD and DtoH copies run about 4 times faster than with Pageable memory.
Is there a way to find out what has and hasn't been implemented in Python yet, other than trial and error? I'm new to OpenCV, so when something doesn't work I naturally assume I'm doing it wrong, not that it isn't finished. That's why I've spent three days struggling with this.
Would it be safe to assume that if Python tests for HostMem appear in the source code, then it has been implemented, and if there is no test, the Python bindings aren't complete yet?
Hi, the Python bindings for CUDA are relatively new (August 2018). If I remember correctly, before the bindings were updated, every function outside the cuda:: namespace had Mat and UMat bindings generated, and the CUDA functions were left out completely. Since 2018 this has been extended to generate bindings for the cuda:: namespace, with a new GpuMat type. Unfortunately the HostMem type was not included. I looked at implementing it myself, but because I could use registerPageLocked() instead, I abandoned the idea.
In short, anything that uses GpuMat should work, but it may not, because the CUDA bindings have not been tested as thoroughly as the standard ones. So it is trial and error. If you get a strange UMat error, the call has skipped past the GpuMat bindings and you have probably passed the wrong input arguments. And yes, I would use the Python tests as a guide.
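Alongside the Python tests, a quick check is to ask your own build which methods a bound class actually exposes. The helper below (missing_bindings is a hypothetical name) just uses hasattr, so it only tells you whether a name is exposed to Python, not whether that binding behaves correctly.

```python
def missing_bindings(cls, method_names):
    """Return the names from method_names that cls does not expose as attributes.

    Presence of a name only means a binding was generated; it says nothing
    about whether that binding works as expected.
    """
    return [name for name in method_names if not hasattr(cls, name)]
```

For example, missing_bindings(cv2.cuda_HostMem, ["create", "createMatHeader", "createGpuMatHeader"]) lists which of those HostMem methods your OpenCV build does not expose to Python.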