GPU memory leak when resizing asynchronously

asked 2019-08-14 05:44:53 -0500 by pittie

I'm facing a problem with GPU resizing in OpenCV. Here is my code:

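#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>   // cv::cuda::resize
#include <opencv2/imgcodecs.hpp>     // cv::imdecode
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

#define MX 500
#define ASYNC 0

struct job {  // struct, so the destructor is public and `delete _job` compiles
    cv::cuda::GpuMat gpuImage;
    cv::cuda::Stream stream;
    cv::Mat cpuImage;

    ~job() {
        printf("job deleted\n");
    }
};

// Host callback: invoked once the stream has finished the work queued before it.
void onComplete(int status, void* uData) {
    job* _job = static_cast<job*>(uData);
    delete _job;
}

void resize(job* _job, const std::vector<uchar>& buffer) {
    _job->cpuImage = cv::imdecode(buffer, cv::IMREAD_COLOR);
    if (ASYNC) {
        // Upload, resize and download are all queued on the job's own stream.
        _job->gpuImage.upload(_job->cpuImage, _job->stream);
        cv::cuda::resize(_job->gpuImage, _job->gpuImage, cv::Size(100, 100), 0, 0, cv::INTER_NEAREST, _job->stream);
        _job->gpuImage.download(_job->cpuImage, _job->stream);
        _job->stream.enqueueHostCallback(onComplete, _job);
        // _job->stream.waitForCompletion();
    } else {
        // Same work on the default stream; each call blocks until it is done.
        _job->gpuImage.upload(_job->cpuImage);
        cv::cuda::resize(_job->gpuImage, _job->gpuImage, cv::Size(100, 100), 0, 0, cv::INTER_NEAREST);
        _job->gpuImage.download(_job->cpuImage);
        delete _job;
    }
}

std::vector<uchar> readFile(const std::string& filename) {
    std::ifstream input(filename, std::ios::binary);
    std::vector<uchar> buffer(std::istreambuf_iterator<char>(input), {});
    return buffer;
}

int main() {
    for (int i = 0; i < MX; i++) {
        std::vector<uchar> buf = readFile("input.jpg");
        job* _job = new job();
        resize(_job, buf);
    }
    while (true) {
        // wait for the host callbacks to run
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return 0;
}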

When I run the resize synchronously (ASYNC = 0), the code works fine. But when I run it asynchronously (ASYNC = 1), some GPU memory seems to be lost, even though I delete every GpuMat and Stream I create. The more loop iterations I run, the less free GPU memory I have. Is this a bug in OpenCV, or is part of my code wrong?

1 answer

answered 2019-08-18 09:41:23 -0500 by pittie

Problem solved. Here is the note on stream callbacks from the OpenCV docs:

Callbacks must not make any CUDA API calls. Callbacks must not perform any synchronization that may depend on outstanding device work or other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.

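I had read the note but didn't realize that even deleting a cv::cuda::* object breaks this rule: destroying a GpuMat frees its device memory and destroying a Stream destroys the underlying CUDA stream, and both are CUDA API calls. So the solution is to avoid "touching" any cv::cuda::* object in the callback, including deleting or releasing it.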
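One way to apply this, sketched under assumptions: the callback only hands the finished job to a mutex-protected queue, and the main thread pops each job and deletes it, so the GpuMat and Stream destructors never run inside the callback. To keep the sketch compilable without a CUDA build of OpenCV, the hypothetical FakeJob stands in for job, and plain std::thread calls stand in for the callbacks that cv::cuda::Stream::enqueueHostCallback would fire. CompletionQueue, FakeJob, completed and runJobs are illustrative names, not OpenCV API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Thread-safe queue of finished jobs (hypothetical helper). The callback only
// pushes a pointer here; it never touches cv::cuda objects or the CUDA API.
template <typename T>
class CompletionQueue {
public:
    // Called from the callback thread: record that `item` has finished.
    void push(T* item) {
        assert(item != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        cv_.notify_one();
    }

    // Called from the main thread: block until a finished item is available.
    T* pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T* item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T*> items_;
};

// Hypothetical stand-in for the question's `job`; the real one would own
// the cv::cuda::GpuMat, cv::cuda::Stream and cv::Mat.
struct FakeJob {
    std::size_t id;
};

CompletionQueue<FakeJob> completed;

// Same shape as OpenCV's stream callback (int status, void* userData),
// but it returns the job to the main thread instead of deleting it.
void onComplete(int /*status*/, void* userData) {
    completed.push(static_cast<FakeJob*>(userData));
}

// Simulates n jobs whose callbacks fire on other threads, then deletes every
// job on the calling thread. Returns the sum of the reclaimed job ids.
std::size_t runJobs(std::size_t n) {
    std::vector<std::thread> callbackThreads;
    for (std::size_t i = 0; i < n; ++i) {
        FakeJob* job = new FakeJob{i};
        callbackThreads.emplace_back(onComplete, 0, job);
    }
    std::size_t idSum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        FakeJob* job = completed.pop();
        idSum += job->id;
        delete job;  // destruction happens outside the callback
    }
    for (auto& t : callbackThreads) t.join();
    return idSum;
}
```

Applied to the question's code, onComplete would call completed.push(static_cast<job*>(uData)) instead of delete _job, and main would call delete completed.pop() MX times in place of the while (true) loop. Locking a std::mutex is not a CUDA API call, and the main thread holds the lock only while touching the queue, so the callback never waits on device work. A simpler but less concurrent alternative is to keep the jobs in main, call stream.waitForCompletion() on each one, and delete them there.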


Seen: 348 times

Last updated: Aug 18 '19