GPU detectMultiScale on LBP classifier gets into an infinite loop, how to check?

I have detector code running on the CPU that works perfectly. Now I would like to see whether it is possible to convert the code to GPU processing in order to increase the speed. However, my code seems to get stuck inside the GPU processing, so I am requesting some assistance.

Code:

// Load the trained LBP model into the GPU cascade classifier
gpu::CascadeClassifier_GPU cascade;

string detection_model = argv[3];
if( !cascade.load( detection_model ) ){
    printf("--(!)Error loading the trained model! \n");
    return -1;
}

// Upload the input image and equalize its histogram on the GPU
gpu::GpuMat image_gpu(image.clone());
gpu::GpuMat faces;

gpu::equalizeHist(image_gpu, image_gpu);

// maxObjectSize = Size(217, 85), minSize = Size(160, 63), scaleFactor = 1.05, minNeighbors = 2
int detections = cascade.detectMultiScale( image_gpu, faces, Size(217, 85), Size(160, 63), 1.05, 2);

Some remarks on the code:

  • The image_gpu object exists. It is a CV_8UC1 matrix of 1000 x 1000 pixels. An imshow confirms that the object actually contains data (see the verification sketch after this list).
  • The equalizeHist function works perfectly. Again, an imshow before and after the call shows a correctly processed input image.
  • The detectMultiScale call is configured like the second C++ overload of parameters given in the documentation. It states that this overload can only be applied to LBP classifiers, which is what I have, and the parameters are exactly the same as in my CPU code.
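
For completeness, this is a minimal sketch of the check I use between the steps above (my own verification code, assuming image_gpu has already been filled as shown):

// Sanity check: download the GpuMat back to the host and inspect it
Mat host_check;
image_gpu.download(host_check);

CV_Assert(host_check.type() == CV_8UC1);                       // single-channel, 8-bit
CV_Assert(host_check.cols == 1000 && host_check.rows == 1000); // expected resolution

imshow("GPU matrix contents", host_check);                     // visually confirm the data
waitKey(0);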

During debugging the code enters the detectMultiScale function, it starts computing and my NVIDIA GPU shows activity, but the call never ends. The code normally runs at 5 fps on the CPU; I have now been waiting for 25 minutes and the function still has not returned on the GPU.

Any ideas on what could be going wrong?

System configuration:

  • Windows 7 x64
  • GeForce GT 540M with CUDA 5.0 support
  • Visual Studio 2010
  • OpenCV 2.4.5, manually built with CUDA 5.0 and TBB 4.1 support, compiled for compute architecture 2.1 (which is supported by my GPU); a runtime compatibility check is sketched after this list
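
To rule out a build/architecture mismatch, a minimal runtime check along these lines can be run before loading the cascade (standard gpu module queries; the exact messages are just illustrative):

// Quick sanity check that the OpenCV build and the GPU are compatible
int devices = gpu::getCudaEnabledDeviceCount();
if (devices == 0)
{
    printf("No CUDA-capable device found or OpenCV was built without CUDA support.\n");
    return -1;
}

gpu::DeviceInfo info(0);
printf("Device: %s, compute capability %d.%d\n",
       info.name().c_str(), info.majorVersion(), info.minorVersion());

// isCompatible() reports whether the gpu module was built for this architecture
if (!info.isCompatible())
{
    printf("The gpu module was not built for this architecture!\n");
    return -1;
}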

If any more information is needed, please feel free to ask!


EDIT 1:

I have tried to separate all incoming and outgoing data of every function, but the detectMultiScale call still keeps running for ages and never produces a result. I am starting to wonder whether this is even normal. Is detectMultiScale with maxObjectSize and minSize verified as stable?

New code (pretty much the same):

// Read in the input image and convert it to grayscale
Mat input = imread("c:/data/test.png", 1);
Mat gray;
cvtColor( input, gray, CV_BGR2GRAY );

// Initialize the GPU cascade object
gpu::CascadeClassifier_GPU cascade;
if( !cascade.load( "c:/data/cascade.xml" ) ){
    printf("--(!)Error loading the trained model! \n");
    return -1;
}

// Upload the grayscale image, equalize it on the GPU, and run the detection
gpu::GpuMat image_gpu(gray);
gpu::GpuMat gray_converted, faces;

gpu::equalizeHist(image_gpu, gray_converted);

int detections = cascade.detectMultiScale( gray_converted, faces, Size(217, 85), Size(160, 63), 1.05, 2);

EDIT 2:

I tried lowering the resolution of the input image to a 150x150 pixel image, and even then the call does not return zero detections; it just keeps running. That is weird, since the input image is actually too small to even contain the minimum object size... Shouldn't it return immediately? The early-out behaviour I would expect is sketched below.
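
For clarity, this is the guard I would expect (a minimal sketch of my own, not the actual library implementation; minSize here stands for the Size(160, 63) argument):

// Expected guard: if the image cannot even contain the smallest requested
// object, no scale can produce a valid detection window.
Size minSize(160, 63);
if (gray_converted.cols < minSize.width || gray_converted.rows < minSize.height)
{
    return 0; // nothing to detect, return immediately with zero detections
}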


EDIT 3:

I have continued debugging and found the following.

This line of code actually works and does what I want. It defines a maximum size for my object and places no constraint on the smallest object size whatsoever. Of course it then finds tons of detections, even at scales I do not want it to search, which makes it slower than the CPU implementation I had.

int detections = cascade.detectMultiScale(image_gpu, faces, Size(217, 85), Size(), 1.01, 2);

However, adding a minimum size results in a hanging program, which increasingly suggests that the function has a problem with these arguments, or that I am misunderstanding them (a possible size-filtering workaround is sketched after the call below).

int detections = cascade.detectMultiScale(image_gpu, faces, Size(217, 85), Size(160, 63), 1.01, 2);
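
As a stopgap, one could call the working variant without a minimum size and filter the results on the host afterwards; a minimal sketch, assuming the detection rectangles are downloaded the same way as in the OpenCV gpu cascade sample:

// Workaround sketch: run without minSize (the variant that returns), then
// download the detections and drop everything smaller than 160 x 63 manually.
int detections = cascade.detectMultiScale(image_gpu, faces, Size(217, 85), Size(), 1.01, 2);

Mat faces_host;
faces.colRange(0, detections).download(faces_host);
Rect* rects = faces_host.ptr<Rect>();

vector<Rect> filtered;
for (int i = 0; i < detections; ++i)
{
    if (rects[i].width >= 160 && rects[i].height >= 63)
        filtered.push_back(rects[i]);
}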

EDIT 4:

If you want source code and data to reproduce the problem, follow the link to the issue tracker; all files have been added there.