Twinsy's profile - activity

2016-04-27 22:59:21 -0600 received badge  Nice Answer
2016-04-20 13:45:42 -0600 received badge  Teacher
2016-04-20 06:33:03 -0600 commented question OpenCV 3.1 CUDA 7.5 detectMultiScale function works slower on GPU then CPU

Thank you for your answer!

Actually, I was thinking the same while trying to optimize the process. While running the program I also noticed that all 4 of my cores were working at 70-80%. The funny thing is that I wrote a sequential program, so I thought maybe it's optimized automatically with TBB or something like that. I had a feeling about that, and now, according to you, it's true. :) (The odd part is that, as far as I remember, I didn't enable TBB in CMake; maybe only the "with TBB" option, but I'm pretty sure I disabled the "Build TBB" part.)

Do you know of a different face detection method in OpenCV that is better optimized for the GPU?

2016-04-20 03:20:36 -0600 commented question OpenCV 3.1 CUDA 7.5 detectMultiScale function works slower on GPU then CPU

I'm working with full HD video/camera input, so my frames are 1920*1080, but I also tested with smaller videos (same result). These are my settings:

  • haarcascade_frontalface_alt.xml;
  • 1920*1080;
  • cascade_ScaleFactor = 1.3;
  • cascade_MinNumberNeighbor = 3;
  • cascade_gpu->setFindLargestObject(false);

The min and max object sizes are adjusted dynamically over 10 frames, based on the detected face size:

    cascadeMinSize = cv::Size((int)(avgMinW*0.7), (int)(avgMinH*0.7));
    cascadeMaxSize = cv::Size((int)(avgMaxW*1.3), (int)(avgMaxH*1.3));

Like that. Generally, the faces in my full HD video are between 300x300 and 500x500 px.
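To make the size adjustment concrete, here is a minimal sketch of how such a window could be maintained. It is not my actual code: `Size2i` is a stand-in for `cv::Size` so the sketch compiles without OpenCV, `FaceSizeWindow` is a hypothetical helper, and a plain average over the last 10 frames is an assumption. Only the 0.7 and 1.3 factors come from the snippet above.

```cpp
#include <cstddef>
#include <deque>

// Stand-in for cv::Size so the sketch compiles without OpenCV.
struct Size2i {
    int width;
    int height;
};

// Hypothetical helper: remembers the smallest and largest face seen in each of
// the last `capacity` frames and derives a search window from their averages.
class FaceSizeWindow {
public:
    explicit FaceSizeWindow(std::size_t capacity = 10) : capacity_(capacity) {}

    // Record the smallest and largest face detected in one frame.
    void addFrame(Size2i smallest, Size2i largest) {
        mins_.push_back(smallest);
        maxs_.push_back(largest);
        if (mins_.size() > capacity_) {  // forget the oldest frame
            mins_.pop_front();
            maxs_.pop_front();
        }
    }

    bool empty() const { return mins_.empty(); }

    // 0.7 times the average minimum size.
    Size2i minSize() const { return scaled(mins_, 0.7); }

    // 1.3 times the average maximum size.
    Size2i maxSize() const { return scaled(maxs_, 1.3); }

private:
    // Average all stored sizes, scale by `factor`, truncate to int.
    // Returns {0, 0} when there is no history yet.
    static Size2i scaled(const std::deque<Size2i>& sizes, double factor) {
        if (sizes.empty()) return Size2i{0, 0};
        double w = 0.0, h = 0.0;
        for (const Size2i& s : sizes) {
            w += s.width;
            h += s.height;
        }
        const double n = static_cast<double>(sizes.size());
        return Size2i{static_cast<int>(w / n * factor),
                      static_cast<int>(h / n * factor)};
    }

    std::size_t capacity_;
    std::deque<Size2i> mins_;
    std::deque<Size2i> maxs_;
};
```

With this, each frame would call `addFrame(...)` after detection and, while `empty()` is false, pass `minSize()` and `maxSize()` (converted to `cv::Size`) to the detector for the next frame.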

2016-04-19 11:03:12 -0600 answered a question How to count faces in the video?

Hi!

After the call face_cascade.detectMultiScale(frame_gray, faces, .....), the faces vector contains all the detected faces. So faces.size() gives you the number of faces detected in that frame.

int detected_faces = static_cast<int>(faces.size());

It's that simple.

2016-04-19 08:32:58 -0600 commented question OpenCV 3.1 CUDA 7.5 detectMultiScale function works slower on GPU then CPU

I'm not even sure that I'm running my build correctly. I'm just building an exe and then running it.

2016-04-19 08:30:46 -0600 asked a question OpenCV 3.1 CUDA 7.5 detectMultiScale function works slower on GPU then CPU

Good Day!

I'm currently trying to optimize my C++ program for the GPU. My PC (relevant parts):

  • GeForce GTX 780
  • Intel Core i5-6600K
  • Corsair Vengeance 2.6 GHz memory, 16 GB

My code is pretty big, because it's connected to an AI and I also use landmark detection, so I will post only the relevant part. Basically, my problem is that every setting I try gives slower results on the GPU than on the CPU.

My code:

    double cascade_ScaleFactor = 1.2;
    int cascade_MinNumberNeighbor = 3;


    void facedetector(cv::Mat& frame, BufferFaceGPU& b)
    {   

        double processT,processT_total;

        /****************************/
        /***********GPU**************/
        /****************************/

        if (GPUx==1){   
            /***********VERSION 1.0 OLD*************/
            cascade_gpu->setMinObjectSize(cascadeMinSize);
            cascade_gpu->setMaxObjectSize(cascadeMaxSize);

            processT_total = (double)cv::getTickCount();

            std::vector<cv::Rect> faces;
            cv::Mat cpu_frame_gray;

            processT = (double)cv::getTickCount();
            b.gpu_frame.upload(frame);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_upload.txt", processT);

            processT = (double)cv::getTickCount();
            cv::cuda::cvtColor(b.gpu_frame, b.gpu_frame, CV_BGR2GRAY);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_cvtColor.txt", processT);

            processT = (double)cv::getTickCount();
            cv::cuda::equalizeHist(b.gpu_frame, b.gpu_frame);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_equalizeHist.txt", processT);

            processT = (double)cv::getTickCount();
            cascade_gpu->detectMultiScale(b.gpu_frame, b.gpu_faces);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_detectMultiScale.txt", processT);


            processT = (double)cv::getTickCount();
            cascade_gpu->convert(b.gpu_faces, faces);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_convert.txt", processT);


            processT = (double)cv::getTickCount();
            b.gpu_frame.download(cpu_frame_gray);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_download.txt", processT);

            if (!faces.empty())
            {
                processT = (double)cv::getTickCount();
                get_landmarks(faces, cpu_frame_gray, frame);
                processT = (double)cv::getTickCount() - processT;
                processT /= (double)cv::getTickFrequency();
                read_write_data_tofile("GPU_data_getLandmarks.txt", processT);
            }


            processT_total = (double)cv::getTickCount() - processT_total;
            processT_total /= (double)cv::getTickFrequency();
            read_write_data_tofile("GPU_data_total.txt", processT_total);

        }

        /****************************/
        /***********CPU**************/
        /****************************/
        else if(GPUx==2){
            cv::Mat frame_gray;
            std::vector<cv::Rect> faces;
            processT_total = (double)cv::getTickCount();

            processT = (double)cv::getTickCount();
            cv::cvtColor(frame, frame_gray, CV_BGR2GRAY);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("CPU_data_cvtColor.txt", processT);

            processT = (double)cv::getTickCount();
            cv::equalizeHist(frame_gray, frame_gray);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("CPU_data_equalizeHist.txt", processT);

            processT = (double)cv::getTickCount();
            face_cascade.detectMultiScale(frame_gray, faces, cascade_ScaleFactor, cascade_MinNumberNeighbor, 0 | CV_HAAR_SCALE_IMAGE, cascadeMinSize,cascadeMaxSize);
            processT = (double)cv::getTickCount() - processT;
            processT /= (double)cv::getTickFrequency();
            read_write_data_tofile("CPU_data_detecetMultiScale.txt", processT);

            if (!faces.empty())
            {
                processT = (double)cv::getTickCount();
                get_landmarks(faces, frame_gray, frame);
                processT = (double)cv::getTickCount() - processT;
                processT /= (double)cv::getTickFrequency();
                read_write_data_tofile("CPU_data_getLandmarks.txt", processT);
            }


            processT_total = (double)cv::getTickCount() - processT_total;
            processT_total /= (double)cv::getTickFrequency();
            read_write_data_tofile("CPU_data_total.txt", processT_total);
        }
        else{
            errormsg("Something went wrong!\nEXIT");
        }

    }

Sorry for the long code. I tried a number of optimizations (e.g. the max and min sizes are driven by a PID controller, so the detector always has to search only a reasonable range of face sizes).
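For readability, the timing boilerplate repeated around every stage in the code above could be folded into one helper. The following is only a sketch under assumptions: `timeStage` and `TimingSample` are hypothetical names, and it uses `std::chrono::steady_clock` instead of `cv::getTickCount()` so that it compiles without OpenCV. It collects the measurements in memory instead of calling `read_write_data_tofile` after each stage.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One measurement: a label (the original code uses a file name per stage)
// and the elapsed wall-clock time in seconds.
using TimingSample = std::pair<std::string, double>;

// Hypothetical helper: runs `stage`, measures how long it took and appends
// the result to `log`. Returns the elapsed time in seconds.
inline double timeStage(const std::string& label,
                        const std::function<void()>& stage,
                        std::vector<TimingSample>& log) {
    const auto start = std::chrono::steady_clock::now();
    stage();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    log.emplace_back(label, elapsed.count());
    return elapsed.count();
}
```

A stage would then look like `timeStage("GPU_data_upload.txt", [&] { b.gpu_frame.upload(frame); }, log);`, and the collected samples could be written out once after the frame. One thing this would change: in the code above, the `processT_total` interval also spans the `read_write_data_tofile` calls made between the stages, so the totals include that file I/O.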

I'm also monitoring the FPS and the processing times, and on the GPU I get almost a quarter of the FPS of the CPU version. My monitoring results look like this:

[three images of the monitoring results]

In the last picture you can clearly see that ...

2016-04-07 08:51:01 -0600 commented question Having trouble Building Opencv 3.1

On 32-bit, everything is perfect with the OpenCV binaries and libs, but on 32-bit I cannot compile opencv_cudaarithm300.dll, opencv_cudawarping300.dll and opencv_cudalegacy300.dll.

Now I'm in a situation where I need a normal build with OpenCV 3.1 and CUDA 7.5, and I don't care whether it's 32-bit or 64-bit. I'm desperate now.

2016-04-07 08:44:49 -0600 asked a question Having trouble Building Opencv 3.1

Hi!

This isn't my first OpenCV build, just so you know :) I built my own library with CMake, including the contrib libs and CUDA 7.5. I'm building for 64-bit and my compiler is VS 12 2013 x64 (I read that you cannot use the CUDA libs if it's not compiled for 64-bit).

I did everything as before, following the usual tutorials, and I cannot build opencv_videoio300.dll. I have the lib, but nothing else.

Error 97 error LNK2019: unresolved external symbol "private: long __cdecl videoInput::getDevice(struct IBaseFilter * *,int,wchar_t *,char *)" (?getDevice@videoInput@@AEAAJPEAPEAUIBaseFilter@@HPEA_WPEAD@Z) referenced in function "public: virtual double __cdecl cv::VideoCapture_DShow::getProperty(int)const " (?getProperty@VideoCapture_DShow@cv@@UEBANH@Z) C:\opencv3\build\modules\videoio\cap_dshow.obj 1

Error 98 error LNK1120: 1 unresolved externals C:\opencv3\build\bin\Release\opencv_videoio300.dll

This is the only error in the whole INSTALL build. I cannot figure out what the problem is. Any suggestions?