
yid's profile - activity

2013-05-16 02:45:32 -0600 received badge  Necromancer (source)
2013-05-15 23:37:16 -0600 received badge  Editor (source)
2013-05-15 04:19:49 -0600 answered a question Speeding up Haartraining with AMD GPU and OpenCL

Hi,

I also tried to get the face detection working, using the facedetect demo. On my Mac with an AMD GPU, one of the OpenCL kernels failed to compile. I posted my solution for that as the answer to another question here in the forum.

To actually move the calculations to the GPU, however, I had to uncomment line 108 in my code:

//setDevice(oclinfo[0]);

Just remove the two slashes (//) and the Haar detection should run on the GPU.

2013-05-15 04:01:36 -0600 answered a question Strange error when trying to use OCL library

Hi,

The error means that your OpenCL compiler could not compile some of the kernel code and aborted the build.

I also had a problem getting the face detection to run on Mac OS X. But I solved it! Here's how:

  1. Finding the erroneous OpenCL kernel file:

modules/ocl/src/initialization.cpp tries to compile the OpenCL kernels. Unfortunately, the ATI/AMD compiler on the Mac fails to compile some of them, and this is what causes the error. To find out which kernel it is, I inserted at the beginning of the method

      cl_kernel openCLGetKernelFromSource(const Context *clCxt, const char **source, string kernelName,
                                        const char *build_options)

the following line:

      fprintf(stderr, "Loading %s %s\n", clCxt->impl->binpath.c_str(), kernelName.c_str());

Now I can see every kernel as it is compiled. The last one printed before the error is the one that causes the crash. In my case it was "integral_cols". A short "grep" command later I knew it was in "modules/ocl/src/opencl/imgproc_integral.cl".
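
The same log-before-compile idea can be tried outside of OpenCV. Below is a minimal sketch in plain C, assuming a hypothetical callback in place of the real OpenCL build call. The names compile_fn, compile_logged, last_kernel and fake_compile are made up for illustration and are not part of OpenCV or OpenCL:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real kernel build step: returns 0 on success. */
typedef int (*compile_fn)(const char *kernel_name);

/* Name of the kernel most recently handed to the compiler. */
static char last_kernel[128] = "";

/* Log the kernel name to stderr BEFORE compiling it, so that the last line
 * printed before a crash identifies the offending kernel. */
int compile_logged(const char *bin_path, const char *kernel_name, compile_fn compile)
{
    assert(bin_path != NULL && kernel_name != NULL && compile != NULL);
    fprintf(stderr, "Loading %s %s\n", bin_path, kernel_name);
    snprintf(last_kernel, sizeof last_kernel, "%s", kernel_name);
    return compile(kernel_name);
}

/* Illustrative compiler that fails on one specific kernel name (assumption:
 * the real failure would come from the OpenCL driver, not from a name check). */
int fake_compile(const char *kernel_name)
{
    if (strcmp(kernel_name, "integral_cols") == 0) {
        return -1;
    }
    return 0;
}
```

Calling compile_logged("./", "integral_cols", fake_compile) prints the "Loading" line first and then returns the failure code, so the log always ends with the kernel that was being built when things went wrong.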

  2. Finding and fixing the erroneous line:

This was a bit of trial and error. I used a program that only compiles OpenCL kernels (in my case, my Qt-based test program for OpenCL), but you may use OpenCV itself. In that case you have to go through patch, make opencl, make install for each iteration, which is not nice at all...

By disabling all code with "#if 0" ... "#endif" blocks and re-enabling it piece by piece, I deduced which lines did not work. In my case it was some combination of the conditional operator ("?:") and double brackets ("[][]") in one statement that could not be compiled. I refactored this code. You can see the result here (the #if 0 block is what did not compile, and the #else branch is my version):

#if 0
        sum_t[0] = (i == 0 ? 0 : lm_sum[0][LSIZE_2 + LOG_LSIZE]);
        sqsum_t[0] = (i == 0 ? 0 : lm_sqsum[0][LSIZE_2 + LOG_LSIZE]);
        sum_t[1] = (i == 0 ? 0 : lm_sum[1][LSIZE_2 + LOG_LSIZE]);
        sqsum_t[1] = (i == 0 ? 0 : lm_sqsum[1][LSIZE_2 + LOG_LSIZE]);
#else
        if (i == 0) {
            sum_t[0] = 0;
            sqsum_t[0] = 0;
            sum_t[1] = 0;
            sqsum_t[1] = 0;
        } else {
            sum_t[0] = lm_sum[0][LSIZE_2 + LOG_LSIZE];
            sqsum_t[0] = lm_sqsum[0][LSIZE_2 + LOG_LSIZE];
            sum_t[1] = lm_sum[1][LSIZE_2 + LOG_LSIZE];
            sqsum_t[1] = lm_sqsum[1][LSIZE_2 + LOG_LSIZE];
        }
#endif
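
To convince yourself that this kind of rewrite does not change the result, the two forms can be compared on the host. This is only a sketch in plain C, not the kernel itself: the array sizes, the LAST index (standing in for LSIZE_2 + LOG_LSIZE), the int element type and the function names are all assumptions made for illustration.

```c
#include <assert.h>

#define ROWS 2
#define COLS 4
#define LAST (COLS - 1) /* plays the role of LSIZE_2 + LOG_LSIZE (assumed) */

/* Original form: conditional operator combined with a two-dimensional index. */
void carry_ternary(int i, int lm_sum[ROWS][COLS], int sum_t[ROWS])
{
    sum_t[0] = (i == 0 ? 0 : lm_sum[0][LAST]);
    sum_t[1] = (i == 0 ? 0 : lm_sum[1][LAST]);
}

/* Refactored form: the same assignments written as a plain if/else. */
void carry_branch(int i, int lm_sum[ROWS][COLS], int sum_t[ROWS])
{
    if (i == 0) {
        sum_t[0] = 0;
        sum_t[1] = 0;
    } else {
        sum_t[0] = lm_sum[0][LAST];
        sum_t[1] = lm_sum[1][LAST];
    }
}

/* Returns 1 if both forms produce identical output for iteration i. */
int forms_agree(int i)
{
    int lm[ROWS][COLS] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
    int a[ROWS];
    int b[ROWS];

    carry_ternary(i, lm, a);
    carry_branch(i, lm, b);
    return a[0] == b[0] && a[1] == b[1];
}
```

forms_agree returns 1 both for i == 0 and for any other i, which is what you would expect, since the if/else is just the conditional operator spelled out. The point of the refactoring is only to avoid the construct the compiler choked on.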

The integral_rows kernel had a similar problem and had to be fixed as well:

#if 0
        src_t[0] = i + lid < rows ? srcsum[(lid+i) * src_step + gid * 2] : 0;
        sqsrc_t[0] = i + lid < rows ? srcsqsum[(lid+i) * src_step + gid * 2] : 0;
        src_t[1] = i + lid < rows ? srcsum[(lid+i) * src_step + gid * 2 + 1] : 0;
        sqsrc_t[1] = i + lid < rows ? srcsqsum[(lid+i) * src_step + gid * 2 + 1] : 0;

        sum_t[0] =  (i == 0 ? 0 : lm_sum[0][LSIZE_2 + LOG_LSIZE]);
        sqsum_t[0] =  (i == 0 ? 0 : lm_sqsum[0][LSIZE_2 + LOG_LSIZE]);
        sum_t[1] =  (i == 0 ? 0 : lm_sum[1][LSIZE_2 + LOG_LSIZE]);
        sqsum_t[1] =  (i == 0 ? 0 : lm_sqsum[1][LSIZE_2 + LOG_LSIZE]);
#else
        if (i + lid < rows) {
            src_t[0] =  srcsum[(lid+i) * src_step + gid * 2];
            sqsrc_t[0] = srcsqsum[(lid+i) * src_step + gid * 2];
            src_t[1] = srcsum[(lid+i) * src_step + gid * 2 + 1] ;
            sqsrc_t[1 ...