Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

More than 2 years after I asked the question, (I didn't know I was looking for something like this on that time) I implemented this. The key is to use border pursuit, and then Fourier descriptors. Now Fourier descriptors are size invariant but not rotation invariant. To make them rotation invariant, I read the excellent book "Feautre Extraction and Image Processing" byNixon and Aguado

However I suspect that on that time (as now too) I was looking for ways to find those patterns when the pattern was inside a bunch of other patterns

Unfortunately - unless the patters are quite separable, the above method won't work.

More than 2 years after I asked the question, (I didn't know I was looking for something like this on that time) I implemented this. The key is to use border pursuit, and then Fourier descriptors. Now Fourier descriptors are size invariant but not rotation invariant. To make them rotation invariant, I read the excellent book "Feautre Extraction and Image Processing" byNixon and Aguado

However I suspect that on that time (as now too) I was looking for ways to find those patterns when the pattern was inside a bunch of other patterns

Unfortunately - unless the patters are quite separable, the above method won't work.


I am going to describe the process I implemented here in broad terms, if you need a clarification please say so and I will edit >that< part.

The steps I took for recognition of shapes (and it works with letters too) are:

  1. Gray Scaling
  2. Binarization (with histograms)
  3. Labeling
  4. Fourier Descriptor representation
  5. Fourier Descriptor Comparison

Once I finished those, the process was that I showed the camera a shape and take notes of its FD representation. Once I took notes of the shapes I wanted to recognize I edited my code (I know , not fancy- sorry) and introduced that as "models" for the step 5. I reckon that an "automatic" method would have been better though. Also take into account that I implemented this all by myself without using OpenCV functions but I think equivalent functions can be found, so if you do, share your knowledge

1. Gray Scaling

Here I simply scan the image pixel by pixel and multiplied the RGB elements to find the Y element. In theory this is: Y=0.2990R+0.587G+0.114B but I used integer equivalent integer constants that were a multiply of these and later I shifted the result. This is because integer calculation is faster.

(At the same time, I built a histogram -since i was scanning the pixels already, - I know it is not good modular design but anyway it works. This histogram is going to be used in the next step

2. Binarization

I did not want to use a binarization in which I had to manually give a threshold so I use the histogram for automatic thresholding. In the case I am commenting I think I used the method called "Mode method" but I am not sure since I work in japanese actually. Anyway, the method is basically take the histogram that was constructed in the previous step and find the two tallest peaks. Then find the lowest valley between them and set the threshold there.

In a different application I have used the "Discriminant analysis Method". It also works well but I have not made comparisons

3. Labeling

to be continued...

More than 2 years after I asked the question, (I didn't know I was looking for something like this on that time) I implemented this. The key is to use border pursuit, and then Fourier descriptors. Now Fourier descriptors are size invariant but not rotation invariant. To make them rotation invariant, I read the excellent book "Feautre Extraction and Image Processing" byNixon and Aguado

However I suspect that on that time (as now too) I was looking for ways to find those patterns when the pattern was inside a bunch of other patterns

Unfortunately - unless the patters are quite separable, the above method won't work.


I am going to describe the process I implemented here in broad terms, if you need a clarification please say so and I will edit >that< part.

The steps I took for recognition of shapes (and it works with letters too) are:

  1. Gray Scaling
  2. Binarization (with histograms)
  3. Labeling
  4. Fourier Descriptor representation
  5. Fourier Descriptor Comparison

Once I finished those, the process was that I showed the camera a shape and take notes of its FD representation. Once I took notes of the shapes I wanted to recognize I edited my code (I know , not fancy- sorry) and introduced that as "models" for the step 5. I reckon that an "automatic" method would have been better though. Also take into account that I implemented this all by myself without using OpenCV functions but I think equivalent functions can be found, so if you do, share your knowledge

1. Gray Scaling

Here I simply scan the image pixel by pixel and multiplied the RGB elements to find the Y element. In theory this is: Y=0.2990R+0.587G+0.114B but I used integer equivalent integer constants that were a multiply of these and later I shifted the result. This is because integer calculation is faster.

(At the same time, I built a histogram -since i was scanning the pixels already, - I know it is not good modular design but anyway it works. This histogram is going to be used in the next step

2. Binarization

I did not want to use a binarization in which I had to manually give a threshold so I use the histogram for automatic thresholding. In the case I am commenting I think I used the method called "Mode method" but I am not sure since I work in japanese actually. Anyway, the method is basically take the histogram that was constructed in the previous step and find the two tallest peaks. Then find the lowest valley between them and set the threshold there.

In a different application I have used the "Discriminant analysis Method". It also works well but I have not made comparisons

3. Labeling

to Labeling involves a little complicated procedure. I am not sure I will be continued...

able to explain it here very well but it basically involves scanning the image pixel by pixel, processing only the white pixels (the blacks are background), then when finding one white pixel, checking the three surrounding upper neighbors and the left neighbor.

If there are no neighbors a new label is created. If there are neighbors compare their labels and if they are the same assign that label to the pixel being processed. if the labels are different then assign the first label and then replace the other label for the first label to all pixels.

In the end there will be some labels with 0 pixels (the ones that got replaced) so you can eliminate them from your list of labels.

4. Fourier Description Representation The theory behind Fourier descriptors is quite ample. There are several types Z-type, G-type and p-type. The difference is what equation you use to get them. In my case it involves the following steps:

4.1 Make Boundary Image

First for each label you have its size so you can separate a buffer. You scan the image until you find a pixel with that label and then start a chase (border following) In my case I used 4-separation , meaning that I moved only down left up and right. In the end you will have a list of coordinates (the border)

4.2 Make Descriptors

Finally we are reaching the main thing of this question. How to make the descriptors (once you have the border coordinates) First you have a defined number of descriptors that you are going to use. In my case I put that at 10 descriptors I used this struct

    struct Descriptor{
double aX[NDESC];
double bX[NDESC];
double aY[NDESC];
double bY[NDESC];
double CE[NDESC];
};

In reality, if you just want to do recognition (and not reconstruction- meaning reconstruct the shape from the descriptors) the only thing you need is the CE array. But... ok I kept the others too for historical reasons(they are the >original < descriptors that you are going to find in many literature about Fourier Descriptors. CE is used to implement rotational invariance and I only found it in the Nixon Aguado book.

I was going to write a long explanation but it will be useless, so I leave you with my code

void CFourierDesc::MakeDescriptors()
{
  int m=boundary0.x.size();
  double t= 2.0* PI/(double)m;
  double ss,cc;

  for(int k=0;k<NDESC;k++)  //for all descriptors
   {
     //initialize the descritptors with 0
     model0.aX[k]=0.0;
     model0.bX[k]=0.0;
     model0.aY[k]=0.0;
     model0.bY[k]=0.0;

    for(int i=0;i<m;i++) //for all points
      {
          cc = cos((k+1) * i * t);
          ss = sin((k+1) * i * t);
          model0.aX[k] += boundary0.x[i] * cc;
          model0.bX[k] += boundary0.x[i] * ss;
          model0.aY[k] += boundary0.y[i] * cc;
          model0.bY[k] += boundary0.y[i] * ss;

      }

     model0.aX[k]*=((double)2/m);
     model0.bX[k]*=((double)2/m);
     model0.aY[k]*=((double)2/m);
     model0.bY[k]*=((double)2/m);

    //Now calculate the rotational invariants

    model0.CE[k]=sqrt((model0.aX[k] * model0.aX[k] +
                       model0.aY[k] * model0.aY[k]) /
                      (model0.aX[0] * model0.aX[0] +  
                       model0.aY[0] * model0.aY[0])) +
                 sqrt((model0.bX[k] * model0.bX[k] +
                       model0.bY[k] * model0.bY[k]) /
                      (model0.bX[0] * model0.bX[0] +  
                       model0.bY[0] * model0.bY[0]));

   }
}

You see, the ax,aY,bX and bY are the original Fourier descriptors that are usually used in all literature. They are as far as I remembered translational and size invariant. But they are not rotational invariants, meaning that two shapes that are the same but just rotated will have different FDs. (on the contrasts the same shaped resized will have the same FDs). So to achieve rotational invariance CE is calculated and >these< are the values that you should use for comparing your shapes with your models.