
Template Matching Algorithm

asked 2013-01-02 07:07:31 -0600

jkflash

updated 2020-11-02 17:43:17 -0600

Hello OpenCV Community. I'm trying to understand how exactly the Template Matching algorithm works. I've read the documentation as well as the explanation in the O'Reilly book on page 215ff and have a basic understanding of how the images are matched. However, none of these sources explains in detail why the formulas look the way they do. So I'm searching for a mathematical explanation of the CV_TM_CCORR formula and especially the three parts of the CV_TM_CCOEFF formula. I tried to read up on how cross-correlation works, but I'm not a great mathematician and did not fully understand it. Additionally, the formulas in both linked sources seem to have one or two mistakes, which does not make them easier to understand. I would very much appreciate it if someone could help me with this, as I need it for my current university project.

Thanks in advance, j

Edit: sorry if you see a "maaan" in the title, I added it to force the suggestions-box to close because I couldn't type otherwise and then forgot to delete it again :-/


2 answers


answered 2013-01-04 10:31:45 -0600

matt.hammer

I can explain CV_TM_CCORR, but am myself still looking for a good explanation of CV_TM_CCOEFF.

CV_TM_CCORR is a cross-correlation, an image/signal processing technique that relies on multiplication. The inputs to a cross-correlation are a smaller sample (with image processing, that's your template image) and a larger target dataset (image). Typically, a cross-correlation is used to find the position (overlay) at which the template/sample most closely matches the data in the target image.

Put simply, cross-correlation of a template and an image involves three steps:

  1. Overlay the sample/template onto the target image.
  2. For each pixel position in the overlay, multiply the template image pixel value by the target image pixel value. Sum all the products together to get a "score" for the overlay.
  3. Repeat Step #2 for every possible overlay.
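The three steps above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not how OpenCV implements it; the function name `cross_correlate` and the tiny test arrays are made up for the example.

```python
import numpy as np

def cross_correlate(image, template):
    """Score every overlay of `template` on `image` by summing pixel products.

    Hypothetical helper for illustration; plain loops, no optimisation.
    """
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    ih, iw = image.shape
    th, tw = template.shape
    # One score per position where the template fits entirely inside the image.
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]          # Step 1: overlay
            scores[y, x] = np.sum(patch * template)    # Step 2: multiply and sum
    return scores                                      # Step 3: done for every overlay

# Made-up data: the template is embedded in the middle of the image.
image = np.array([[0, 0, 0, 0],
                  [0, 1, 2, 0],
                  [0, 3, 4, 0],
                  [0, 0, 0, 0]])
template = np.array([[1, 2],
                     [3, 4]])

scores = cross_correlate(image, template)
best_y, best_x = np.unravel_index(np.argmax(scores), scores.shape)
print(scores)
print("best overlay (row, col):", best_y, best_x)
```

Here the background is zero, so the highest score lands on the true position. With a bright background the raw score can peak elsewhere, which is why the normalized variant is usually preferred.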

Typically, the overlay position with the highest score is the "winner", especially when using the normalized version of cross-correlation (CV_TM_CCORR_NORMED) - when the positive values line up with positive values and the negative values line up with negative values (which multiply to a positive) and all those positives are summed up, the score peaks, signifying a good alignment. Wikipedia probably does a better job explaining it than I do: Cross Correlation

Looking closer at the OpenCV CV_TM_CCORR equation:

R(x,y) is the cross-correlation score for a single overlay position (x, y).

T(x',y') is the image pixel value for a pixel (x',y') in the template/sample image.

I(x+x',y+y') is the image pixel value for the corresponding (based on the overlay) pixel position in the target image.

We sum up the product of T(x',y') and I(x+x',y+y') for each overlay pixel position - every possible (x', y') in the overlay - to get our score. Then we move to a new overlay (x,y) and repeat to get the other overlay scores.
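Written out, the sum described above would look like the following. This is reconstructed from the description in this answer rather than copied from the documentation, and it assumes the template is w pixels wide and h pixels high:

```latex
R(x, y) = \sum_{x'=0}^{w-1} \sum_{y'=0}^{h-1} T(x', y') \cdot I(x + x',\, y + y')
```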

(Also, please note the typos in the formula contained in the first edition of the O'Reilly book - there are some extraneous powers of two floating around, among other issues. I believe the formulas on the website are correct.)

Now, for CV_TM_CCOEFF.

It's the same basic framework, but with a different underlying calculation for each overlay. I don't understand the CV_TM_CCOEFF calculation. O'Reilly explains that "These methods match a template relative to its mean against the image relative to its mean, so a perfect match will be 1 and a perfect mismatch will be -1; a value of 0 simply means that there is no correlation". However, the equation given for CV_TM_CCOEFF doesn't subtract the mean from each pixel value but instead subtracts the reciprocal of the pixel value sum TIMES the number of pixels (shouldn't it be a division?). Plus, all the simple examples I work out on paper (with small, one dimensional signals) usually don't give me 1, 0, or -1. I also Googled Correlation Coefficient and found variations of this: Pearson Correlation Coefficient, which has all kinds of ... (more)



Thank you very much for your explanation, I understood you better than Wikipedia :-P Now I understand CCORR, but as you said, the explanation of CCOEFF is still a mystery, because I don't know if the formulas are correct. I guess the Pearson Correlation Coefficient is the one that is used by OpenCV and I understand the basic stuff. Anyway, thank you very much :-)

jkflash ( 2013-01-13 10:56:39 -0600 )

Let's re-ask about CCOEFF on the main page - see if it gets more attention.

matt.hammer ( 2013-01-14 15:17:40 -0600 )

Hello Sir, is there already an explanation of CCOEFF_NORMED, and maybe of the other methods?

eflianto ( 2017-01-09 08:17:20 -0600 )

answered 2020-04-21 08:04:54 -0600

Going to write quite a bit so it won't fit in a comment here, so this is directed at Matt's answer. This is also my answer to the original question, still.

Firstly, for CCORR - in normal pictures there are no negative values, so I think your explanation of CCORR isn't complete, at least. The best way I can explain it from the math is that, just as 6*8 is smaller than 7*7 (i.e. the maximum of x(a-x) is at x=a/2), the highest score you can get is when the numbers are close. Their product will be higher if they are higher, but that is taken care of by the normalization in CCORR_NORMED - you can see from the examples in https://opencv-python-tutroals.readth... that CCORR is indeed not very useful, and returns bright for bright areas and dark for dark areas as expected.
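A small one-dimensional experiment shows the "bright for bright areas" effect. This is just a sketch with made-up numbers; the normalization divides by the square root of the product of the two sums of squares, which is how I understand CCORR_NORMED to be defined.

```python
import numpy as np

# Made-up 1-D "image": the template [1, 3, 1] sits at position 0,
# followed by a bright flat region of 9s.
signal = np.array([1, 3, 1, 0, 9, 9, 9], dtype=np.float64)
template = np.array([1, 3, 1], dtype=np.float64)
n = len(template)

raw = []     # plain sum of products (CCORR-style)
normed = []  # same score divided by the energy of both windows (CCORR_NORMED-style)
for x in range(len(signal) - n + 1):
    window = signal[x:x + n]
    score = np.sum(window * template)
    raw.append(score)
    denom = np.sqrt(np.sum(window ** 2) * np.sum(template ** 2))
    # Assumption for this sketch: an all-zero window simply scores 0.
    normed.append(score / denom if denom > 0 else 0.0)

print("raw scores:   ", raw)
print("normed scores:", np.round(normed, 3))
print("raw winner at", int(np.argmax(raw)), "- normed winner at", int(np.argmax(normed)))
```

The raw score is highest over the bright 9s even though the template does not look like them, while the normalized score picks the exact match at position 0.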

Even after normalization, this doesn't seem like it would be as effective as cross-correlation (CCORR) over information that also has negative values (since, for example, 6*8 is smaller than 7*7 by exactly 1). So: for CCOEFF - I believe that the formula in OpenCV's documentation is indeed a division; it's supposed to be read as (1/(w*h)) * sum(T(x'', y'')) and not 1/(w*h*sum(T(x'', y''))), so it is the average or mean of the template (and of the template-sized portion of the image). By subtracting it from every pixel you make the darker ones negative and the lighter ones positive, just like we wanted CCORR to behave - and then you use those values for exactly the same procedure CCORR does. This is already a better answer, which can still be normalized - both CCOEFF and CCOEFF_NORMED work.
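To make the mean-subtraction reading concrete, here is a minimal sketch on 1-D data. The helper name `ccoeff` and the test windows are invented for the example, and returning 0 for a perfectly flat window is my own choice, not necessarily what OpenCV does.

```python
import numpy as np

def ccoeff(window, template, normed=False):
    """CCOEFF-style score for one overlay: subtract each mean, then correlate.

    Hypothetical helper; `window` is the template-sized part of the image.
    """
    w = window - window.mean()      # I' = I - (1/(w*h)) * sum(I) over the window
    t = template - template.mean()  # T' = T - (1/(w*h)) * sum(T)
    score = np.sum(w * t)           # same multiply-and-sum as CCORR
    if normed:
        denom = np.sqrt(np.sum(w ** 2) * np.sum(t ** 2))
        # Assumption: a flat window has no variation to correlate, so score it 0.
        return score / denom if denom > 0 else 0.0
    return score

template = np.array([1.0, 3.0, 1.0])

same = ccoeff(np.array([1.0, 3.0, 1.0]), template, normed=True)
brighter = ccoeff(np.array([101.0, 103.0, 101.0]), template, normed=True)
inverted = ccoeff(np.array([3.0, 1.0, 3.0]), template, normed=True)
flat = ccoeff(np.array([9.0, 9.0, 9.0]), template, normed=True)

print("same shape:         ", round(float(same), 6))
print("same shape, brighter:", round(float(brighter), 6))
print("inverted shape:     ", round(float(inverted), 6))
print("flat bright region: ", round(float(flat), 6))
```

With the means removed, a uniformly brighter copy of the template scores the same as an exact copy, an inverted copy scores at the opposite extreme, and a flat bright region no longer wins. That matches the 1 / 0 / -1 description quoted from the book, but only for the normalized variant, which may be why the unnormalized paper examples above did not come out to those values.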

I know this is incredibly late, but if anyone stumbles here like I did I hope this can help :P




Asked: 2013-01-02 07:07:31 -0600

Seen: 6,973 times

Last updated: Jan 04 '13