# Is there a cuda namespace equivalent of Mat::dot for GpuMats?

Hello, I am currently translating various sections of a registration algorithm to the cuda:: namespace to make use of our lab's Titan. The code uses Mat::dot, which I believe is just a standard vector dot product: the summation of the element-wise multiplication of the two vectors. When I test with a couple of small matrices, the results match my hand calculations. However, there seems to be no GpuMat::dot(), or anything equivalent in the cuda namespace, for performing the dot product. Also, the results I get with larger matrices differ from Mat::dot's, and I am struggling to understand how it is calculated, so that I can replicate it with cuBLAS or the like if need be.

The code in the repo for Mat::dot is

```
double Mat::dot(InputArray _mat) const
{
    Mat mat = _mat.getMat();
    int cn = channels();
    DotProdFunc func = getDotProdFunc(depth());
    CV_Assert( mat.type() == type() && mat.size == size && func != 0 );

    if( isContinuous() && mat.isContinuous() )
    {
        size_t len = total()*cn;
        if( len == (size_t)(int)len )
            return func(data, mat.data, (int)len);
    }

    const Mat* arrays[] = {this, &mat, 0};
    uchar* ptrs[2];
    NAryMatIterator it(arrays, ptrs);
    int len = (int)(it.size*cn);
    double r = 0;

    for( size_t i = 0; i < it.nplanes; i++, ++it )
        r += func( ptrs[0], ptrs[1], len );

    return r;
}
```

But where and what is `func()`? I can't locate it in the repo.
The description in the documentation:

> The method computes a dot-product of two matrices. If the matrices are not single-column or single-row vectors, the top-to-bottom left-to-right scan ordering is used to treat them as 1D vectors. The vectors must have the same size and type. If the matrices have more than one channel, the dot products from all the channels are summed together.
seems to suggest that, if dot is called on two images (i.e. multi-row matrices), the function computes A[i] * B[j]: taking a row vector from A and a column vector from B and multiplying them to get a scalar. If so, would the result be a single-row vector, over which it then performs a final summation?

If there is a cuda version that replicates this function, that would be perfect; otherwise, if someone could explain how the computation is performed, that would be great!
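In case it helps anyone answering: the workaround I am considering (an untested sketch, assuming the cudaarithm module is available) is to replicate dot with cv::cuda::multiply followed by cv::cuda::sum, adding the per-channel sums together at the end since cv::cuda::sum returns a per-channel Scalar.

```
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

// Untested sketch: dot product of two GpuMats via element-wise
// multiply + sum. Assumes both inputs have the same size and type.
double gpuDot(const cv::cuda::GpuMat& a, const cv::cuda::GpuMat& b)
{
    CV_Assert(a.size() == b.size() && a.type() == b.type());

    cv::cuda::GpuMat prod;
    cv::cuda::multiply(a, b, prod, 1.0, CV_64F); // element-wise product
    cv::Scalar s = cv::cuda::sum(prod);          // per-channel sums

    // Mat::dot sums across channels, so add the per-channel results.
    double r = 0.0;
    for (int c = 0; c < prod.channels(); c++)
        r += s[c];
    return r;
}
```

An alternative I've also considered is cublasDdot on the raw device pointer, but GpuMat rows are padded (step), so the data would need to be continuous, or copied into a continuous buffer, first.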