You might simply try to implement it yourself, without OpenCV functions:
Mat image, mask;  // image: single-channel 8-bit (CV_8UC1), already loaded
mask = Mat::zeros(image.size(), CV_8UC1);  // border pixels stay 0
for (int y = 1; y < image.rows - 1; y++)
    for (int x = 1; x < image.cols - 1; x++) {
        uchar p = image.at<uchar>(y, x);  // at() takes (row, col), i.e. (y, x)
        // Strict local maximum: p must be greater than all 8 neighbors.
        mask.at<uchar>(y, x) =
            p > image.at<uchar>(y-1, x-1) && p > image.at<uchar>(y-1, x) && p > image.at<uchar>(y-1, x+1) &&
            p > image.at<uchar>(y, x-1) && p > image.at<uchar>(y, x+1) &&
            p > image.at<uchar>(y+1, x-1) && p > image.at<uchar>(y+1, x) && p > image.at<uchar>(y+1, x+1);
    }
Sorry for the C++ code; it's untested and not optimized, but it's easy to understand: you take every pixel (x,y) except those on the border and compare it to its 8 neighbors (x-1,y-1 ; x,y-1 ; x+1,y-1 ; ...). If it's greater than all of them, then it's a local maximum.
You can adapt and optimize the code according to your needs. With a little work you can process the borders, too.
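For the border case, here is a rough sketch of one way to do it. To keep it self-contained it works on a plain row-major buffer instead of a Mat, and localMaxima is a made-up helper name, not an OpenCV function. It assumes an 8-bit grayscale image and simply skips neighbors that fall outside the image, so a border pixel is only compared to the neighbors it actually has.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): marks strict local maxima of a
// row-major 8-bit grayscale image. Border pixels are handled by skipping
// neighbors that lie outside the image.
std::vector<unsigned char> localMaxima(const std::vector<unsigned char>& img,
                                       int rows, int cols)
{
    std::vector<unsigned char> mask(img.size(), 0);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const unsigned char center = img[static_cast<std::size_t>(y) * cols + x];
            bool isMax = true;
            for (int dy = -1; dy <= 1 && isMax; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;  // skip the pixel itself
                    const int ny = y + dy;
                    const int nx = x + dx;
                    // Out-of-image neighbors are ignored (assumed border policy).
                    if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
                    if (img[static_cast<std::size_t>(ny) * cols + nx] >= center) {
                        isMax = false;  // a neighbor is at least as large
                        break;
                    }
                }
            }
            mask[static_cast<std::size_t>(y) * cols + x] = isMax ? 1 : 0;
        }
    }
    return mask;
}
```

Because the comparison is strict, pixels on a flat plateau are not marked. If you want plateaus to count, compare with > instead of >= in the inner test.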
You may want to look at minMaxLoc().
I am looking for something local, not global.
You can set an ROI the size of your kernel and call minMaxLoc() on it.
That sounds highly inefficient. A dumb way would be to explicitly test all the neighbor pixels. I'll try that since I don't have a better idea.