
Howdy, from the other end of the spectrum; early 20s. It is always exciting to see an elder still enthusiastic about CV!

OpenCV can actually do everything you are asking for. A skeletal view of your problem breaks down into four stages.

1. Image Preprocessing

You would want to apply some combination of thresholding, edge detection, morphological operations, and contour extraction to suppress noise and isolate the targets in your game. See the C++ tutorial and the Python tutorial.

2. Event Detection

OpenCV offers mouse callbacks, which give you the (x, y) position of each mouse event. See the C++ tutorial and the Python tutorial. I recommend using the Python tutorial as a reference and then converting the logic to your language of choice.

3. Object Detection

After stage 1, you hopefully have a feed in which most of your targets remain and the unnecessary noise is removed. Since you can't guarantee a noiseless image, it's time to detect your object in this less noisy image.

If your object is going to look similar throughout, you can employ template matching or keypoint matching. Here's a full list with some info.

If your objects have certain geometric features, you can tune attributes such as circularity, area, and perimeter in the blob extractor in an attempt to detect your object. Alternatively, you can use one of the following feature detectors to achieve the task.

Depending on your object, you can go search online for suggestions on how to detect it.

4. Object Tracking

Once detection is reliable enough, it is time to start tracking the object. This is where the tracking API comes into play. This blog does a good job of explaining the available trackers.

Disclaimer

I added the Object Detection and Tracking sections because I am not sure of your game design.

Scenario I

If your game is event driven, i.e. something happens depending on where the user points the crosshair, then you do not really need either of them. Given the clicked (x, y), you can just remove whatever object is around that area. No detection or tracking required.
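A sketch of that idea in plain Python, assuming (my assumption, not something from your description) that the game already knows each target's bounding box as `(x, y, w, h)`. The `hit_index` helper is a hypothetical name.

```python
def hit_index(click, boxes):
    """Return the index of the first (x, y, w, h) box containing the click, else None."""
    x, y = click
    for i, (bx, by, bw, bh) in enumerate(boxes):
        if bx <= x < bx + bw and by <= y < by + bh:
            return i
    return None

# Hypothetical targets currently on screen.
targets = [(40, 40, 41, 41), (120, 120, 51, 41)]

hit = hit_index((130, 125), targets)  # the click lands inside the second box
if hit is not None:
    targets.pop(hit)                  # "remove whatever object is around that area"
```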

Scenario II

If the user's input (x, y) requires you to track the object, then there is no need for detection. Simply extract features around (x, y) and track them.

Other scenarios

Maybe you need detection and tracking, or just detection, etc. The point is that you do not necessarily need both detection and tracking, or even either.

Two Cents

I would recommend you experiment with the skeletal design first to build a good foundational system before jumping to masking and creating other layers; you probably already know this :)

OpenCV offers sample code in both C++ and Python illustrating how to do most of what I described above. Enjoy this video showing an application that covers a huge chunk of your game's core.

Cheers!