Hello. I'm working on an application that will eventually take video footage of a player playing a video game as an input and output the location of the player on the map.

Using the game's free-roam mode, I can walk around the map while the game displays the camera's current position and orientation (X, Y, Z coordinates plus the camera angle). I want to use this mode to build my own reference map of the in-game world.
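One common way to turn a free-roam recording into a reference map is to store a sparse set of keyframes, each tagged with the pose read from the on-screen overlay, instead of every frame. Below is a minimal sketch that assumes poses have already been extracted as (x, y, z, yaw in degrees). The `Pose` type, the `select_keyframes` function, and the 2-unit / 15-degree thresholds are hypothetical and would need tuning to the game's units.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Camera pose as read from the free-roam overlay (hypothetical units)."""
    x: float
    y: float
    z: float
    yaw_deg: float


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0..180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def select_keyframes(poses, min_dist=2.0, min_angle_deg=15.0):
    """Return indices of frames worth storing in the reference map.

    A frame is kept unless some already-kept keyframe is both closer than
    `min_dist` AND facing within `min_angle_deg` of it. As a result, the same
    spot can be stored several times from different viewing directions.
    """
    kept = []
    for i, p in enumerate(poses):
        is_new = True
        for j in kept:
            q = poses[j]
            dist = math.dist((p.x, p.y, p.z), (q.x, q.y, q.z))
            if dist < min_dist and angle_diff_deg(p.yaw_deg, q.yaw_deg) < min_angle_deg:
                is_new = False  # an existing keyframe already covers this view
                break
        if is_new:
            kept.append(i)
    return kept
```

For example, a frame 0.5 units from a kept keyframe and turned only 5 degrees away is skipped. The same position turned 90 degrees is kept, because the camera sees different scenery.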

Then, I want to take separate footage of a player playing the game and compare the footage against the map I've built to determine where the player is.
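A simple baseline for the comparison step is image retrieval. Each stored keyframe gets a compact global descriptor, the query frame is described the same way, and the result is the pose of the most similar keyframe. The sketch below uses a downsampled, zero-mean, unit-length "tiny image" descriptor purely for illustration. It assumes frames are grayscale 2-D lists of pixel values at least `size` pixels on each side, and `tiny_descriptor` / `localize` are hypothetical names. Real gameplay footage (HUD overlays, lighting changes, other players) will likely need stronger features, such as local keypoints with geometric verification.

```python
import math


def tiny_descriptor(frame, size=8):
    """Downsample a grayscale frame (list of rows, at least size x size) to
    size x size by block averaging, then make it zero-mean and unit-length so
    overall brightness and contrast changes matter less."""
    h, w = len(frame), len(frame[0])
    desc = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            desc.append(sum(block) / len(block))
    mean = sum(desc) / len(desc)
    desc = [v - mean for v in desc]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0  # avoid dividing by zero on flat frames
    return [v / norm for v in desc]


def localize(query_frame, keyframes, size=8):
    """keyframes: list of (descriptor, pose) pairs built from the reference
    footage. Returns (pose, score) for the keyframe whose descriptor has the
    highest cosine similarity with the query frame's descriptor."""
    q = tiny_descriptor(query_frame, size)
    best_pose, best_score = None, -math.inf
    for desc, pose in keyframes:
        score = sum(a * b for a, b in zip(q, desc))  # both unit length, so this is cosine similarity
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```

The returned score can be used to reject weak matches. A query that resembles no keyframe well is better reported as "unknown" than forced onto the nearest pose.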

I have a basic familiarity with the concepts of CV, SfM, SLAM, etc., but not much practical experience in these fields, and I'm somewhat overwhelmed by all the variations and specific use cases. What I'm hoping for in an answer is some guidance on how to approach the project. My first step is clear to me: process the reference footage and extract the on-screen coordinates/orientation using OCR, template matching, or a similar technique. But how should I proceed from there? How should I build the reference map data, given that the end goal is to localize the player later by comparing gameplay footage against that map?
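Once OCR or template matching has turned the coordinate overlay into a string, the values still have to be parsed and sanity-checked before they can label frames. Below is a minimal sketch that assumes the overlay reads something like `X: 812.5 Y: -40.2 Z: 13.0 Angle: 271.4`. The real labels and layout depend on the game, and `parse_hud` / `parse_recording` are hypothetical helpers. Frames whose text does not parse are dropped rather than guessed, since OCR misreads of single characters are common.

```python
import re

# Hypothetical overlay layout: "X: <num> Y: <num> Z: <num> Angle: <num>".
# Adjust the labels to whatever the game actually displays.
_NUM = r"(-?\d+(?:\.\d+)?)"
HUD_PATTERN = re.compile(
    rf"X:\s*{_NUM}\s+Y:\s*{_NUM}\s+Z:\s*{_NUM}\s+Angle:\s*{_NUM}",
    re.IGNORECASE,
)


def parse_hud(text):
    """Return (x, y, z, angle) as floats, or None if the OCR text doesn't match."""
    m = HUD_PATTERN.search(text)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())


def parse_recording(ocr_lines):
    """Pair each frame index with its parsed pose, skipping unreadable frames."""
    poses = []
    for frame_idx, text in enumerate(ocr_lines):
        pose = parse_hud(text)
        if pose is not None:
            poses.append((frame_idx, pose))
    return poses
```

Keeping the frame index alongside each pose makes it easy to pull the matching video frame later when building the reference map.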

Any help/guidance is greatly appreciated. Please let me know if I haven't made my project/goals clear, or if there are other questions with useful answers. Thanks!

Your goals are clear. Unfortunately, it's also clear that you're lacking the necessary skills for this, starting with proper academic research.

This is an ongoing, active research topic, and the work required here is equivalent to a PhD. Don't expect to find anything ready-made.

Please come back if you have an OpenCV-specific question.

