I have a key-point-based visual odometry routine which accepts an RGB-D frame as input. Successive images are tracked to each other and a cumulative rotation and translation is maintained. In its current form, significant drift occurs. I intend to transition this routine to use key-frames: new RGB-D frames are tracked to the most recent key-frame until sufficient displacement has accumulated to warrant creating a new key-frame. Key-frames should significantly reduce drift and are useful for further processing if desired (m-frame bundle adjustment, etc.).
My question is fairly fundamental. Assume I have performed tracking (key-point matching and PnP) and have [R|t] from the current frame to the current key-frame. Now, given a matched key-point pair, one in the key-frame and one in the current frame, each with a 3D position and a covariance, how can I fuse the new measurement into the key-frame data? Many papers dance around this and take it for granted, but as someone new to this sort of thing, I am having trouble finding a source that offers a good explanation (one might even come from the radar-tracking literature).
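For concreteness, here is a minimal sketch of what I suspect the fusion step looks like, in Python/NumPy. It assumes the current-frame point is first mapped into key-frame coordinates with the estimated [R|t] (propagating its covariance to first order, and neglecting the uncertainty in [R|t] itself), and is then fused with the key-frame point by inverse-covariance weighting, i.e. the product of two Gaussians, which is just the Kalman update for a direct 3D measurement. All names here (`fuse_keypoint`, `p_k`, `S_k`, etc.) are my own placeholders, not from any library.

```python
import numpy as np

def fuse_keypoint(p_k, S_k, p_c, S_c, R, t):
    """Fuse a current-frame 3D key-point into its key-frame counterpart.

    p_k, p_c : (3,)   3D positions in key-frame / current-frame coordinates
    S_k, S_c : (3, 3) covariances of those positions
    R, t     : rotation (3, 3) and translation (3,) mapping current-frame
               coordinates into key-frame coordinates
    Returns the fused position and covariance in key-frame coordinates.
    """
    # 1. Express the new measurement in key-frame coordinates.
    p_c_in_k = R @ p_c + t
    # First-order propagation of the measurement covariance through the
    # rigid transform (uncertainty in [R|t] itself is ignored here).
    S_c_in_k = R @ S_c @ R.T

    # 2. Product of two Gaussians: weight each estimate by its inverse
    # covariance (information matrix).
    info_k = np.linalg.inv(S_k)
    info_c = np.linalg.inv(S_c_in_k)
    S_fused = np.linalg.inv(info_k + info_c)
    p_fused = S_fused @ (info_k @ p_k + info_c @ p_c_in_k)
    return p_fused, S_fused
```

Is this inverse-covariance weighting the standard approach, and is it generally acceptable to neglect the uncertainty of [R|t] in this step?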