I believe, from your posts, that you are trying to implement a sort of SLAM solution. This is not an easy task, and I recommend you read the papers on LSD-SLAM (a semi-dense approach) and PTAMM (a feature-based approach) to get an idea of two very robust (albeit distinct) solutions.
Regarding this particular issue:
Determining the camera pose by feature matching is extremely jittery. A small amount of camera noise can make a calibration feature "disappear" and affect solvePnP. This is aggravated further if you are using RANSAC. Keep in mind that RANSAC is a non-deterministic algorithm: it finds the solution with the least error over a subset of your input. This means that if a small group of features contaminated with noise happens to produce a solution with a small residual (but a wrong pose, because their positions are corrupted), your camera pose will "jump".
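To make this concrete, here is a minimal sketch (mine, not from your code) of a solvePnPRansac call where the previous frame's pose seeds the estimate and the RANSAC parameters are tightened — two knobs that tend to damp, but not eliminate, the jumps. All data below are placeholders you would replace with your own calibration and matches:

```python
import numpy as np
import cv2

# Placeholder inputs -- substitute your own model points, matches and calibration.
object_points = np.random.rand(20, 3).astype(np.float32)   # 3D model points
image_points = np.random.rand(20, 2).astype(np.float32)    # matched 2D points
camera_matrix = np.array([[700.0, 0.0, 320.0],
                          [0.0, 700.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# Pose from the previous frame, used as a starting point for refinement.
prev_rvec = np.zeros((3, 1))
prev_tvec = np.array([[0.0], [0.0], [1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, camera_matrix, dist_coeffs,
    rvec=prev_rvec.copy(), tvec=prev_tvec.copy(),
    useExtrinsicGuess=True,   # start the iterative refinement at the old pose
    iterationsCount=200,      # more iterations -> more consistent consensus set
    reprojectionError=3.0,    # tighter threshold rejects noisy features earlier
    confidence=0.99)
```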
A quick solution that comes to mind is to perform some feature matching and then pass the estimated positions of the features as the initial guess to an optical flow algorithm. Note that optical flow algorithms are not robust to occlusion, so you will need a further check that each tracked feature still matches the original one. You can then feed the surviving points to solvePnP. An even better solution would be to use a weighted motion estimator instead of solvePnP, so that features with low confidence affect the camera pose less than properly matched points, while all the information is still taken into account. This isn't implemented in OpenCV, but there is a complete description of how to implement one in the thesis "Visual Tracking for Augmented Reality" by Georg Klein (Univ. of Cambridge). Both ideas are sketched below.
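Here is a rough Python/OpenCV sketch of the first idea, under some assumptions of mine: grayscale frames, matched points stored as float32 arrays of shape (N, 1, 2), and known 3D model points per feature. `cv2.calcOpticalFlowPyrLK` is seeded with the matcher's positions via `OPTFLOW_USE_INITIAL_FLOW`, and a forward-backward consistency check stands in for the occlusion test mentioned above:

```python
import numpy as np
import cv2

def track_and_estimate_pose(prev_gray, cur_gray, matched_prev_pts,
                            matched_cur_guess, object_pts,
                            camera_matrix, dist_coeffs, fb_thresh=1.0):
    """Refine feature-match positions with pyramidal LK, reject occluded
    points with a forward-backward check, then run solvePnP."""
    lk = dict(winSize=(21, 21), maxLevel=3,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
              flags=cv2.OPTFLOW_USE_INITIAL_FLOW)

    # Forward pass: LK starts from the positions the feature matcher produced.
    cur_pts, st_fwd, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, matched_prev_pts, matched_cur_guess.copy(), **lk)

    # Backward pass: track the result back and compare against the start points.
    back_pts, st_bwd, _ = cv2.calcOpticalFlowPyrLK(
        cur_gray, prev_gray, cur_pts, matched_prev_pts.copy(), **lk)

    fb_err = np.linalg.norm(back_pts - matched_prev_pts, axis=2).ravel()
    good = (st_fwd.ravel() == 1) & (st_bwd.ravel() == 1) & (fb_err < fb_thresh)
    if good.sum() < 4:              # solvePnP needs at least 4 correspondences
        return None

    ok, rvec, tvec = cv2.solvePnP(object_pts[good], cur_pts[good],
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None
```

For the weighted estimator, a crude approximation (not Klein's exact formulation) is to refine the solvePnP result by minimising the reprojection error under a robust loss, which tapers the influence of badly tracked points instead of discarding them. This sketch assumes SciPy is available:

```python
from scipy.optimize import least_squares

def refine_pose_robust(rvec, tvec, object_pts, image_pts,
                       camera_matrix, dist_coeffs):
    """Robust pose refinement: the Cauchy loss down-weights points with
    large reprojection error, a stand-in for per-feature confidence weights."""
    def residuals(params):
        r, t = params[:3].reshape(3, 1), params[3:].reshape(3, 1)
        proj, _ = cv2.projectPoints(object_pts, r, t, camera_matrix, dist_coeffs)
        return (proj.reshape(-1, 2) - image_pts.reshape(-1, 2)).ravel()

    x0 = np.concatenate([rvec.ravel(), tvec.ravel()])
    sol = least_squares(residuals, x0, loss='cauchy', f_scale=2.0)
    return sol.x[:3].reshape(3, 1), sol.x[3:].reshape(3, 1)
```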
I hope that this helps.