I can't answer all of your questions, but here are answers to two of them:

CV_RANSAC works as follows: 1. it chooses a random subset of your point correspondences (four pairs, the minimum needed for a homography), 2. it computes the homography matrix (perspective transformation) from that subset, 3. it divides all points into inliers and outliers using ransacReprojThreshold, which is the maximum allowed distance between a destination point and its source point projected through the estimated homography. This procedure is repeated several times, and the homography with the most inliers is kept. The loop stops once the required confidence is reached (findHomography sets it to 0.995) or after the maximum number of iterations (findHomography uses 2000).
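To make the three steps concrete, here is a minimal NumPy sketch of the same loop. It is not OpenCV's code, whose implementation differs in detail. The function names (`fit_homography`, `reprojection_error`, `ransac_homography`) and the fixed `seed` are made up for illustration, and the defaults only mirror the values mentioned above.

```python
import math
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: homography from >= 4 point pairs (N x 2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked equations.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def reprojection_error(H, src, dst):
    """Distance between each dst point and its src point projected through H."""
    homog = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = homog[:, :2] / homog[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
    # Degenerate projections count as infinitely far away.
    return np.where(np.isfinite(err), err, np.inf)

def ransac_homography(src, dst, thresh=3.0, confidence=0.995,
                      max_iters=2000, seed=0):
    """Hypothetical RANSAC loop; returns (H, boolean inlier mask)."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_H, best_mask = None, np.zeros(n, dtype=bool)
    needed, i = max_iters, 0
    while i < needed:
        # Step 1: random minimal subset of four correspondences.
        idx = rng.choice(n, size=4, replace=False)
        # Step 2: homography from that subset.
        H = fit_homography(src[idx], dst[idx])
        # Step 3: split all points into inliers and outliers.
        mask = reprojection_error(H, src, dst) < thresh
        if mask.sum() > best_mask.sum():
            best_H, best_mask = H, mask
            w = mask.mean()  # estimated inlier ratio
            if w >= 1.0:
                break
            # Iterations needed to draw an all-inlier sample with the
            # requested confidence, capped at max_iters.
            needed = min(max_iters,
                         math.ceil(math.log(1.0 - confidence) / math.log1p(-w ** 4)))
        i += 1
    if best_mask.sum() >= 4:
        # Refit on every inlier of the best model.
        best_H = fit_homography(src[best_mask], dst[best_mask])
    return best_H, best_mask
```

In practice you would simply call `cv2.findHomography(srcPoints, dstPoints, cv2.RANSAC, 3.0)` and read the inlier mask from its second return value; the sketch only shows what happens conceptually inside.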
You are right, the typical keypoint detectors won't work well on binary images. However, you could go the other way around: extract the shapes from your RGB image and then match both sets of shapes using a shape-matching algorithm. If you have a static scene and want to detect moving objects like cars and humans, you could apply background subtraction, then edge detection (Canny), then findContours to get the shapes.