I notice three things:

* For every grabbed screen you read the template from disk, which is slow. Read it once into memory and reuse it.
* Running the operations on a full-size window is very slow. Resize the grabbed screen to the smallest size that still works.
* You seem to convert the screen from BGR to RGB, but a Windows screen would be RGB, so check whether that conversion is needed.