2x slower performance in C++ vs. Python for detectAndCompute with ORB features
Hi, I'm trying to write code for finding regions of an image with a moving object. Generally, my algorithm is:
- Find keypoints in two successive frames
- Match them
- Use RANSAC method to find a homography
- Find outliers
I wrote a first version in Python on my own, but the final board requires C++, so I started porting it over so we could test it on our processor. As a simple benchmark, I measured the average frame time across 400 frames of a fairly high-resolution video. The C++ version (running only detectAndCompute) came in at best around 0.07 s/frame, about 14 FPS. The Python version, also using detectAndCompute, came in at about 0.041 s/frame, about 24 FPS.
I read that the Python bindings are just wrappers generated from the headers, so I figured that if I knew exactly how they call the function, I could speed up my C++ code. Any suggestions on how to find that out, or on other things I might be doing wrong that could cause this?
Using OpenCV 3.3 on Ubuntu 16.04. Compiled with g++, and Python is 2.7.
Python:
import cv2
import numpy as np
import math
from matplotlib import pyplot as plt
from time import time
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('path', help='path to input video')
parser.add_argument('-s', '--save', default=0, help='save a video?')
args = parser.parse_args()
vid_path = args.path
orb = cv2.ORB_create()
img2 = cv2.imread(vid_path + '/00000001.jpg')
h, w, c = img2.shape
s = time()
for i in range(2, 401):
    img1 = img2.copy()
    img2 = cv2.imread(vid_path + '/' + format(i, '08d') + '.jpg')
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
t2 = time()
print('Frame Time: \t' + str((t2 - s) / 399))
cv2.destroyAllWindows()
C++:
#include <opencv2/videoio.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/features2d.hpp>
#include <iostream>
#include <iomanip>   // std::setw, std::setfill
#include <sstream>   // std::ostringstream
#include <cstdlib>   // atof, atoi
#include <string>
#include <vector>
#include <time.h>
using namespace cv;
int main(int argc, char const *argv[]) {
    if (argc != 5) {
        std::cout << "Usage: ./orb [video_path] [distance_threshold] [max_iterations] [n_bins]" << std::endl;
        return -1;
    }
    // ----- parse command line arguments -----
    std::string vid_path = argv[1];
    double DIST_THRESH = atof(argv[2]);
    int MAX_ITER = atoi(argv[3]);
    int N_BINS = atoi(argv[4]);
    Ptr<ORB> orb = ORB::create();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
    Mat img2 = imread(vid_path + "/00000001.jpg");
    Mat img1;
    int h = img2.rows;
    int w = img2.cols;
    clock_t start = clock();
    for (int i = 2; i < 401; i++) {
        img1 = img2.clone();
        std::ostringstream formatted_number;
        formatted_number << std::setw(8) << std::setfill('0') << i;
        img2 = imread(vid_path + "/" + formatted_number.str() + ".jpg");
        std::vector<KeyPoint> kp1, kp2;
        Mat des1, des2;
        std::vector<DMatch> matches;
        orb->detectAndCompute(img1, noArray(), kp1, des1);
        orb->detectAndCompute(img2, noArray(), kp2, des2);
        // matcher->match(des1, des2, matches);
    }
    clock_t end = clock();
    double f_time = ((double)(end - start)) / CLOCKS_PER_SEC / 399;
    std::cout << "Frame Time: \t" << f_time << " seconds" << std::endl;
    return 0;
}
time.time() measures "wall time", while clock() measures "cpu" time, so the two numbers are not comparable.
Try cv2.getTickCount() / cv2.getTickFrequency() (and the equivalent cv::getTickCount() / cv::getTickFrequency() in C++) for a fair comparison.
Last, you should not include the imread() calls in your timing.