Improve Runtime of a Function

Hi, I am using a function to compute the difference between two images. It takes two 8-bit grayscale images, converts them to CV_32FC1, and subtracts one from the other. Here is the function I am using:

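#include <opencv2/core.hpp>

cv::Mat calculateDiff(const cv::Mat &image_one, const cv::Mat &image_two)
{
    cv::Mat im1, im2, im_dest;
    // Step 1: convert both 8-bit inputs to 32-bit float
    image_one.convertTo(im1, CV_32FC1);
    image_two.convertTo(im2, CV_32FC1);
    // Step 2: halve both images and offset the first by 128
    im1 /= 2.f;
    im1 += 128.f;
    im2 /= 2.f;
    cv::subtract(im1, im2, im_dest);      // Step 3: subtract
    im_dest.convertTo(im_dest, CV_8UC1);  // Step 4: back to 8-bit

    return im_dest;
}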

I have measured the run-time of each major step individually:

  1. convertTo steps
  2. divide by 2, add 128
  3. subtract() function
  4. convertTo()
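
Taken together, the four steps amount to a single per-pixel formula, (a / 2 + 128) - b / 2, converted back to 8 bits. A minimal sketch of that arithmetic without OpenCV is below. The name diffPixel is hypothetical, and the sketch assumes the final 8-bit conversion rounds to the nearest integer and saturates to the range 0..255.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of OpenCV): per-pixel result of the four steps
// for two 8-bit input values a and b.
std::uint8_t diffPixel(std::uint8_t a, std::uint8_t b)
{
    // Steps 1-3: promote to float, halve, offset the first value, subtract.
    const float v = (a / 2.f + 128.f) - b / 2.f;  // always within 0.5 .. 255.5

    // Step 4 (assumption): round to nearest, then clamp to the 8-bit range.
    const long r = std::lrint(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
}
```

Equal pixels map to 128, so mid-gray in the output means "no difference", brighter values mean image_one is larger, and darker values mean image_two is larger.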

When I call this function on images of size 9000 x 6000, I get a run-time of about 900 msec, but the individual steps take far less time. Here is one example:

  1. Step 1 time: 64 msec
  2. Step 2 time: 76 msec
  3. Step 3 time: 51 msec
  4. Step 4 time: 22 msec

When I time the whole function call, I get a runtime of 905 msec, while the four steps above add up to only 213 msec.

The function call looks like this:

cv::Mat diff_image;
diff_image = calculateDiff(input_one, input_two);

I measure the runtime using cv::getTickCount() and cv::getTickFrequency().
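
For comparison, the same start/stop measurement pattern can be sketched with the standard library. This is an illustration only: it uses std::chrono::steady_clock in place of the OpenCV tick functions so that it compiles without OpenCV, and elapsedMs is a hypothetical helper name.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename F>
double elapsedMs(F &&work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();  // the code being timed
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the whole call, for example elapsedMs([&] { diff_image = calculateDiff(input_one, input_two); }), and wrapping each step inside the function in the same way should make the two sets of numbers directly comparable, since both then include exactly the work between the two clock reads.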

Why is the function's runtime so much larger when the individual steps do not take that long? How can I improve the runtime? Kindly help.