Is the swapRB value in the example GoogLeNet DNN code wrong?

asked 2017-09-12 20:06:41 -0500

jrobble

Hi All, I'm confused. The "Load Caffe framework models" example code for OpenCV 3.3 reads:

//GoogLeNet accepts only 224x224 RGB-images
Mat inputBlob = blobFromImage(img, 1, Size(224, 224), Scalar(104, 117, 123));   //Convert Mat to batch of images

The last parameter swapRB isn't provided, so the default value of true is used. My understanding is that OpenCV imread() and video capture read data in as BGR, so if that comment is to be believed, then the code is doing the right thing by converting the image data to RGB. However, I can't find any evidence that the GoogLeNet model actually accepts RGB images.

We've found several places where the BGR color scheme for Caffe is explicitly called out, including this one.

This example on the BVLC GitHub website shows how they preprocess the data when the image is read with OpenCV. We're not seeing where they perform the swap. It reads:

/* This operation will write the separate BGR planes directly to the
  * input layer of the network because it is wrapped by the cv::Mat
  * objects in input_channels. */

Any and all help would be greatly appreciated!

Furthermore, in the OpenCV 3.1 example code in Step 4 it reads:

resize(img, img, Size(224, 224));       //GoogLeNet accepts only 224x224 RGB-images
dnn::Blob inputBlob = dnn::Blob(img);   //Convert Mat to dnn::Blob image batch

Firstly, we resize the image and change its channel sequence order.

I'm not sure what actually changes the channel sequence order there.



@jrobble, thank you very much for the valuable note! It seems to me that the image in the tutorial is invariant to RB->BR swapping because a space shuttle is almost white, so both ways give the same class label and quite similar probabilities: 0.999828 / 0.999935. I tried to find an image for which the channel order is more critical. Unfortunately, it's hard to find classes that differ only in color (like a blue ball vs. a red ball) in order to observe a classification mistake. However, for an image of red wine the classifier gives class id 966 (red wine) with 0.869604 confidence for the RGB image, but the same class id with 0.995303 confidence for the BGR image. So it looks like GoogLeNet was truly trained on BGR images. What do you think?

dkurt ( 2017-09-13 03:03:04 -0500 )

@dkurt, we're on the same page. We also thought that using the white space shuttle test image might not reveal BGR vs. RGB colorspace issues. We tested some images for which classification may depend more on color, such as oranges, and found that setting swapRB to false produced better classifications.

Additionally, we ran some randomly chosen images of volcanoes through GoogLeNet and found that setting swapRB to false worked better in all of those tests.

jrobble ( 2017-09-13 15:05:41 -0500 )

@jrobble, would you like to prepare a PR?

dkurt ( 2017-09-13 23:00:11 -0500 )

I recently reached out to Vitaliy Lyudvichenko via email. He made some recent commits to that code and is going to review it. I sent him a link to this question. Let's give him some time to review and then decide how to move forward from there.

jrobble ( 2017-09-14 08:19:33 -0500 )

Yes, you are right. As you mentioned, all Caffe models should use the BGR channel sequence for input images. The current example (v3.3) uses the blobFromImage() function to convert the image to planar format (i.e. a blob) with the swapRB=true parameter, so RGB images are actually processed.

BTW, the example worked fine in v3.2, when the Blob::fromImage function was used. In v3.3, after many refactorings, the behaviour became incorrect.

Could you send a PR with the fix, please?

vludv ( 2017-09-16 17:07:10 -0500 )

@vludv, thanks for looking into it. I will submit a PR on Monday.

jrobble ( 2017-09-16 19:18:38 -0500 )

@jrobble, it would also be great to find all the places where blobFromImage is used in the Caffe samples and check them. A corresponding issue was created:

dkurt ( 2017-09-17 05:01:29 -0500 )

@dkurt, I created two PRs for your issue. I updated all instances of blobFromImage and blobFromImages for the caffe framework sample code and gtests. I didn't modify the instances for the torch framework.

I had to regenerate the numpy files for the gtests. Is there a way to tell the BuildBot that it should build using one PR from the opencv repo and another PR from opencv_extra? I expect my build to fail if it doesn't use both.

I noticed that the Halide Caffe tests already had swapRB set to false, as did one of the Python Caffe files. I updated the rest.

I noticed that you removed the third-party code used to generate numpy files, specifically saveBlobtoNPY(). Are there plans to add that functionality back in? I had to re-enable that code in my local branch.

jrobble ( 2017-09-19 01:36:02 -0500 )

@jrobble, thanks a lot! As far as I know, BuildBot tries to merge branches with the same name, e.g. my_branch_name in opencv/opencv and my_branch_name in opencv/opencv_extra. You may also add **Merge with extra**: opencv_extra_PR_url to the PR's description at opencv/opencv. It helps merge them together when the PR is accepted.

We've removed it because it's simple to do the same in Python: numpy.load.

dkurt ( 2017-09-19 03:31:38 -0500 )

@dkurt, you're welcome. Thanks for the info. Let's land the PRs through GitHub. Then I'll answer this question.

jrobble ( 2017-09-19 23:07:14 -0500 )