Cannot use Tensorflow model with batch normalization [closed]

asked 2019-08-20 03:31:16 -0500

Nikolai Tasev

updated 2019-08-20 03:54:27 -0500

berak

I have a simple convolutional network model made with Keras and TensorFlow 1.14. The model is saved as a constant graph in binary .pb format.

The model loads successfully, but the calculations are not correct after the first batch norm layer.

I am using OpenCV 3.4.

Has anyone encountered or heard of a similar problem?


Closed for the following reason: the question is answered, right answer was accepted by Nikolai Tasev
close date 2019-08-25 03:43:40.211350

Comments

If you really want to go for deep learning in OpenCV, I do suggest using the latest master 4.x branch. There are almost daily fixes in this area, so 3.4 is probably heavily outdated...

StevenPuttemans ( 2019-08-20 05:00:20 -0500 )

I'm getting similar problems with pytorch->onnx->dnn with 4.1.0.

A simple conv/bn/relu/pool network is highly inaccurate with a bn in it, and OK with the bn removed.

https://gist.github.com/berak/43ad415...

Solved my problem:

model.eval() needs to be called before saving the onnx, to switch the model from "train" into "evaluation" mode, similar to "freezing" a tf network.

berak ( 2019-08-20 06:48:26 -0500 )
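
As a rough sketch of the fix described in the comment above: switch the model to evaluation mode before exporting it to ONNX. The function name `export_for_dnn`, its parameters, and the default file name are hypothetical and chosen only for illustration; `model.eval()` and `torch.onnx.export` are the PyTorch calls involved. The actual network is in the linked gist and is not reproduced here.

```python
def export_for_dnn(model, dummy_input, onnx_path="model.onnx"):
    """Export a PyTorch model to ONNX in evaluation mode (hypothetical helper)."""
    # Imported here so the sketch can be loaded without PyTorch installed.
    import torch

    # Switch from "train" to "evaluation" mode: batch norm layers then use
    # their stored running statistics instead of per-batch statistics.
    model.eval()

    # dummy_input is an example tensor with the shape the network expects,
    # e.g. torch.randn(1, 3, 224, 224) for an assumed NCHW image input.
    torch.onnx.export(model, dummy_input, onnx_path)
    return onnx_path
```

The exported file could then be loaded on the OpenCV side with cv2.dnn.readNetFromONNX(onnx_path).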

I also have bn as the second layer, after a conv. The outputs in OpenCV are quite different from those in TensorFlow. There was something strange: I saw that the Keras bn layer is implemented by several nodes in TensorFlow, but in OpenCV I see only one layer, named fused_batchnorm.

Nikolai Tasev ( 2019-08-20 07:02:10 -0500 )
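
One plausible explanation for the single layer (an assumption, not confirmed in this thread) is that at inference time batch normalization reduces to a per-channel scale and shift, so the separate graph nodes can be folded into one fused layer. A minimal pure-Python sketch with made-up numbers; none of the function names below come from Keras, TensorFlow, or OpenCV:

```python
import math

def bn_as_separate_nodes(x, mean, var, gamma, beta, eps=1e-3):
    """Batch norm for one value, spelled out as a chain of elementary ops."""
    centered = x - mean
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = centered * inv_std
    return normalized * gamma + beta

def fold_bn(mean, var, gamma, beta, eps=1e-3):
    """Precompute scale and shift so that batch norm becomes scale * x + shift."""
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift

# Made-up per-channel statistics and parameters, for illustration only.
mean, var, gamma, beta = 0.5, 4.0, 1.5, -0.25
scale, shift = fold_bn(mean, var, gamma, beta)

for x in (-1.0, 0.0, 2.0):
    unfused = bn_as_separate_nodes(x, mean, var, gamma, beta)
    fused = scale * x + shift
    # Both forms agree up to floating-point rounding.
    assert math.isclose(unfused, fused, rel_tol=1e-9, abs_tol=1e-12)
```

Since both forms give the same result, seeing a single fused layer in OpenCV is not by itself a sign of an error.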

Feel free to open an issue providing steps to reproduce it (attach the model). We have observed buggy Keras batch normalization several times: it does not switch between training and testing mode properly. So if the latest master or the latest 3.4 branch produces wrong results, let's investigate it together, without voodoo debugging but with reproducible reports. Thanks!

dkurt ( 2019-08-20 07:09:14 -0500 )

Will do. I will have to verify which version I am using and gather the relevant information.

Nikolai Tasev ( 2019-08-20 07:39:24 -0500 )

I think there is already an issue on the topic here. Have you frozen the graph_def file as described in the comment?

paubau ( 2019-08-20 08:34:57 -0500 )

Yes, I used tf.keras.backend.set_learning_phase(0) before loading the model from the Keras saved file, then used tf.graph_util.convert_variables_to_constants(...)

Nikolai Tasev ( 2019-08-21 06:01:59 -0500 )
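
The steps described in the comment above could look roughly like this under TensorFlow 1.x. The function name, its parameters, and the default file name are hypothetical. The arguments elided in the comment are not known, so the ones used here (the Keras session, its graph definition, and the model's output node names) are an assumption about a typical call, not the poster's actual code.

```python
def freeze_keras_model(keras_path, out_dir, pb_name="frozen_model.pb"):
    """Freeze a saved Keras model into a constant .pb graph (hypothetical helper)."""
    # Imported here so the sketch can be loaded without TensorFlow installed.
    import tensorflow as tf  # assumes TensorFlow 1.x, e.g. 1.14

    # Select inference mode BEFORE the model is loaded, so that batch norm
    # layers are built to use their moving mean and variance.
    tf.keras.backend.set_learning_phase(0)
    model = tf.keras.models.load_model(keras_path)

    # Replace all variables in the session graph with constants.
    sess = tf.keras.backend.get_session()
    output_names = [out.op.name for out in model.outputs]
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_names)

    # Write the frozen graph in binary .pb format.
    tf.io.write_graph(frozen_graph_def, out_dir, pb_name, as_text=False)
    return output_names
```

The returned output node names are useful when requesting specific outputs from the loaded network.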

Added the first part of the model (up to the batchnorm) and some test data in the issue https://github.com/opencv/opencv/issu...

Nikolai Tasev ( 2019-08-21 06:03:09 -0500 )

I found the problem with the help of dkurt. It seems the BatchNorm was wrongly configured for the channels-first data format instead of channels-last.

Nikolai Tasev ( 2019-08-25 03:41:18 -0500 )
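
The thread does not show the model's layer configuration, so the following is only an illustration of why the channel axis matters. In Keras, the `axis` argument of `BatchNormalization` selects the channel dimension (by default the last one, which matches channels-last data). The pure-Python sketch below applies inference-mode batch norm to a tiny made-up tensor stored as height x width x channels, once along the correct axis and once along the wrong one; the helper name and all numbers are hypothetical.

```python
import math

def bn_inference(x, mean, var, gamma, beta, axis, eps=1e-3):
    """Inference-mode batch norm on a 3-D nested list; `axis` is the channel axis."""
    out = []
    for i, plane in enumerate(x):
        out_plane = []
        for j, row in enumerate(plane):
            out_row = []
            for k, v in enumerate(row):
                c = (i, j, k)[axis]  # which parameter set this element uses
                scale = gamma[c] / math.sqrt(var[c] + eps)
                out_row.append(scale * (v - mean[c]) + beta[c])
            out_plane.append(out_row)
        out.append(out_plane)
    return out

# A 2 x 2 x 2 tensor in channels-last layout (H, W, C), no batch dimension.
# Channel 0 holds small values, channel 1 the same values offset by 10.
x = [[[1.0, 11.0], [2.0, 12.0]],
     [[3.0, 13.0], [4.0, 14.0]]]
mean, var = [0.0, 10.0], [1.0, 1.0]
gamma, beta = [1.0, 1.0], [0.0, 0.0]

# Correct: channels are the last axis, so the offset of 10 is removed.
channels_last = bn_inference(x, mean, var, gamma, beta, axis=2, eps=0.0)
assert channels_last == [[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [4.0, 4.0]]]

# Wrong: treating the first axis as channels applies each mean to a whole row.
channels_first = bn_inference(x, mean, var, gamma, beta, axis=0, eps=0.0)
assert channels_first == [[[1.0, 11.0], [2.0, 12.0]], [[-7.0, 3.0], [-6.0, 4.0]]]
```

Here the tensor happens to have the same size along both axes, so the wrong configuration raises no error and silently produces different values, consistent with outputs that diverge right after the first batch norm layer.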