DNN module: switch Torch model between train() and evaluation() modes

asked 2018-12-20 02:42:59 -0600

Jaewoo
6 ●1 ●3

updated 2018-12-30 20:55:21 -0600

Torch models can run in two different modes: train() and evaluation(). Results of some layers such as batch normalization will be affected by the modes.

Currently it seems (i.e., when I tested) that OpenCV's DNN module loads and runs torch models in train() mode. Is there a way to use torch models in evaluation() mode using OpenCV?

(ADDED) Below is the Lua code for running the model in the two different modes.

require 'torch'
require 'image'
require 'nn'

local net = torch.load('model.t7')
net:evaluate() -- you can turn this on or off

local image_input = image.load('input_image.jpg', 3, 'float')
image_input:resize(1, image_input:size()[1], image_input:size()[2], image_input:size()[3])

local result = net:forward(image_input)
image.save('result.jpg', result[1])

Comments

You need to call evaluation() before Torch's net serialization.

dkurt ( 2018-12-20 02:54:56 -0600 )

@dkurt Would you explain to me a bit in detail? I tried calling model:evaluation() and then using torch.save() to save the model file. But OpenCV DNN is still using the train mode.

Jaewoo ( 2018-12-20 02:59:01 -0600 )

@Jaewoo, what do you mean by OpenCV DNN is still using the train mode? How do you test it?

dkurt ( 2018-12-20 03:04:36 -0600 )

@dkurt I have a deconvolution network. I ran it on my desktop with the same input, once with model:train() and once with model:evaluation(), and got two different results. After that I ran the model with the same input on Android using OpenCV for Android. The result from the phone was the same as the result from the desktop in model:train() mode.

Jaewoo ( 2018-12-20 03:19:16 -0600 )

@Jaewoo, first check that the inputs for the train and test phases are the same. Please also check that you use the correct method to switch the phase (in OpenCV's scripts there is net:evaluate() but not net:evaluation()).

Please also provide a standalone Lua script which runs your model imported from .t7 (or .net).

BTW, a deconvolution layer's forward pass doesn't depend on the learning phase.

dkurt ( 2018-12-20 03:24:00 -0600 )

@dkurt Oh, I used net:evaluate(), not net:evaluation(), in Torch. Sorry for the typo. I am sure the inputs were exactly the same because I used one image file for every run. May I ask what you mean by OpenCV having net:evaluate()? Is there an evaluate() function which is not listed in https://docs.opencv.org/3.4.4/db/d30/... ? (Added) The difference in the results comes from the batch normalization layers which are inserted between the deconvolution layers.

Jaewoo ( 2018-12-20 03:27:40 -0600 )

@Jaewoo, I just meant the script that generates test data for OpenCV's Torch importer: https://github.com/opencv/opencv_extr....

dkurt ( 2018-12-20 03:31:51 -0600 )

@Jaewoo, if you're able to share the model - it'd be the simplest way to reproduce your problem.

dkurt ( 2018-12-20 03:34:09 -0600 )

@dkurt The model is on https://github.com/chuanli11/MGANs/bl... . (ADDED) I added my Lua code for running train() and evaluate(). The purpose of the code was to confirm the difference between the two modes.

Jaewoo ( 2018-12-20 03:35:46 -0600 )


answered 2018-12-20 03:53:03 -0600

dkurt

1424 ●7 ●17

updated 2018-12-20 23:44:43 -0600

@Jaewoo, please try the following experiment:

1. Import the model in Torch, generate an input, make a forward pass and save the reference output:

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')

net = torch.load('Picasso.t7'):float()

net:evaluate()

input = torch.rand(1, 3, 100, 100)
output = net:forward(input)

torch.save('input.t7', input, 'binary')
torch.save('output.t7', output, 'binary')

2. Load the same model using OpenCV, set the input blob and get the output blob. Compare it with the reference blob:

import cv2 as cv
import numpy as np

net = cv.dnn.readNet('Picasso.t7')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

inp = cv.dnn.readTorchBlob('input.t7')
ref = cv.dnn.readTorchBlob('output.t7')

net.setInput(inp)
out = net.forward()

print(np.max(np.abs(out - ref)))

I got a maximal absolute difference of 2.08616e-06, which means the outputs are very similar.

UPDATE

@Jaewoo, Oh thank you, I got it! Yes, you're definitely right. Style transfer models are usually based on batch normalization layers that work in train mode, which means they normalize the input data using its own mean and deviation. For example, https://github.com/jcjohnson/fast-neu... introduces InstanceNormalization. If you are able to recompile OpenCV, you can enable your model by modifying https://github.com/opencv/opencv/blob...:

replace

if (nnName == "InstanceNormalization")

with

if (nnName == "InstanceNormalization" || (scalarParams.has("train") && scalarParams.get<bool>("train")))

With this change I got a difference of about 2.63751e-05 between the Torch and OpenCV outputs, using the scripts above without adding net:evaluate().
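
To illustrate why the two modes give different outputs, here is a minimal sketch of batch normalization for a single channel in plain Python. It is not Torch or OpenCV code: the function name, the input values and the stored running statistics are all hypothetical, and the learned scale and shift are left out for brevity.

```python
import math

def batch_norm(values, running_mean, running_var, training, eps=1e-5):
    """Normalize one channel's activations (no learned scale/shift).

    training=True  -> use the mean and variance of `values` themselves
    training=False -> use the stored running statistics
    """
    if training:
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    else:
        mean, var = running_mean, running_var
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]

# Hypothetical activations and running statistics, for illustration only.
activations = [2.0, 4.0, 6.0, 8.0]
train_out = batch_norm(activations, running_mean=0.0, running_var=1.0, training=True)
eval_out = batch_norm(activations, running_mean=0.0, running_var=1.0, training=False)

print(train_out)  # centred on zero, because the input's own mean was removed
print(eval_out)   # depends on the stored statistics, not on this input
```

In train mode the result depends only on the statistics of the current input, which is roughly what instance normalization does per image. That is why treating such layers as InstanceNormalization reproduces the train-mode output.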


Comments

Thanks for your answer! I also got the same result using OpenCV for Python. May I ask one more question? Is there a way to run the model in :train() mode using OpenCV?

Jaewoo ( 2018-12-20 20:56:02 -0600 )

@Jaewoo, do you really need this? I mean, some layers aren't deterministic in the training phase (e.g. Dropout). That means they can produce different outputs for the same input.

dkurt ( 2018-12-20 22:28:15 -0600 )
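
To make the non-determinism concrete, here is a minimal sketch of dropout in plain Python. It is not Torch or OpenCV code. It assumes the "inverted" variant, where kept values are rescaled by 1/(1-p) in train mode and the layer is an identity in evaluation mode; the function name and inputs are hypothetical.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p in train mode.

    In evaluation mode the input is returned unchanged, so the output is
    deterministic. In train mode the output depends on the random draws.
    """
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
x = [1.0] * 8

eval_a = dropout(x, 0.5, False, rng)
eval_b = dropout(x, 0.5, False, rng)
train_a = dropout(x, 0.5, True, rng)
train_b = dropout(x, 0.5, True, rng)

print(eval_a == eval_b)  # evaluation mode repeats exactly
print(train_a)           # each entry is either 0.0 or 2.0
print(train_b)           # a second train-mode pass draws a new mask
```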

@dkurt I know that some layers, such as dropout or batch normalization, are not deterministic during the training phase. But I asked the question because for the Picasso.t7 model above, the :train() mode seems to generate better outputs. Also, I don't need accurate outputs because the model is about style transfer. If the outputs are generally similar, that is fine with me.

Jaewoo ( 2018-12-20 23:21:17 -0600 )

@Jaewoo, please try the fix mentioned in the updated answer. It'd be nice if you could attach a resulting image from OpenCV to show that it really works as you expected.

dkurt ( 2018-12-20 23:46:44 -0600 )

@dkurt If I could, I would give you 1000000 upvotes. Thank you so much for your kind feedback!

Jaewoo ( 2018-12-20 23:52:08 -0600 )

@Jaewoo, this PR will fix it: https://github.com/opencv/opencv/pull...

dkurt ( 2018-12-21 00:14:16 -0600 )

@dkurt Thanks for the pull request!

Jaewoo ( 2018-12-21 00:59:10 -0600 )

@Jaewoo, unfortunately some of our tests failed with this simple solution. To preserve compatibility we proposed an extra flag for readNetFromTorch. Please check the changes in the PR again. For your model you need to call net = readNetFromTorch('/path/to/model', True, False).

dkurt ( 2018-12-21 05:49:39 -0600 )
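
For reference, the call above could be wrapped as in the sketch below. It assumes an OpenCV build that already contains the change from the PR, and that the second and third positional arguments mean "binary file" and "evaluate mode" as in the call above. The helper name load_torch_net and its reader parameter are hypothetical; reader exists only so the argument mapping can be checked without OpenCV or a model file.

```python
def load_torch_net(path, train_mode=False, reader=None):
    """Load a Torch model, optionally keeping train-mode behaviour.

    train_mode=True maps to evaluate=False, i.e. the
    readNetFromTorch(path, True, False) call from the comment above.
    """
    if reader is None:
        import cv2 as cv  # imported lazily so the sketch loads without OpenCV
        reader = cv.dnn.readNetFromTorch
    is_binary = True            # the model was saved in binary .t7 format
    evaluate = not train_mode   # False keeps train-mode normalization
    return reader(path, is_binary, evaluate)
```

With OpenCV installed, load_torch_net('Picasso.t7', train_mode=True) should then behave like the call in the comment above.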
