DNN module: switch Torch model between train() and evaluation() modes

asked 2018-12-20 02:42:59 -0600

Jaewoo

updated 2018-12-30 20:55:21 -0600

Torch models can run in two different modes: train() and evaluation(). Results of some layers such as batch normalization will be affected by the modes.

Currently it seems (i.e., when I tested) that OpenCV's DNN module loads and runs torch models in train() mode. Is there a way to use torch models in evaluation() mode using OpenCV?

(ADDED) Below is the lua code for running the model with two different modes.

require 'torch'
require 'image'
require 'nn'

local net = torch.load('model.t7')
net:evaluate() -- you can turn this on or off

local image_input = image.load('input_image.jpg', 3, 'float')
image_input:resize(1, image_input:size()[1], image_input:size()[2], image_input:size()[3])

local result = net:forward(image_input)
image.save('result.jpg', result[1])

Comments

You need to call evaluation() before Torch's net serialization.

dkurt ( 2018-12-20 02:54:56 -0600 )

@dkurt Could you explain in a bit more detail? I tried calling model:evaluation() and then saving the model file with torch.save(), but OpenCV DNN is still using the train mode.

Jaewoo ( 2018-12-20 02:59:01 -0600 )

@Jaewoo, what do you mean by OpenCV DNN is still using the train mode? How do you test it?

dkurt ( 2018-12-20 03:04:36 -0600 )

@dkurt I have a deconvolution network. I ran it on my desktop with the same input, once with model:train() and once with model:evaluation(), and got two different results. After that I ran the model with the same input on Android using OpenCV for Android. The result from the phone was the same as the desktop result in model:train() mode.

Jaewoo ( 2018-12-20 03:19:16 -0600 )

@Jaewoo, first check that the inputs for the train and test phases are the same. Please also check that you use the correct method to switch the phase (in OpenCV's scripts there is net:evaluate() but not net:evaluation()).

Please also provide a standalone Lua script which runs your model imported from .t7 (or .net).

BTW, deconvolution's layer forward pass doesn't depend on learning phase.

dkurt ( 2018-12-20 03:24:00 -0600 )

@dkurt Oh, I used net:evaluate(), not net:evaluation(), in Torch. Sorry for the typo. I am sure the inputs were exactly the same because I used one image file for every run. May I ask what you mean by OpenCV having net:evaluate()? Is there an evaluate() function that is not listed in https://docs.opencv.org/3.4.4/db/d30/... ? (Added) The difference in the results comes from the batch normalization layers inserted between the deconvolution layers.

Jaewoo ( 2018-12-20 03:27:40 -0600 )

@Jaewoo, I just meant the script that generates test data for OpenCV's Torch importer: https://github.com/opencv/opencv_extr....

dkurt ( 2018-12-20 03:31:51 -0600 )

@Jaewoo, if you're able to share the model - it'd be the simplest way to reproduce your problem.

dkurt ( 2018-12-20 03:34:09 -0600 )

@dkurt The model is at https://github.com/chuanli11/MGANs/bl... . (ADDED) I added my Lua code for running train() and evaluate() to the question. The purpose of the code was to confirm the difference between the two modes.

Jaewoo ( 2018-12-20 03:35:46 -0600 )

1 answer


answered 2018-12-20 03:53:03 -0600

dkurt

updated 2018-12-20 23:44:43 -0600

@Jaewoo, please try the following experiment:

1. Load the model in Torch, generate an input, run a forward pass, and save the reference output:

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')

net = torch.load('Picasso.t7'):float()

net:evaluate()

input = torch.rand(1, 3, 100, 100)
output = net:forward(input)

torch.save('input.t7', input, 'binary')
torch.save('output.t7', output, 'binary')

2. Load the same model using OpenCV, set the input blob, and get the output blob. Compare it with the reference blob:

import cv2 as cv
import numpy as np

net = cv.dnn.readNet('Picasso.t7')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

inp = cv.dnn.readTorchBlob('input.t7')
ref = cv.dnn.readTorchBlob('output.t7')

net.setInput(inp)
out = net.forward()

# Maximal absolute difference between OpenCV's output and the Torch reference
print(np.max(np.abs(out - ref)))

I got a maximal absolute difference of 2.08616e-06, which means the outputs are very similar.


UPDATE

@Jaewoo, Oh thank you, I got it! Yes, you're definitely right. Style transfer models are usually based on batch normalization layers that work in train mode, which means they normalize the input data using its own mean and standard deviation. For example, https://github.com/jcjohnson/fast-neu... introduces InstanceNormalization. If you are able to recompile OpenCV, you can enable your model by modifying https://github.com/opencv/opencv/blob...:

Replace

if (nnName == "InstanceNormalization")

with

if (nnName == "InstanceNormalization" || (scalarParams.has("train") && scalarParams.get<bool>("train")))

Using the scripts above without the net:evaluate() call, I got a difference of about 2.63751e-05 between the Torch and OpenCV outputs.
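
To illustrate the point in the update, here is a minimal NumPy sketch (not the actual Torch or OpenCV implementation) of why a batch normalization layer left in train mode behaves like instance normalization when it is fed a single image. The function names, the toy input, and the stored statistics running_mean and running_var are assumptions made up for this example, and the learned scale and shift are omitted.

```python
import numpy as np

def batch_norm(x, running_mean, running_var, train, eps=1e-5):
    """Per-channel normalization of an NCHW array (no learned scale/shift)."""
    if train:
        # Train mode: statistics are computed from the current input batch.
        mean = x.mean(axis=(0, 2, 3), keepdims=True)
        var = x.var(axis=(0, 2, 3), keepdims=True)
    else:
        # Evaluate mode: statistics stored during training are reused.
        mean = running_mean.reshape(1, -1, 1, 1)
        var = running_var.reshape(1, -1, 1, 1)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # Every image and channel is normalized with its own spatial statistics.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.random((1, 3, 8, 8))   # a single toy "image" in NCHW layout
running_mean = np.zeros(3)     # hypothetical stored statistics
running_var = np.ones(3)

out_train = batch_norm(x, running_mean, running_var, train=True)
out_eval = batch_norm(x, running_mean, running_var, train=False)

# With a batch of one, train-mode batch norm matches instance normalization,
# while evaluate mode gives a different result.
train_equals_instance = np.allclose(out_train, instance_norm(x), atol=1e-6)
train_differs_from_eval = not np.allclose(out_train, out_eval, atol=1e-3)
print(train_equals_instance, train_differs_from_eval)
```

This is only meant to show why the two modes disagree for such models; the real layers also apply learned weights and biases.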


Comments

Thanks for your answer! I also got the same result using OpenCV for Python. May I ask one more question? Is there a way to run the model in :train() mode using OpenCV?

Jaewoo ( 2018-12-20 20:56:02 -0600 )

@Jaewoo, do you really need this? I mean some layers aren't deterministic in the training phase (e.g. Dropout). That means they can produce different outputs for the same input.

dkurt ( 2018-12-20 22:28:15 -0600 )
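
The remark about Dropout above can be sketched in NumPy as follows. This is an illustrative toy, not Torch's or OpenCV's code: the dropout function, its arguments, and the 1/(1 - p) rescaling ("inverted" dropout) are assumptions chosen for the example.

```python
import numpy as np

def dropout(x, p, train, rng):
    """Toy dropout: random masking in train mode, identity in evaluate mode."""
    if not train:
        return x
    # Keep each element with probability 1 - p and rescale the survivors.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((32, 32))

# Train mode: two passes over the same input draw different random masks.
train_a = dropout(x, 0.5, True, rng)
train_b = dropout(x, 0.5, True, rng)

# Evaluate mode: the input passes through unchanged on every call.
eval_a = dropout(x, 0.5, False, rng)
eval_b = dropout(x, 0.5, False, rng)

print(np.array_equal(train_a, train_b), np.array_equal(eval_a, eval_b))
```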

@dkurt I know that some layers, such as dropout or batch normalization, are not deterministic during the training phase. But I asked the question because for the Picasso.t7 model above, it seems the :train() mode generates better outputs. Also, I don't need accurate outputs because the model is for style transfer. If the outputs are generally similar, that is fine with me.

Jaewoo ( 2018-12-20 23:21:17 -0600 )

@Jaewoo, please try the fix mentioned in the updated answer. It'd be nice if you could attach a resulting image from OpenCV to show that it really works as you expected.

dkurt ( 2018-12-20 23:46:44 -0600 )

@dkurt If I could, I would give you 1000000 upvotes. Thank you so much for your kind feedback!

Jaewoo ( 2018-12-20 23:52:08 -0600 )

dkurt ( 2018-12-21 00:14:16 -0600 )

@dkurt Thanks for the pull request!

Jaewoo ( 2018-12-21 00:59:10 -0600 )

@Jaewoo, unfortunately some of our tests fail with this simple solution. To preserve compatibility we proposed an extra flag for readNetFromTorch. Please check the changes in the PR again. For your model you need to call net = readNetFromTorch('/path/to/model', True, False).

dkurt ( 2018-12-21 05:49:39 -0600 )
