Can cv::dft() be sped up with the right compiler flags?

asked 2018-09-24 06:27:16 -0600

speedymcs

For some time I have been using cv::dft() on a large image, and it always took about 4-5 seconds. I noticed that it now takes about 30 seconds for the same image, and I wonder why. I recently recompiled OpenCV without having saved the original compiler flags, so maybe I am now missing a flag that speeds up the dft function?

This is the current build configuration:

General configuration for OpenCV 3.2.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            /home/uname/opencv_contrib-3.2.0/modules
    Version control (extra):     unknown

    Timestamp:                   2018-09-17T15:22:43Z
    Host:                        Linux 4.4.0-135-generic x86_64
    CMake:                       3.5.1
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE

    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 5.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -msse -msse2 -mno-avx -msse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -msse -msse2 -mno-avx -msse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -msse -msse2 -mno-avx -msse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -ffunction-sections -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -msse -msse2 -mno-avx -msse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -ffunction-sections -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):
    Linker flags (Debug):
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          /home/uname/anaconda3/lib/ /home/uname/anaconda3/lib/ /usr/lib/x86_64-linux-gnu/ /home/uname/anaconda3/lib/ gtk-3 gdk-3 pangocairo-1.0 pango-1.0 atk-1.0 cairo-gobject cairo gdk_pixbuf-2.0 gio-2.0 gobject-2.0 glib-2.0 gthread-2.0 avcodec-ffmpeg avformat-ffmpeg avutil-ffmpeg swscale-ffmpeg /home/uname/anaconda3/lib/ /home/uname/anaconda3/lib/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ /home/uname/anaconda3/lib/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ dl m pthread rt cudart nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/usr/local/cuda/lib64
    3rdparty dependencies:       libwebp IlmImf libprotobuf

  OpenCV modules:
    To be built:                 cudev core cudaarithm flann hdf imgproc ml reg surface_matching video cudabgsegm cudafilters cudaimgproc cudawarping dnn freetype fuzzy imgcodecs photo shape videoio cudacodec highgui objdetect plot ts xobjdetect xphoto bgsegm bioinspired dpm face features2d line_descriptor saliency text calib3d ccalib cudafeatures2d cudalegacy cudaobjdetect ...

Can you try a more recent 3.4 branch instead of 3.2.0?

(this would add additional CPU dispatch options and more)

berak ( 2018-09-24 06:54:28 -0600 )

I guess I could (it would take more time and carry more risk than recompiling my current version), but I had been using OpenCV 3.2.0 all along, and the runtime jumped from 5 to 30 seconds after recompiling...

speedymcs ( 2018-09-24 06:59:19 -0600 )

1 answer

answered 2018-09-27 12:17:07 -0600

speedymcs

updated 2018-09-27 13:14:20 -0600

It looks like rebuilding with the CMake option WITH_IPP=ON did the trick.

As a side note, padding can make a huge difference as well, as explained here:
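
The linked explanation is not reproduced in this thread. As a rough illustration of why padding helps: the OpenCV documentation notes that cv::dft() is most efficient for array sizes that are products of 2s, 3s and 5s, and it provides cv::getOptimalDFTSize() to find such a size. The standalone sketch below mimics that idea without depending on OpenCV. The names isFiveSmooth and nextFastDftSize are hypothetical helpers, not OpenCV functions, and the result is only assumed to match what cv::getOptimalDFTSize() would return.

```cpp
// Hypothetical helper (not part of OpenCV): true if n has no prime
// factors other than 2, 3 and 5.
static bool isFiveSmooth(int n) {
    if (n < 1) return false;
    const int primes[] = {2, 3, 5};
    for (int p : primes) {
        while (n % p == 0) n /= p;  // strip every factor of p
    }
    return n == 1;  // anything left over is a larger prime factor
}

// Hypothetical helper (not part of OpenCV): smallest size >= n of the
// form 2^a * 3^b * 5^c, i.e. a size an FFT can process efficiently.
// Assumes n is small enough that the search does not overflow int.
static int nextFastDftSize(int n) {
    if (n < 1) return 1;
    int m = n;
    while (!isFiveSmooth(m)) ++m;  // linear search is fine for image sizes
    return m;
}
```

With this sketch, an image 1025 pixels wide would be padded to 1080 columns, while a width of 1024 would be left alone. In real OpenCV code one would typically call cv::getOptimalDFTSize() for the row and column counts and zero-pad with cv::copyMakeBorder() before calling cv::dft().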


Thanks for coming back with that!

berak ( 2018-09-27 12:20:01 -0600 )


Seen: 249 times

Last updated: Sep 27 '18