As far as I know, filter2D should run much faster when it is computed in parallel. However, OpenCV only ever occupies a single thread, and overall CPU usage stays low. Do I need to set some compiler flags to make it use multiple processors? By the way, I have already set WITH_OPENMP.