  1. SSE optimization is enabled by default in x86 builds, so you don't need to do anything. In fact, it is not so easy to disable the SSE code in OpenCV ;-)
  2. With WITH_TBB=ON, OpenCV tries to use several threads for some functions. The problem is that only a handful of functions are threaded with TBB at the moment (maybe a dozen), so it is hard to see any speedup. The OpenCV philosophy here is that the application should be multi-threaded, not the OpenCV functions. But some time ago we started to use TBB even for primitive functions, because we need faster processing on mobile architectures. So, let's wait a while, and hopefully you'll get a visible speedup with TBB enabled.
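
To illustrate the "multi-threaded application" idea from point 2, here is a minimal sketch that uses only the C++ standard library. The names invertImage and processBatch are hypothetical, and invertImage is just a stand-in for whatever single-threaded per-image call your application makes; it is not an OpenCV function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a single-threaded per-image routine
// (in a real application this would be a call into OpenCV).
void invertImage(std::vector<std::uint8_t>& pixels) {
    for (auto& p : pixels) p = static_cast<std::uint8_t>(255 - p);
}

// Process a batch of independent images on several application-level threads.
// Worker t handles images t, t + numThreads, t + 2 * numThreads, ...
// so no two workers ever touch the same image.
void processBatch(std::vector<std::vector<std::uint8_t>>& images, unsigned numThreads) {
    numThreads = std::max(1u, numThreads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&images, t, numThreads] {
            for (std::size_t i = t; i < images.size(); i += numThreads)
                invertImage(images[i]);
        });
    }
    // Wait for all workers before the caller reads the results.
    for (auto& w : workers) w.join();
}
```

Calling processBatch(images, std::thread::hardware_concurrency()) would spread the batch over the available cores. This only helps when the images (or frames) are independent of each other.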

setNumThreads is used to control the number of threads, but I still have to check whether it works with TBB...