Thanks to robin's answer above, it became clear to me that the true culprit was OpenBLAS. A little further googling quickly turned up the answer, which is even stated explicitly in the FAQ (https://github.com/xianyi/OpenBLAS/wiki/faq):

If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following.
export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
Call openblas_set_num_threads(1) in the application on runtime. Or
Build OpenBLAS single thread version, e.g. make USE_THREAD=0 USE_LOCKING=1 (see comment below)
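
For illustration, here is a minimal sketch of the first option applied from inside a program rather than from the shell. It assumes the application is written in Python and picks up OpenBLAS through a library such as NumPy, which is my assumption and not something stated in the FAQ. The helper name limit_openblas_threads is hypothetical. As far as I know, OpenBLAS reads the variable when the library is loaded, so it has to be set before the first import that pulls OpenBLAS in.

```python
import os


def limit_openblas_threads(n=1):
    """Set OPENBLAS_NUM_THREADS for the current process (hypothetical helper).

    Must be called before OpenBLAS is loaded, i.e. before importing
    NumPy or any other library linked against it.
    """
    os.environ["OPENBLAS_NUM_THREADS"] = str(n)
    return os.environ["OPENBLAS_NUM_THREADS"]


# Restrict OpenBLAS to a single thread, as the FAQ recommends for
# applications that are already multi-threaded.
limit_openblas_threads(1)

# Only import OpenBLAS-backed libraries after this point, e.g.:
# import numpy as np
```

Child processes started afterwards inherit the setting, which is usually what you want when the application spawns its own workers.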
Indeed, restricting OpenBLAS to a single thread solved my problem. I'm stunned that an external library can have this impact.