segfault on python calls from caffe in container

I have a thorny issue: when I call cv2 functions from Caffe code running in a Docker container, I hit the segfault below. It does not occur outside a container, even though the NVIDIA container is supposed to be trouble-free. The crash happens on cv2 calls such as cv2.flip or cv2.getRotationMatrix2D. However, if I call the same functions from a Python command line inside the container, everything works. I am not an expert at reading stack traces like these, so any pointers toward a solution would be appreciated.

==15165== Syscall param msync(start) points to uninitialised byte(s)
==15165==    at 0x721892D: ??? (syscall-template.S:81)
==15165==    by 0x121D7123: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==15165==    by 0x121D9EF6: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==15165==    by 0x121DB151: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==15165==    by 0x121DB4E8: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==15165==    by 0x121D7A30: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
==15165==    by 0x5ABA442: google::GetStackTrace(void**, int, int) (in /usr/lib/x86_64-linux-gnu/libglog.so.0.0.0)
==15165==    by 0x5ABFB31: ??? (in /usr/lib/x86_64-linux-gnu/libglog.so.0.0.0)
==15165==    by 0x715ACAF: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)
==15165==    by 0x11220C45: cv::flip(cv::_InputArray const&, cv::_OutputArray const&, int) (in /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.8)
==15165==    by 0x82BDEDC3: pyopencv_cv_flip(_object*, _object*, _object*) (in /opencv/build/lib/cv2.so)
==15165==    by 0x65E60D3: PyEval_EvalFrameEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==  Address 0xffeffd000 is on thread 1's stack
==15165==  in frame #6, created by google::GetStackTrace(void**, int, int) (???:)
==15165== 
*** SIGSEGV (@0x1010000) received by PID 15165 (TID 0x406abc0) from PID 16842752; stack trace: ***
    @          0x715acb0 (unknown)
    @         0x11220c46 (unknown)
    @         0x82bdedc4 pyopencv_cv_flip()
    @          0x65e60d4 (unknown)
    @          0x65e6059 (unknown)
    @          0x65e754d (unknown)
    @          0x65e5dd8 (unknown)
    @          0x65e6059 (unknown)
    @          0x65e754d (unknown)
    @          0x661c6d0 (unknown)
    @          0x6588d43 (unknown)
    @          0x65147bd (unknown)
    @          0x6588d43 (unknown)
    @          0x6601577 (unknown)
    @          0x6544617 (unknown)
    @         0x715a34d5 caffe::PythonLayer<>::Reshape()
    @          0x50d24b5 caffe::Net<>::Init()
    @          0x50d3345 caffe::Net<>::Net()
    @          0x508066a caffe::Solver<>::InitTrainNet()
    @          0x508187c caffe::Solver<>::Init()
    @          0x5081baa caffe::Solver<>::Solver()
    @          0x5103053 caffe::Creator_SGDSolver<>()
    @           0x411fc6 caffe::SolverRegistry<>::CreateSolver()
    @           0x40af42 train()
    @           0x40897c main
    @          0x7145f45 (unknown)
    @           0x409283 (unknown)
    @                0x0 (unknown)
==15165== 
==15165== Process terminating with default action of signal 11 (SIGSEGV)
==15165==    at 0x11220C46: cv::flip(cv::_InputArray const&, cv::_OutputArray const&, int) (in /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.8)
==15165==    by 0x82BDEDC3: pyopencv_cv_flip(_object*, _object*, _object*) (in /opencv/build/lib/cv2.so)
==15165==    by 0x65E60D3: PyEval_EvalFrameEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65E6058: PyEval_EvalFrameEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65E754C: PyEval_EvalCodeEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65E5DD7: PyEval_EvalFrameEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65E6058: PyEval_EvalFrameEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65E754C: PyEval_EvalCodeEx (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x661C6CF: ??? (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x6588D42: PyObject_Call (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x65147BC: ??? (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==15165==    by 0x6588D42: PyObject_Call (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
--15165-- Discarding syms at 0x350d22a0-0x350d7eb3 in /lib/x86_64-linux-gnu/libnss_files-2.19.so due to munmap()
--15165-- Discarding syms at 0x352dc100-0x352df560 in /lib/x86_64-linux-gnu/libnss_dns-2.19.so due to munmap()
==15165==
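
One thing visible in the trace above is that cv2.so is loaded from /opencv/build/lib while libopencv_core.so.2.4.8 comes from /usr/lib/x86_64-linux-gnu. Whether that combination is related to the crash is only a guess, but a quick way to compare the two situations (plain Python prompt vs. Caffe's Python layer) is to print which OpenCV-related shared objects are actually mapped into the process. The sketch below is a minimal, Linux-only helper using just the standard library; the names mapped_libraries and report are made up for illustration and are not part of Caffe or OpenCV.

```python
import os


def mapped_libraries(maps_text, keywords=("opencv", "cv2")):
    """Return sorted, de-duplicated file paths from /proc/<pid>/maps text
    whose file name contains one of the keywords."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        if any(k in name for k in keywords):
            paths.add(path)
    return sorted(paths)


def report():
    """Print the OpenCV-related shared objects mapped into this process."""
    with open("/proc/self/maps") as f:
        for path in mapped_libraries(f.read()):
            print(path)
```

Calling report() right after `import cv2`, once from the Python command line in the container and once from inside the Python layer code that Caffe runs, should show whether both cases load the same cv2.so and the same libopencv_core. If the two lists differ, that difference is a reasonable place to start looking.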
