Overview
I. Finding the cause of the deadlock
1. Run "gdb <executable>" to enter the gdb prompt, then type r to run the executable.
gdb /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server...done.
(gdb) r
Starting program: /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd5c41700 (LWP 1837)]
[New Thread 0x7fffd5440700 (LWP 1838)]
[New Thread 0x7fffd4c3f700 (LWP 1839)]
[New Thread 0x7fffcffff700 (LWP 1844)]
[New Thread 0x7fffcc85d700 (LWP 1847)]
[New Thread 0x7fffbdfff700 (LWP 1848)]
[New Thread 0x7fffbd7fe700 (LWP 1849)]
[New Thread 0x7fffb8ffd700 (LWP 1850)]
[New Thread 0x7fffb67fc700 (LWP 1851)]
[New Thread 0x7fffb3ffb700 (LWP 1852)]
[New Thread 0x7fffb17fa700 (LWP 1853)]
[Thread 0x7fffb17fa700 (LWP 1853) exited]
[Thread 0x7fffb3ffb700 (LWP 1852) exited]
[Thread 0x7fffb67fc700 (LWP 1851) exited]
[Thread 0x7fffb8ffd700 (LWP 1850) exited]
[Thread 0x7fffbd7fe700 (LWP 1849) exited]
[Thread 0x7fffbdfff700 (LWP 1848) exited]
[Thread 0x7fffcc85d700 (LWP 1847) exited]
[New Thread 0x7fffb17fa700 (LWP 1874)]
[New Thread 0x7fffb3ffb700 (LWP 1875)]
[New Thread 0x7fffb67fc700 (LWP 1876)]
[New Thread 0x7fffb8ffd700 (LWP 1925)]
[New Thread 0x7fff006eb700 (LWP 1926)]
[New Thread 0x7ffeffeea700 (LWP 1927)]
[New Thread 0x7ffeff6e9700 (LWP 1928)]
[New Thread 0x7ffefeee8700 (LWP 1929)]
[New Thread 0x7ffefe6e7700 (LWP 1930)]
[New Thread 0x7ffefdee6700 (LWP 1931)]
[New Thread 0x7ffefd6e5700 (LWP 1933)]
[New Thread 0x7ffefcee4700 (LWP 1935)]
[New Thread 0x7ffed7fff700 (LWP 1936)]
[New Thread 0x7ffed77fe700 (LWP 1937)]
[New Thread 0x7ffed6ffd700 (LWP 1938)]
[New Thread 0x7ffed67fc700 (LWP 1939)]
[New Thread 0x7ffed5ffb700 (LWP 1940)]
[New Thread 0x7ffed57fa700 (LWP 1941)]
[New Thread 0x7ffed4ff9700 (LWP 1942)]
[New Thread 0x7ffeb767e700 (LWP 1943)]
[New Thread 0x7ffeb6e7d700 (LWP 1944)]
[New Thread 0x7ffeb667c700 (LWP 1945)]
[New Thread 0x7ffeb5e7b700 (LWP 1946)]
[New Thread 0x7ffeb567a700 (LWP 1948)]
[New Thread 0x7ffeb4e79700 (LWP 1949)]
[New Thread 0x7ffeaffff700 (LWP 1950)]
[New Thread 0x7ffeaf7fe700 (LWP 1951)]
[New Thread 0x7ffeaeffd700 (LWP 1952)]
[New Thread 0x7ffeae7fc700 (LWP 1953)]
[New Thread 0x7ffeadffb700 (LWP 2166)]
[New Thread 0x7ffead7fa700 (LWP 2167)]
[New Thread 0x7ffeacff9700 (LWP 2168)]
[New Thread 0x7ffea7fff700 (LWP 2169)]
[New Thread 0x7ffea77fe700 (LWP 2170)]
[New Thread 0x7ffea6ffd700 (LWP 2171)]
[New Thread 0x7ffea67fc700 (LWP 2172)]
2. At this point the program is deadlocked: it keeps running but makes no progress. Press Ctrl+C to interrupt it.
^C
Thread 1 "gpd_server" received signal SIGINT, Interrupt.
0x00007ffff72fcc1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
3. Inspect the stack with info stack. This command only shows the stack of the currently selected thread.
(gdb) info stack
#0 0x00007ffff72fcc1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007ffff647d96b in ros::ros_wallsleep(unsigned int, unsigned int) ()
from /opt/ros/kinetic/lib/librostime.so
#2 0x00007ffff6bf6f77 in ros::waitForShutdown() () from /opt/ros/kinetic/lib/libroscpp.so
#3 0x00007ffff6c15671 in ros::MultiThreadedSpinner::spin(ros::CallbackQueue*) ()
from /opt/ros/kinetic/lib/libroscpp.so
#4 0x0000000000430f30 in main (argc=1, argv=<optimized out>)
at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:132
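4. Run info threads to list all threads. The entry marked with * is gdb's currently selected thread (here, the one that received SIGINT); every thread is stopped at this point. Threads whose top frame is a wait function, such as pthread_cond_wait or a futex wait, are blocked and are the candidates for the deadlock.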
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7f33c40 (LWP 1767) "gpd_server" 0x00007ffff72fcc1d in nanosleep ()
at ../sysdeps/unix/syscall-template.S:84
2 Thread 0x7fffd5c41700 (LWP 1837) "gpd_server" 0x00007ffff10a3a13 in epoll_wait ()
at ../sysdeps/unix/syscall-template.S:84
3 Thread 0x7fffd5440700 (LWP 1838) "gpd_server" 0x00007ffff109774d in poll ()
at ../sysdeps/unix/syscall-template.S:84
4 Thread 0x7fffd4c3f700 (LWP 1839) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
5 Thread 0x7fffcffff700 (LWP 1844) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
13 Thread 0x7fffb17fa700 (LWP 1874) "gpd_server" 0x00007ffff10a48c8 in accept4 (fd=17,
addr=..., addr_len=0x7fffb17f9918, flags=524288)
at ../sysdeps/unix/sysv/linux/accept4.c:40
14 Thread 0x7fffb3ffb700 (LWP 1875) "gpd_server" 0x00007ffff109774d in poll ()
at ../sysdeps/unix/syscall-template.S:84
15 Thread 0x7fffb67fc700 (LWP 1876) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
16 Thread 0x7fffb8ffd700 (LWP 1925) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
17 Thread 0x7fff006eb700 (LWP 1926) "gpd_server" 0x00007ffff72fb827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x8b4030)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
18 Thread 0x7ffeffeea700 (LWP 1927) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
19 Thread 0x7ffeff6e9700 (LWP 1928) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
---Type <return> to continue, or q <return> to quit---return
20 Thread 0x7ffefeee8700 (LWP 1929) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
21 Thread 0x7ffefe6e7700 (LWP 1930) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
22 Thread 0x7ffefdee6700 (LWP 1931) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
23 Thread 0x7ffefd6e5700 (LWP 1933) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
24 Thread 0x7ffefcee4700 (LWP 1935) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
25 Thread 0x7ffed7fff700 (LWP 1936) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
26 Thread 0x7ffed77fe700 (LWP 1937) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
27 Thread 0x7ffed6ffd700 (LWP 1938) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
28 Thread 0x7ffed67fc700 (LWP 1939) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
29 Thread 0x7ffed5ffb700 (LWP 1940) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
30 Thread 0x7ffed57fa700 (LWP 1941) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
31 Thread 0x7ffed4ff9700 (LWP 1942) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
32 Thread 0x7ffeb767e700 (LWP 1943) "gpd_server" 0x00007ffff72fcc1d in nanosleep ()
at ../sysdeps/unix/syscall-template.S:84
33 Thread 0x7ffeb6e7d700 (LWP 1944) "gpd_server" 0x00007ffff158db4f in ?? ()
---Type <return> to continue, or q <return> to quit---quit
from /usr/libQuit
5. Run thread apply all bt. The thread apply all command makes gdb execute the given command on every thread; with bt, it prints the full stack of each thread.
(gdb) thread apply all bt
Thread 48 (Thread 0x7ffea67fc700 (LWP 2172)):
#0 0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff72f36ba in start_thread (arg=0x7ffea67fc700) at pthread_create.c:333
#3 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 47 (Thread 0x7ffea6ffd700 (LWP 2171)):
#0 0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff72f36ba in start_thread (arg=0x7ffea6ffd700) at pthread_create.c:333
#3 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 46 (Thread 0x7ffea77fe700 (LWP 2170)):
#0 0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff72f36ba in start_thread (arg=0x7ffea77fe700) at pthread_create.c:333
#3 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 45 (Thread 0x7ffea7fff700 (LWP 2169)):
#0 0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1 0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2 0x00007ffff72f36ba in start_thread (arg=0x7ffea7fff700) at pthread_create.c:333
#3 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
..........
6. Copy the output to a file and search for threads whose stack contains "lock".
Thread 17 (Thread 0x7fff006eb700 (LWP 1926)):
#0 0x00007ffff72fb827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0,
expected=0, futex_word=0x8b4030) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x8b4030, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007ffff72fb8d4 in __new_sem_wait_slow (sem=0x8b4030, abstime=0x0)
at sem_waitcommon.c:181
#3 0x00007ffff72fb97a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x00007ffff05d5028 in PyThread_acquire_lock ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5 0x00007ffff05a9966 in PyEval_RestoreThread ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6 0x00007ffff0624cf6 in PyGILState_Ensure ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7 0x00007fffab08b47e in std::_Function_handler<void (void*), torch::utils::tensor_from_numpy(_object*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#8 0x00007fff70bba37c in c10::deleteInefficientStdFunctionContext(void*) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libc10.so
#9 0x00007fff71471610 in at::TensorImpl::release_resources() ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libcaffe2.so
#10 0x00007fff6f55d04b in c10::intrusive_ptr<at::TensorImpl, at::UndefinedTensorImpl>::reset_() () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch.so.1
#11 0x00007fff6f7d1e67 in torch::autograd::Variable::Impl::release_resources() ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch.so.1
#12 0x00007fffaad02b6b in c10::intrusive_ptr<at::TensorImpl, at::UndefinedTensorImpl>::reset_() () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#13 0x00007fffab0854d0 in torch::utils::(anonymous namespace)::internal_new_from_data(at::Type const&, c10::optional<c10::Device>, _object*, bool, bool, bool) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#14 0x00007fffab0872fd in torch::utils::legacy_new_from_data(at::Type const&, c10::optional<c10::Device>, _object*) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#15 0x00007fffab087363 in torch::utils::(anonymous namespace)::legacy_new_from_sequence(at::Type const&, c10::optional<c10::Device>, _object*) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#16 0x00007fffab0896a8 in torch::utils::legacy_tensor_ctor(at::Type const&, _object*, _object*) () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#17 0x00007fffab06285a in torch::tensors::Tensor_new(_typeobject*, _object*, _object*) ()
from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#18 0x00007ffff05c81b3 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#19 0x00007ffff06112b3 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#20 0x00007ffff05af39c in PyEval_EvalFrameEx ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#21 0x00007ffff06e811c in PyEval_EvalCodeEx ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#22 0x00007ffff063e3b0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#23 0x00007ffff06112b3 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#24 0x00007ffff06e7547 in PyEval_CallObjectWithKeywords ()
from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#25 0x00007ffff775cc1b in gpd::net::PythonClassifier::classifyPointsBatch(std::vector<std::unique_ptr<Eigen::Matrix<double, 3, -1, 0, 3, -1>, std::default_delete<Eigen::Matrix<double, 3, -1, 0, 3, -1> > >, std::allocator<std::unique_ptr<Eigen::Matrix<double, 3, -1, 0, 3, -1>, std::default_delete<Eigen::Matrix<double, 3, -1, 0, 3, -1> > > > > const&) ()
from /usr/local/lib/libgpd_pointnet.so
#26 0x00007ffff7747c4f in gpd::GraspDetectorPointNet::detectGrasps(gpd::util::Cloud&) ()
from /usr/local/lib/libgpd_pointnet.so
#27 0x0000000000432663 in GraspDetectionServer::run (this=this@entry=0x7fffffffcd90,
loop_rate=loop_rate@entry=15)
at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:57
#28 0x00000000004328ef in GraspDetectionServer::detectGrasps (this=0x7fffffffcd90, req=...,
res=...) at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:72
#29 0x0000000000469b12 in boost::function2<bool, gpd_ros::detect_graspsRequest_<std::allocator<void> >&, gpd_ros::detect_graspsResponse_<std::allocator<void> >&>::operator() (a1=...,
a0=..., this=0x205efe8) at /usr/include/boost/function/function_template.hpp:773
#30 ros::ServiceSpec<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > >::call(boost::function<bool (gpd_ros::detect_graspsRequest_<std::allocator<void> >&, gpd_ros::detect_graspsResponse_<std::allocator<void> >&)> const&, ros::ServiceSpecCallParams<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > >&) (params=<synthetic pointer>, cb=...)
at /opt/ros/kinetic/include/ros/service_callback_helper.h:125
#31 ros::ServiceCallbackHelperT<ros::ServiceSpec<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > > >::call (this=0x205efe0,
params=...) at /opt/ros/kinetic/include/ros/service_callback_helper.h:182
#32 0x00007ffff6b5f501 in ros::ServiceCallback::call() ()
from /opt/ros/kinetic/lib/libroscpp.so
#33 0x00007ffff6bb3838 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) ()
from /opt/ros/kinetic/lib/libroscpp.so
#34 0x00007ffff6bb4074 in ros::CallbackQueue::callOne(ros::WallDuration) ()
from /opt/ros/kinetic/lib/libroscpp.so
#35 0x00007ffff6c11265 in ros::AsyncSpinnerImpl::threadFunc() ()
from /opt/ros/kinetic/lib/libroscpp.so
#36 0x00007fffeef395d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#37 0x00007ffff72f36ba in start_thread (arg=0x7fff006eb700) at pthread_create.c:333
#38 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
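Thread 17 is stuck in PyThread_acquire_lock, reached through PyGILState_Ensure. This shows that the deadlock is on the Python side: the thread is waiting for Python's GIL.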
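II. Fixing the deadlock
The problem arises because a Python function is called from a C++ callback without holding Python's GIL. A C/C++ callback may run on a thread that was not created by Python. In that case the thread must acquire Python's Global Interpreter Lock (GIL) before calling any Python API function; otherwise the program's behavior is undefined.
Solution: acquire the GIL before calling the Python function and release it afterwards.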
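#include <Python.h>

// A C++ callback, or a function called from within a callback.
// pFunc and args are assumed to be PyObject* values prepared elsewhere.
void callback() {
    static bool gil_init = false;
    if (!gil_init) {  // make sure the GIL has been created, and only once
        PyEval_InitThreads();
        PyEval_SaveThread();
        gil_init = true;
    }
    // Acquire the GIL
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    // Fetch arguments, etc.
    // Call the Python function
    PyObject* pInstance = PyObject_CallObject(pFunc, args);
    // Other Python operations
    Py_XDECREF(pInstance);  // drop the result reference once it is no longer needed
    // Release the lock; no Python API calls may follow
    PyGILState_Release(gstate);
}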
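The acquire/release pairing in the callback above can also be written as an RAII guard, so that the release still happens on an early return or when an exception propagates. The sketch below is only an illustration under stated assumptions. ScopedGil, initThreadsOnce, callbackBody, CPythonApi and FakeApi are all hypothetical names. The guard is templated on a small policy type so that it compiles without Python, and the real policy, shown in a comment, simply forwards to the same C API calls used above. The one-time setup uses a function-local static, whose initialization C++11 guarantees to be thread-safe, in place of the plain bool flag, which could race if several callbacks run concurrently.

```cpp
#include <cassert>
#include <stdexcept>

// RAII guard: acquires in the constructor and releases in the destructor.
template <typename Api>
class ScopedGil {
public:
    ScopedGil() : state_(Api::ensure()) {}
    ~ScopedGil() { Api::release(state_); }
    ScopedGil(const ScopedGil&) = delete;
    ScopedGil& operator=(const ScopedGil&) = delete;

private:
    typename Api::State state_;
};

// Runs Api::initThreads() exactly once, even if several threads arrive here
// together (thread-safe initialization of function-local statics, C++11).
template <typename Api>
void initThreadsOnce() {
    static const bool done = (Api::initThreads(), true);
    (void)done;
}

// With the real interpreter the policy might look like this (needs <Python.h>);
// it mirrors the calls made in the callback above:
//   struct CPythonApi {
//       using State = PyGILState_STATE;
//       static void initThreads() { PyEval_InitThreads(); PyEval_SaveThread(); }
//       static State ensure() { return PyGILState_Ensure(); }
//       static void release(State s) { PyGILState_Release(s); }
//   };

// Hypothetical stand-in policy so the sketch runs without Python.
struct FakeApi {
    using State = int;
    static int initCalls;  // how many times initThreads() ran
    static int held;       // current acquire depth
    static void initThreads() { ++initCalls; }
    static State ensure() { return ++held; }
    static void release(State) {
        assert(held > 0);  // a release must match an earlier acquire
        --held;
    }
};
int FakeApi::initCalls = 0;
int FakeApi::held = 0;

// Shape of a callback body that uses the guard.
template <typename Api>
void callbackBody(bool fail) {
    initThreadsOnce<Api>();
    ScopedGil<Api> gil;  // acquired here
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
    // Python calls would go here.
}  // released here on every exit path
```

Calling callbackBody<FakeApi>(true) throws, yet FakeApi::held is back to zero afterwards, which is the property the guard is meant to provide. Swapping in the commented-out CPythonApi policy is left as an assumption to verify against the Python version in use.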