Using DEBUG_THREADS to find deadlocks.

Asterisk’s DEBUG_THREADS is a compile-time option that helps track down deadlocks involving Asterisk locks. You enable it in menuselect’s “Compiler Flags” menu along with other useful compile-time options such as DONT_OPTIMIZE and BETTER_BACKTRACES. Enabling BETTER_BACKTRACES is strongly recommended; without it the output of the Command Line Interface (CLI) “core show locks” command is far less helpful.
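
If you prefer to configure the build non-interactively, something along these lines should turn all three options on. This is a sketch that assumes you are at the top of an already-configured Asterisk source tree; the exact menuselect invocation can differ between versions.

make menuselect.makeopts
menuselect/menuselect --enable DEBUG_THREADS --enable DONT_OPTIMIZE --enable BETTER_BACKTRACES menuselect.makeopts
make && make install

Asterisk must be rebuilt and restarted for the options to take effect.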

The ast_coredumper script, described in Getting-a-Backtrace, is an easy way to gather backtraces together with the same locking information presented by the CLI “core show locks” command.
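
For example, running the script from the source tree against a live Asterisk should produce both the backtrace text files and a locks file equivalent to the “core show locks” output. The option shown here comes from the copy of the script in contrib/scripts and may vary between releases; check the script’s --help output.

contrib/scripts/ast_coredumper --running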

Examining a deadlock issue

We’ll examine the ASTERISK-27460 CDR deadlock issue filed against Asterisk 13.18.3. Below is the output of the CLI “core show locks” command.

=======================================================================
=== 13.18.3
=== Currently Held Locks
=======================================================================
===
===  <lock#> ():      (times locked)
===
=== Thread ID: 0xb66ffb70 LWP:27257 (default_tps_processing_function started at [  202] taskprocessor.c default_listener_start())
=== ---> Waiting for Lock #0 (func_cdr.c): MUTEX 412 cdr_write_callback payload->chan 0xa75f588 (1)
	main/backtrace.c:59 __ast_bt_get_addresses() (0x80b4305+19)
	main/lock.c:258 __ast_pthread_mutex_lock() (0x8190c87+85)
	main/astobj2.c:164 __ao2_lock() (0x80a8be7+7C)
	funcs/func_cdr.c:413 cdr_write_callback()
	main/stasis_message_router.c:204 router_dispatch()
	main/stasis.c:437 subscription_invoke()
	main/stasis.c:744 dispatch_exec_sync()
	main/taskprocessor.c:963 ast_taskprocessor_execute() (0x8256cfc+100)
	main/taskprocessor.c:177 default_tps_processing_function()
	main/utils.c:1238 dummy_start()
	nptl/pthread_create.c:300 start_thread()
	libc.so.6 clone() (0x106f4c0+5E)
=== --- ---> Locked Here: pbx.c line 7658 (pbx_outgoing_attempt)
=== -------------------------------------------------------------------
===
=== Thread ID: 0xb573fb70 LWP:27463 (fast_originate       started at [ 5626] manager.c action_originate())
=== ---> Lock #0 (pbx.c): MUTEX 7658 pbx_outgoing_attempt dialed 0xa75f588 (1)
	main/backtrace.c:59 __ast_bt_get_addresses() (0x80b4305+19)
	main/lock.c:258 __ast_pthread_mutex_lock() (0x8190c87+85)
	main/astobj2.c:164 __ao2_lock() (0x80a8be7+7C)
	main/pbx.c:7659 pbx_outgoing_attempt()
	main/pbx.c:7796 ast_pbx_outgoing_exten() (0x81e3caf+104)
	main/manager.c:5161 fast_originate()
	main/utils.c:1238 dummy_start()
	nptl/pthread_create.c:300 start_thread()
	libc.so.6 clone() (0x106f4c0+5E)
=== ---> Lock #1 (stasis.c): MUTEX 829 publish_msg topic 0xb6ba7dc0 (1)
	main/backtrace.c:59 __ast_bt_get_addresses() (0x80b4305+19)
	main/lock.c:258 __ast_pthread_mutex_lock() (0x8190c87+85)
	main/astobj2.c:164 __ao2_lock() (0x80a8be7+7C)
	main/stasis.c:830 publish_msg()
	main/stasis.c:851 stasis_publish_sync() (0x8235fda+80)
	main/stasis_message_router.c:289 stasis_message_router_publish_sync() (0x824a186+97)
	funcs/func_cdr.c:570 cdr_write()
	main/pbx_functions.c:711 ast_func_write() (0x81efaf5+1A4)
	main/pbx_variables.c:1036 pbx_builtin_setvar_helper() (0x81f4360+B7)
	main/channel.c:8075 ast_set_variables() (0x8113559+2C)
	main/pbx.c:7660 pbx_outgoing_attempt()
	main/pbx.c:7796 ast_pbx_outgoing_exten() (0x81e3caf+104)
	main/manager.c:5161 fast_originate()
	main/utils.c:1238 dummy_start()
	nptl/pthread_create.c:300 start_thread()
	libc.so.6 clone() (0x106f4c0+5E)
=== -------------------------------------------------------------------
===
=======================================================================

In a classic deadlock, two threads each hold a different lock and each is blocked trying to acquire the lock the other holds. For that pattern, the CLI “core show locks” output is enough on its own to tell us which locks are involved and how the code got there, because both threads appear as holding one lock while waiting for another. As the output above shows, we do not have a classic deadlock: thread LWP:27257 is waiting for the 0xa75f588 lock held by thread LWP:27463, but thread LWP:27463 is not shown as waiting for any lock. We need more information.
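
For comparison, here is a minimal sketch of the classic two-lock pattern in plain pthreads. It is illustrative only, with made-up names rather than Asterisk code; an equivalent deadlock between Asterisk locks would show up completely in “core show locks” when DEBUG_THREADS is enabled.

/* classic_deadlock.c: each thread takes its own lock, then blocks trying
 * to take the lock the other thread already holds.
 * Build with: gcc -pthread classic_deadlock.c */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread_one(void *unused)
{
	(void) unused;
	pthread_mutex_lock(&lock_a);
	sleep(1);                     /* widen the race window so both threads get their first lock */
	pthread_mutex_lock(&lock_b);  /* blocks forever: thread_two holds lock_b */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *thread_two(void *unused)
{
	(void) unused;
	pthread_mutex_lock(&lock_b);
	sleep(1);
	pthread_mutex_lock(&lock_a);  /* blocks forever: thread_one holds lock_a */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1;
	pthread_t t2;

	pthread_create(&t1, NULL, thread_one, NULL);
	pthread_create(&t2, NULL, thread_two, NULL);
	pthread_join(t1, NULL);       /* never returns: the two threads are deadlocked */
	pthread_join(t2, NULL);
	return 0;
}

Our situation does not fit that shape, so the next step is a full backtrace of the running process, such as the one gathered with ast_coredumper. The two threads of interest are excerpted below.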

Thread 59 (Thread 0xb66ffb70 (LWP 27257)):
#0  0x00528422 in __kernel_vsyscall ()
#1  0x009ffaf9 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2  0x009fb149 in _L_lock_839 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0x009fafdb in __pthread_mutex_lock (mutex=0xa75f588) at pthread_mutex_lock.c:82
#4  0x08190dbe in __ast_pthread_mutex_lock (filename=0x1f7a84b "func_cdr.c", lineno=412, func=0x1f7b634 "cdr_write_callback", mutex_name=0x1f7b0b9 "payload->chan", t=0xa75f588)
    at lock.c:315
#5  0x080a8c63 in __ao2_lock (user_data=0xa75f5b8, lock_how=AO2_LOCK_REQ_MUTEX, file=0x1f7a84b "func_cdr.c", func=0x1f7b634 "cdr_write_callback", line=412,
    var=0x1f7b0b9 "payload->chan") at astobj2.c:164
#6  0x01f79629 in cdr_write_callback (data=0x0, sub=0xb6b3f580, message=0xa39a860) at func_cdr.c:412
#7  0x08249e1f in router_dispatch (data=0xb6db0aa0, sub=0xb6b3f580, message=0xa39a860) at stasis_message_router.c:201
#8  0x08234c51 in subscription_invoke (sub=0xb6b3f580, message=0xa39a860) at stasis.c:434
#9  0x082359b1 in dispatch_exec_sync (local=0xb66ff250) at stasis.c:743
#10 0x08256dfc in ast_taskprocessor_execute (tps=0xb6b51438) at taskprocessor.c:963
#11 0x08254e48 in default_tps_processing_function (data=0xb6d5ff10) at taskprocessor.c:185
#12 0x082705e7 in dummy_start (data=0xb6b5abe8) at utils.c:1238
#13 0x009f896e in start_thread (arg=0xb66ffb70) at pthread_create.c:300
#14 0x0106f51e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

...

Thread 5 (Thread 0xb573fb70 (LWP 27463)):
#0  0x00528422 in __kernel_vsyscall ()
#1  0x009fd015 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:122
#2  0x08191cd6 in __ast_cond_wait (filename=0x82c718b "stasis.c", lineno=800, func=0x82c7bbe "dispatch_message", cond_name=0x82c73a4 "&std.cond",
    mutex_name=0x82c739a "&std.lock", cond=0xb573eb74, t=0xb573eb54) at lock.c:600
#3  0x08235cb2 in dispatch_message (sub=0xb6b3f580, message=0xa39a860, synchronous=1) at stasis.c:800
#4  0x08235f66 in publish_msg (topic=0xb6ba7df0, message=0xa39a860, sync_sub=0xb6b3f580) at stasis.c:835
#5  0x0823605a in stasis_publish_sync (sub=0xb6b3f580, message=0xa39a860) at stasis.c:850
#6  0x0824a21d in stasis_message_router_publish_sync (router=0xb6db0aa0, message=0xa39a860) at stasis_message_router.c:288
#7  0x01f7a024 in cdr_write (chan=0xa75f5b8, cmd=0xb573ed50 "CDR", parse=0xb573ed54 "amaflags", value=0x989ddca "OMIT") at func_cdr.c:568
#8  0x081efc99 in ast_func_write (chan=0xa75f5b8, function=0xb573ede0 "CDR(amaflags)", value=0x989ddca "OMIT") at pbx_functions.c:711
#9  0x081f4417 in pbx_builtin_setvar_helper (chan=0xa75f5b8, name=0x989ddbc "CDR(amaflags)", value=0x989ddca "OMIT") at pbx_variables.c:1036
#10 0x08113585 in ast_set_variables (chan=0xa75f5b8, vars=0x989dd68) at channel.c:8076
#11 0x081e35f8 in pbx_outgoing_attempt (type=0x96c0a72 "PJSIP", cap=0x9b7d258, addr=0x96c0a7a "200", timeout=30000, context=0x96c0a80 "pjsip_b", exten=0x96c0a8a "220",
    priority=1, app=0x0, appdata=0x0, reason=0xb573f260, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x989dd68, account=0x82fc802 "", locked_channel=0xb573f25c, early_media=0,
    assignedids=0xb573f168) at pbx.c:7660
#12 0x081e3db3 in ast_pbx_outgoing_exten (type=0x96c0a72 "PJSIP", cap=0x9b7d258, addr=0x96c0a7a "200", timeout=30000, context=0x96c0a80 "pjsip_b", exten=0x96c0a8a "220",
    priority=1, reason=0xb573f260, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x989dd68, account=0x82fc802 "", locked_channel=0xb573f25c, early_media=0,
    assignedids=0xb573f168) at pbx.c:7796
#13 0x081a9026 in fast_originate (data=0x9bccbe8) at manager.c:5161
#14 0x082705e7 in dummy_start (data=0x9b5bf78) at utils.c:1238
#15 0x009f896e in start_thread (arg=0xb573fb70) at pthread_create.c:300
#16 0x0106f51e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Looking in the backtrace files at the threads identified by the CLI “core show locks” output, we see that thread LWP:27463 is waiting on a condition variable. That condition is not signalled until thread LWP:27257 completes its current task. This, then, is the deadlock: thread LWP:27463 is waiting for thread LWP:27257 to finish its task, while thread LWP:27257 is waiting for a lock held by thread LWP:27463.
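
The shape of this deadlock boils down to the following sketch in plain pthreads. It is illustrative only and the names are made up; the real code paths are the ones in the backtraces above. The “originator” plays the role of the fast_originate thread that locked the channel in pbx_outgoing_attempt() and then called stasis_publish_sync(), and the “worker” plays the role of the taskprocessor thread running cdr_write_callback().

/* sync_deadlock.c: a thread holds a mutex and then waits on a condition
 * that only a second thread will signal, but the second thread must take
 * that same mutex before it can get to the signal.
 * Build with: gcc -pthread sync_deadlock.c */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t chan_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the channel lock 0xa75f588 */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done = 0;

static void *worker(void *unused)        /* the taskprocessor thread (LWP:27257) */
{
	(void) unused;
	sleep(1);                        /* let the originator take chan_lock first */
	pthread_mutex_lock(&chan_lock);  /* blocks forever: the originator holds chan_lock */
	/* ... handle the message ... */
	pthread_mutex_unlock(&chan_lock);

	pthread_mutex_lock(&done_lock);
	done = 1;
	pthread_cond_signal(&done_cond); /* never reached, so the originator waits forever */
	pthread_mutex_unlock(&done_lock);
	return NULL;
}

static void *originator(void *unused)    /* the fast_originate thread (LWP:27463) */
{
	(void) unused;
	pthread_mutex_lock(&chan_lock);  /* like locking the dialed channel in pbx_outgoing_attempt() */

	pthread_mutex_lock(&done_lock);
	while (!done) {
		/* like stasis_publish_sync() waiting for the subscriber to finish */
		pthread_cond_wait(&done_cond, &done_lock);
	}
	pthread_mutex_unlock(&done_lock);

	pthread_mutex_unlock(&chan_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1;
	pthread_t t2;

	pthread_create(&t1, NULL, originator, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);          /* never returns */
	pthread_join(t2, NULL);
	return 0;
}

Note that in the “core show locks” output above only the worker side shows up as “Waiting for Lock”; the originator’s pthread_cond_wait() does not appear as a lock wait at all, which is exactly why the backtrace was needed on top of the DEBUG_THREADS output.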

Once the deadlock scenario is identified, a way to prevent it needs to be devised. In this case the CDR dialplan function was being used in a deprecated manner to access information readily available through the CHANNEL dialplan function. The Asterisk issue has links to the merged Gerrit review patch that fixed the problem.
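
For illustration only, a manager Originate reconstructed from the arguments visible in the backtrace above (an assumption for this example, not the reporter’s actual request) would look roughly like this, with the AMA flag written through the CDR function:

Action: Originate
Channel: PJSIP/200
Context: pjsip_b
Exten: 220
Priority: 1
Timeout: 30000
Variable: CDR(amaflags)=OMIT

Writing the same property as Variable: CHANNEL(amaflags)=OMIT should set the AMA flags directly on the channel without going through the synchronous CDR Stasis publish that appears in the backtrace.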

Summary

DEBUG_THREADS is by no means a silver bullet. Running Asterisk with the option enabled is not recommended in normal operation because it can significantly impact performance. It also does not help as much in non-classic deadlock scenarios, or when the locks involved belong to a third-party library such as PJPROJECT or an ODBC driver; in those cases you need backtraces more than the output of the CLI “core show locks” command. However, it does readily show which Asterisk locks a thread holds and which it is waiting for, something that is difficult to determine from a backtrace alone.

For more information about locking in Asterisk see Locking-in-Asterisk.
