Hi and welcome back! Let’s take a look at another recent ticket: ASTERISK-29698.
Integral to almost every section of code within Asterisk is astobj2. We don’t actually use it directly though, we use via the ao2 macros defined in astobj2.h. In object oriented languages this would be an example of abstraction with the ao2 macros serving as the library interface. In this case the macros do quite a bit more than serve as a pure interface – they help create and destroy the underlying containers, embed appropriate logging, create various types of locking mechanisms and combine useful tasks.
The ao2 container relies heavily on the ao2_ref function. This function serves to both check the current number of references and to increment or decrement the counter. This increment and decrement mechanism lays at the heart of the ao2’s ability to clean itself up automatically once all references have been removed.
The scheduler routines provides a set of tools to create, manage and delete scheduled contexts and their items. Part of this is a number of macros that do some task-combining work to make it easier to ensure that the right things are done together in the right order. For instance the macro:
AST_SCHED_DEL_UNREF(sched, id, refcall)
exists to release the scheduled item then invoke the unref function if it was able to do so. This is typically called with an ao2ref -1 as the refcall so that if the release succeeds the macro will go ahead and decrement the refcount. This works great most of the time, but what if the schedule couldn’t be deleted because it was already executing? ASTERISK-29698 occurred because the function:
int ast_sched_del(struct ast_sched_context *con, int id)
which lays at the heart of our macro does not distinguish between a task that was released successfully vs one that was not released because it was already running. The result is that the macro would then invoke the unref – and risk doubly freeing the object as the scheduler automatically frees a non-rescheduled task once it completes. We still need to invoke the unref in the macro if the item is rescheduled (and therefore isn’t released) so we maintain the correct reference count, but in the case that the object is not rescheduled this could cause a decrement on a freed object.
This starts by adding a rescsheduled bit to the sched structure so that we can check whether or not it was re-scheduled.
/*! Indication that a running task was rescheduled. */ unsigned int rescheduled:1;
When Asterisk runs a set of scheduled events it checks the return status of the event’s call-back function. A call back returning 0 indicates that it wishes to be released. A non 0 return value indicates a re-scheduled item. Previously, if the call back returned 0 this would be enough to cause the scheduler to go ahead and release the item then and there. Now we look to see if the item was marked deleted by the sched_del, set the rescheduled flag and let that waiting thread take care of the release.
This passes the responsibility of releasing the item to the AST_SCHED_DEL_UNREF macro and underlying function, but as we said before, ast_sched_del doesn’t do everything we need. Instead we add a new function:
int ast_sched_del_nonrunning(struct ast_sched_context *con, int id)
that both takes responsibility for releasing the running scheduled item and returns a new -2 if called on a running, non re-scheduled item. This lets us change the AST_SCHED_DEL_UNREF macro to take advantage of this knowledge and avoid the refcall in that case. This sorts out the release between the macro and the underlying function and our bases appear to be covered. We also then modify ast_sched_del to call the new version so existing calls to it do not have to be changed.
So what does this have to do with abstraction?
In this case, because AST_SCHED_DEL_UNREF can be changed to use a new function and check for a new return value without changing it’s signature we don’t have to go update every call to the macro. Adding a new function and making the old version a wrapper around the new one encapsulates all of our changes to a few lines within the scheduler with no required changes elsewhere.
What else does this one tell us?
The scheduled item in this case is a ast_sched_context, the references to which are generally managed via ao2 containers. They also can get freed from within the scheduler, so any macros or functions in the scheduler need to be aware of this and not intentionally invoke an unref on an item it knows was cleared. Moving the release from the main event caller to the unref function allows ast_sched_del_nonrunning to handle the release and to signal back to it’s caller a better picture of how to proceed.
To see this all play out in motion, check out the test added to demonstrate the sequence:
Have a good one!