Manager is an API that has been in Asterisk for almost as long as Asterisk has existed. It provides a mechanism to see what is happening in Asterisk and to cause things to happen. While the public-facing API hasn’t undergone any radical changes since it came into existence, the way it works internally has changed. It has gone from an API intrinsically ingrained in the core and other parts of Asterisk to one that relies on queued generic messages, which are then converted into AMI-formatted messages. These generic messages are the same ones that are published to the Stasis message bus (discussed in previous blog posts). Manager has become a layer between Stasis and the outside world.
To function, Manager creates an aggregation feed of all the interesting parts of Asterisk and subscribes to it. This provides a handy, serialized stream of messages about what is happening. Manager then takes these messages, converts them into the AMI format, and ensures they get to each session that should receive them. None of this seems out of the ordinary, but we received reports that the Stasis subscription Manager uses was getting backed up with a lot of pending messages. Unable to easily explain what was going on, we dug in to see what was up.
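To make the conversion step concrete, here is a minimal sketch in Python (not the C code Asterisk actually uses). It assumes a generic message can be modelled as an event name plus a dictionary of fields; the function name `to_ami` and that dictionary shape are hypothetical. The only part taken from AMI itself is the wire format: `Key: Value` lines separated by CRLF, with a blank line ending the event.

```python
def to_ami(event, fields):
    """Render a generic message as AMI-style text.

    Hypothetical helper: Asterisk does this conversion in C from Stasis
    messages; this only illustrates the resulting wire format.
    """
    lines = [f"Event: {event}"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    # Each header ends with CRLF; an empty line terminates the event.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    text = to_ami("Newchannel", {"Channel": "PJSIP/alice-00000001"})
    print(repr(text))
```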
The primary idea many of us had was that sending the messages to each AMI session from the same thread that handled the Stasis subscription was causing a slowdown. After looking into things, we found that this is not actually the case. Manager maintains a single shared queue of messages. Each session iterates through the queue at its own pace. Once all sessions are past a message, it is freed and its resources are released. This shared queue was chosen to reduce memory usage and work, as the information about a message’s position in the queue can be held within the message itself. If each session instead had its own list, additional data would have to be allocated for each session to store the message.
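The shared queue can be sketched roughly as follows. This is an illustrative Python model, not Manager's actual C implementation: the class names, the sentinel entry, and the `freed` list (which only records what was released so the behaviour can be observed) are all assumptions. The idea it shows is that each message carries its own link and use count, each session only remembers the last message it consumed, and messages are released from the head once no session is still parked on them.

```python
class Message:
    """One queued event; link and use count live in the message itself."""

    def __init__(self, text):
        self.text = text
        self.next = None
        self.usecount = 0  # sessions whose last-consumed message is this one


class SharedQueue:
    def __init__(self):
        # A sentinel gives sessions something to park on before any event.
        self.head = self.tail = Message("<start>")
        self.freed = []  # demonstration only: texts of released messages

    def append(self, text):
        msg = Message(text)
        self.tail.next = msg
        self.tail = msg

    def purge(self):
        # Release from the head while no session is still parked there.
        while self.head.usecount == 0 and self.head.next is not None:
            self.freed.append(self.head.text)
            self.head = self.head.next


class Session:
    def __init__(self, queue):
        self.last = queue.tail  # new sessions only see later messages
        self.last.usecount += 1

    def read_next(self):
        nxt = self.last.next
        if nxt is None:
            return None  # caught up
        self.last.usecount -= 1
        nxt.usecount += 1
        self.last = nxt
        return nxt.text


if __name__ == "__main__":
    q = SharedQueue()
    fast, slow = Session(q), Session(q)
    q.append("A")
    q.append("B")
    fast.read_next()
    fast.read_next()
    q.purge()
    print(q.freed)  # nothing released yet: the slow session has not moved
```

Note that no per-session copy of a message is ever made; a session's entire queue state is a single reference.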
With the first idea not yielding any result, we dug in some more and did not immediately come up with any additional ideas to explain what we were seeing. It was only after a second look at the overall locking in Manager that an explanation presented itself. While the subscription thread was not sending the AMI messages out to each session, it was notifying the sessions that a new message was available. This action required holding the same lock that was held when doing the bulk of the work in the session thread, so the subscription thread was blocked whenever a session was busy.
With an explanation found, we worked to resolve the problem. A new lock has been added to Manager, strictly for notifying a session that new messages are available. This means that the subscription thread is now minimally blocked on each session and can process messages quickly. A single session being slow to send messages or process incoming messages will not impact any other sessions or the handling of messages.
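The shape of the fix can be illustrated with a small Python sketch. It is a model of the idea only, under assumed names (`work_lock`, `notify_cond`, `pending`), and does not reflect the real C structures in Manager. The point is that the notifier touches only a small, dedicated lock, so it returns promptly even while the session's main lock is held for slow work.

```python
import threading


class Session:
    def __init__(self):
        # Held by the session thread while it does the bulk of its work.
        self.work_lock = threading.Lock()
        # Dedicated lock/condition used only for "new message" wakeups.
        self.notify_cond = threading.Condition(threading.Lock())
        self.pending = 0

    def notify(self):
        """Called by the subscription thread; never takes work_lock."""
        with self.notify_cond:
            self.pending += 1
            self.notify_cond.notify()

    def wait_for_events(self, timeout=None):
        """Called by the session thread; returns how many wakeups arrived."""
        with self.notify_cond:
            self.notify_cond.wait_for(lambda: self.pending > 0, timeout)
            count, self.pending = self.pending, 0
        return count


if __name__ == "__main__":
    s = Session()
    with s.work_lock:  # simulate a session busy sending or processing
        s.notify()     # returns immediately; work_lock is not needed
        s.notify()
    print(s.wait_for_events(timeout=1))
```

Had `notify` needed `work_lock` instead, the two calls inside the `with` block would have stalled until the busy session released it, which is the back-up described above.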