One of the hardest things in PJSIP is making sure the experience is as good as it can be, not just for people who configure Asterisk from normal configuration files but also for those who configure it from a database. This is quite a challenge, and one area that has been problematic is qualify support. Qualify support is problematic because it is inherently stateful. You need to know what to qualify, what configuration to use, what the current state of each target is (reachable or unreachable), and how long the qualify takes to get a response. In the past, tweaks were made to improve this for database users, but unfortunately they caused problems for users of configuration files. As of the latest releases of Asterisk (13.22.0 and 15.5.0 as I write this) this has been improved. So let’s take a look at how things were and how they now operate!
Query, Query, Query
When qualify support in PJSIP was initially written, the focus was on the configuration file use case, not on database usage. It hooked itself in to learn when things came into existence and was driven by that. With database access this caused some things to get lost, resulting in no qualify occurring. Later changes made it query more often to learn the state of things. Unfortunately querying comes at a cost, and this added up even in the configuration file case, so qualify support took a long time to get going with a large number of endpoints. On a system with 1000 AORs it might take over a minute for Asterisk to start while qualify support figured everything out. Even after Asterisk had started, CPU usage would stay high because of the amount of querying done as qualify results came back. This had to change.
One weekend I took all of the ideas and thoughts that had been popping into my head about rewriting qualify support and actually turned them into code. My primary requirements were that qualify support should be better than it was, work with both configuration files and the database, and maintain compatibility on any public-facing interfaces (such as AMI). To help me I came up with two primary drivers.
Don’t Be Afraid Of State
The previous qualify support suffered from not maintaining state. This was partly because it grew organically without a long-term design, but also because maintaining state is hard and hinders database users. For the new implementation I knew this had to change. If qualify support was to be fast and less CPU hungry, it had to maintain some state, so now it does!
The new qualify support maintains state information about which AORs (and the contacts associated with them) it is qualifying (or not qualifying) and which endpoints are interested in them. This means that no querying is needed at qualify time to find this association, which makes it substantially faster. To support database users, though, this information can be pulled from the database if it has not been seen yet. This is done automatically. If the backend database configuration has changed, the state needs to be updated using a reload. I feel this is a small price to pay for having qualify support that doesn’t use an excessive amount of CPU.
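To make the idea concrete, here is a minimal sketch in Python of a cache that answers from memory and only goes to the backend for AORs it has not seen yet. This is not the Asterisk implementation (which is written in C). The names AorState, QualifyState, and fetch_aor are all hypothetical, and the reload here simply forgets everything and re-learns on demand, which is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class AorState:
    """In-memory record of what is being qualified for one AOR (hypothetical)."""
    name: str
    contacts: Set[str] = field(default_factory=set)
    endpoints: Set[str] = field(default_factory=set)


class QualifyState:
    """Keeps AOR state in memory so qualify-time lookups avoid the backend."""

    def __init__(self, fetch_aor: Callable[[str], Optional[AorState]]):
        # fetch_aor is a hypothetical stand-in for a database lookup.
        self._fetch_aor = fetch_aor
        self._aors: Dict[str, AorState] = {}

    def get(self, name: str) -> Optional[AorState]:
        state = self._aors.get(name)
        if state is None:
            # Not seen yet: pull it from the backend once, then remember it.
            state = self._fetch_aor(name)
            if state is not None:
                self._aors[name] = state
        return state

    def reload(self) -> None:
        # Assumption: a reload drops known state so it is re-learned lazily.
        self._aors.clear()
```

With this shape, repeated calls to get() for the same AOR cost a dictionary lookup, and the backend is only consulted again after a reload.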
Filter And Aggregate
Conceptually qualify support is a stack with lower levels feeding higher ones. You see if a contact is reachable, you inform the AOR which may change its state (remember – an AOR can have multiple contacts), which then informs any interested endpoint (an AOR can be associated with multiple endpoints). This was not previously fully represented or utilized in code but now is.
The result of qualifying a contact is provided to the AOR. The AOR aggregates this information with the other contacts on it. If the AOR state is not changed then nothing happens. If the AOR state has changed then this information is provided to all interested endpoints. The endpoint aggregates this information with the other AORs on it. If the endpoint state is not changed then nothing happens. If the endpoint state has changed then this is raised to the rest of the system.
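The flow above can be sketched roughly as follows. This is an illustrative Python model, not the actual Asterisk code. The class and method names are hypothetical, and it assumes a simple rule where an AOR is reachable if any of its contacts is and an endpoint is reachable if any of its AORs is. It also assumes everything starts out unreachable.

```python
from typing import Callable, Dict, List


class Endpoint:
    """Aggregates AOR states and reports upward only when its own state changes."""

    def __init__(self, name: str, on_change: Callable[[str, bool], None]):
        self.name = name
        self.reachable = False
        self._aor_states: Dict[str, bool] = {}
        self._on_change = on_change  # hypothetical hook to "the rest of the system"

    def aor_changed(self, aor_name: str, reachable: bool) -> None:
        self._aor_states[aor_name] = reachable
        new_state = any(self._aor_states.values())
        if new_state == self.reachable:
            return  # filtered: endpoint state did not change
        self.reachable = new_state
        self._on_change(self.name, new_state)


class Aor:
    """Aggregates contact results and informs endpoints only on a state change."""

    def __init__(self, name: str, endpoints: List[Endpoint]):
        self.name = name
        self.reachable = False
        self._contact_states: Dict[str, bool] = {}
        self._endpoints = endpoints

    def contact_result(self, contact: str, reachable: bool) -> None:
        self._contact_states[contact] = reachable
        new_state = any(self._contact_states.values())
        if new_state == self.reachable:
            return  # filtered: AOR state did not change
        self.reachable = new_state
        for endpoint in self._endpoints:
            endpoint.aor_changed(self.name, new_state)
```

In this model a second contact becoming reachable on an already reachable AOR stops at the AOR, and a second AOR becoming reachable on an already reachable endpoint stops at the endpoint, so only real transitions reach the top.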
Because we now filter and aggregate at each level, no needless extra calculations are done to determine whether anything has changed.
I went into rewriting qualify support not quite knowing what the end result would be, but I’m quite happy to say that the performance is substantially better. Previously, on my system with 3000 AORs qualifying, I would give up waiting for Asterisk to load after 3 minutes. With this new support it loads within a few seconds, and qualify support consumes approximately 15% CPU while qualifying them and dealing with all of the traffic. The other nice thing is that the public-facing interfaces were preserved, so most people would never know anything had changed.
Have you noticed the change? Let me know, and if you’ve experienced a problem, don’t forget to file an issue on the issue tracker.