Overview

If you’re familiar with Asterisk, you probably know that it uses a third-party project called pjproject, which is a major part of the PJSIP channel driver. When you integrate third-party projects into your own, it’s your job to stay up to date on new changes and ensure that your project remains in a stable state. For the most part, this is a fairly painless process for Asterisk and pjproject, but there are occasions where it is not so easy. In a recent update, pjproject was upgraded to version 2.14. The process took longer than usual because of issues that arose from the new changes. This blog post sheds some light on the obstacles we faced and on the importance of having tests that make you aware of changes you might otherwise have missed.

The Problem

First, let’s talk about what problems we actually encountered. When testing pjproject 2.14, we noticed that a few tests in the testsuite were consistently failing:

  • tests/channels/pjsip/basic_calls/outgoing/off-nominal/call_canceled
  • tests/channels/pjsip/reinvite_after_bye
  • tests/channels/pjsip/rel100/incoming/peer_supported_require
  • tests/channels/pjsip/rel100/incoming/peer_supported_used
  • tests/channels/pjsip/transfers/blind_transfer/off_nominal/transferer_reinvite

The problem was not immediately obvious; events we expected to occur were not occurring, and sipp reported a failure. Upon further examination, we discovered that extra responses were being sent, and sometimes responses would be sent in an order we did not expect. Since the tests were written to expect responses in a certain order, this was causing failures. Similarly, the tests were not accounting for these extra responses, so sipp would fail there as well.
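To make the failure mode concrete, here is a minimal Python sketch of a strict, order-sensitive check. It is only an illustration: the function name and the message labels are hypothetical, and it is not how the testsuite or sipp is actually implemented.

```python
def matches_strict(expected, received):
    """Pass only when the received messages equal the expected sequence exactly."""
    return list(expected) == list(received)


# Hypothetical message labels, used purely for illustration.
expected = ["PRACK-1", "PRACK-2", "BYE"]
reordered = ["PRACK-2", "PRACK-1", "BYE"]      # same messages, different order
extra = ["PRACK-1", "PRACK-2", "BYE", "BYE"]   # one additional BYE

print(matches_strict(expected, expected))   # exact match
print(matches_strict(expected, reordered))  # fails on ordering alone
print(matches_strict(expected, extra))      # fails on the extra message
```

A check like this rejects both the reordered and the extended sequence, even though every expected message is present.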

What Changed?

pjproject committed a couple of changes that introduced this behavior. The order in which PRACKs are processed was changed, which is what was causing the rel100 tests to fail. Another commit changed on_media_update to be called for error scenarios as well, which is what was causing the extra responses to be sent (e.g., multiple BYEs). Whether this was considered valid behavior or not, it’s not what the testsuite considered passing.

How Did We Fix It?

You may be thinking that this is an easy fix; just change the events in the tests to match what we’re receiving. However, there are a few problems with this. We needed to verify that the new behavior in pjproject was actually valid. This resulted in going through multiple RFCs to try to find a source to validate the behavior.

The testsuite is also tied to specific branches of Asterisk, but it is NOT tied to specific releases in those branches. If you were to go from Asterisk 21.0.0 to Asterisk 21.1.0, there is no guarantee that something will behave the same way under the hood. The test has to be able to pass on both versions.

Validating the Changes

After doing some research, we found that, for the tests involved, it doesn’t matter what order the PRACKs arrive in. The extra responses we were getting in the form of BYEs and CANCELs were also considered valid behavior. This meant that the tests had to be backwards compatible, passing with both the old behavior and the new. The rel100 tests were changed so that they no longer look for specific PRACK responses; they only check that two were received. The other tests were more complicated and now branch depending on whether a response was received at a certain point in the test.
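The two kinds of relaxation described here can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the actual test changes: the function names, the message labels, and the idea of representing a call flow as a list of strings are all hypothetical.

```python
def saw_two_pracks(received):
    """Order-insensitive check: pass when exactly two PRACKs were seen."""
    return sum(1 for method in received if method == "PRACK") == 2


def matches_with_optional(required, optional, received):
    """Pass when the required messages appear in order.

    Messages named in `optional` may show up at any point, or not at all,
    so the same check passes against both the old and the new behavior.
    """
    idx = 0
    for msg in received:
        if idx < len(required) and msg == required[idx]:
            idx += 1          # consumed the next required message
        elif msg in optional:
            continue          # tolerated extra message (e.g. a second BYE)
        else:
            return False      # unexpected message
    return idx == len(required)


# Hypothetical flows: one without the extra BYE, one with it.
required = ["INVITE", "BYE"]
optional = {"BYE", "CANCEL"}
print(matches_with_optional(required, optional, ["INVITE", "BYE"]))
print(matches_with_optional(required, optional, ["INVITE", "BYE", "BYE"]))
```

The design choice is to tolerate only a named set of extra messages rather than ignoring everything unexpected, so a genuinely wrong message still fails the check.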

Overall, this was good news because it meant that pjproject did not have to do any regression releases, and we didn’t have to make any additional changes to Asterisk. Only the testsuite required a patch.
