How much testing do we really do?
A lot, and let’s face it, Asterisk isn’t the easiest of software packages to test. Our continuous integration environment could run over 1000 tests on a single change before it’s merged into the codebase. Unfortunately, without a significant amount of work, mostly around directory and port coordination, the tests can’t be run in parallel in the same operating system instance, and when run sequentially, a single run of 600 tests can take the better part of a work day. Given that most changes submitted to Asterisk’s Gerrit affect 3 branches, and that the Testsuite must be run for each branch before a change is merged, it was taking an unacceptable amount of time to get changes cleared through the process.
Earlier in the year, we created a new VM cluster and started breaking the Testsuite tests up into several clumps that could be run in parallel on several VMs. We then configured Jenkins to coordinate the parallel jobs and report back to Gerrit with a final vote. This has worked better, but now we’re running into other issues. We broke the tests up into 5 Jenkins jobs, so a change that affects 3 branches now means 15 jobs (15 VMs) running in parallel. Because the VM cluster doesn’t have unlimited resources, we still struggle to respond to Gerrit (and the change author) in a timely fashion when there are multiple changes in the queue, and it’s just not practical to keep adding resources to the pool when there are alternatives.
The nice thing about the Testsuite is that it doesn’t take a lot of resources to actually run the tests. As mentioned above, the constraint when attempting to run parallel Testsuite runs is on the network and filesystem. This is a perfect use case for running the tests in containers, specifically Docker, on the same host. Even better is that we can build Asterisk only once and have the tests run in parallel against those build products. Even better than that, Jenkins has built-in facilities to help us with container management.
There are some hurdles though:
We test Asterisk on CentOS, Fedora and Ubuntu, and we have scripts that can take the official base x86_64 images of those distributions and turn them into full Jenkins-ready Asterisk development environments. There are no official i686 base images, however, and we do test in 32-bit environments as well as 64-bit. To create the images, we had to do actual installs of 32-bit CentOS 7 and Ubuntu 14, clean them out, tar the root file systems, then create Docker images. Unfortunately, that’s only half the battle. A Docker image’s architecture isn’t determined by the packages installed in the container. It’s determined by the architecture of the host. If you run a 32-bit CentOS 7 image on a 64-bit CentOS 7 host, then run ‘uname -m’ in the container, you’ll get ‘x86_64’, not ‘i686’. If you try to build and/or run Asterisk in this situation, things are going to get messy. Luckily, you can use the ‘setarch’ command to control what ‘uname -m’ returns, but to use that, we had to put logic in our Dockerfile creation scripts and the scripts that actually build and run Asterisk.
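To make the ‘setarch’ logic a little more concrete, here is a minimal Python sketch of the kind of check a build script could make. It is not our actual script: the helper name `wrap_for_arch`, the `ARCHES_32BIT` set and the example commands are all made up for illustration. The only real pieces are `platform.machine()`, which reports the same value as ‘uname -m’, and the ‘setarch’ command itself.

```python
import platform

# Hypothetical list of 32-bit x86 machine names; an assumption for this sketch.
ARCHES_32BIT = {"i386", "i486", "i586", "i686"}


def wrap_for_arch(command, target_arch, reported_arch=None):
    """Return `command`, prefixed with setarch when the image is 32-bit
    but the container reports the 64-bit host architecture."""
    if reported_arch is None:
        # Same value `uname -m` would print inside the container.
        reported_arch = platform.machine()
    if target_arch in ARCHES_32BIT and reported_arch == "x86_64":
        return ["setarch", target_arch] + list(command)
    return list(command)


# 32-bit image on a 64-bit host: the build command gets wrapped.
print(wrap_for_arch(["./configure"], "i686", reported_arch="x86_64"))
# 64-bit image on a 64-bit host: the command is left alone.
print(wrap_for_arch(["./configure"], "x86_64", reported_arch="x86_64"))
```

The resulting list could be handed to something like `subprocess.run`. Every build and run step inside a 32-bit container would need to go through the same wrapper.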
- Understand the Docker-Jenkins Relationship:
This is a fairly complicated relationship, mostly due to the serialization needed for Jenkins to maintain state and data across multiple slaves. When using the Jenkins Pipeline DSL, we had to be very aware of what Jenkins commands will be executed in the Docker container vs on the Docker host (Jenkins slave). For instance, using the built-in Jenkins ‘git’ tools will cause them to always be run on the host, not the container but if you run ‘git’ in a Jenkins ‘shell’ command, it will run in the container.
- Divide and Coordinate the Workload:
Much experimentation was needed to split the total number of Asterisk Testsuite tests into equal-length chunks (by execution time), but most of the work was around coordinating the git checkout, Asterisk build, Asterisk install and Testsuite runs. The only way to keep both the time and resource load down was to check out and compile once in one container, then run parallel containers, one for each test chunk, each of which installed the build products from the first container and then ran the Testsuite for a specific subset of tests. Jenkins then waits for all the containers to complete and notifies Gerrit of the result.
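For anyone trying something similar, one common way to get chunks of roughly equal execution time is a greedy "longest test first" split. The Python sketch below shows that idea only. It is not the method we arrived at through experimentation, and the function name `split_by_duration`, the test names and the timings are invented for the example.

```python
import heapq


def split_by_duration(durations, n_chunks):
    """Greedily split {test_name: seconds} into n_chunks lists,
    always giving the next-longest test to the lightest chunk."""
    chunks = [[] for _ in range(n_chunks)]
    # Heap of (total seconds so far, chunk index); lightest chunk pops first.
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    # Longest tests first; ties broken by name so the result is stable.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return chunks


# Made-up timings in seconds, purely for illustration.
timings = {"test_a": 300, "test_b": 200, "test_c": 200,
           "test_d": 100, "test_e": 100}
for chunk in split_by_duration(timings, 2):
    print(chunk, sum(timings[name] for name in chunk))
```

A greedy split like this is not guaranteed to be optimal, but it is cheap to recompute whenever test timings change. Each resulting chunk would then map to one of the parallel containers.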
So did it work?
Well, yeah, kinda. By compiling once then parallelizing the tests into 10 containers, we’ve been able to get a single Gerrit gate (Jenkins job) to run in about 30 minutes on a single modest VM. Even better, we can now run more than 1 Jenkins job on the same VM so now we have more levels of parallelism: Jenkins can manage multiple slaves (VMs) of course, each slave can now run more than 1 job at a time, and each job can now run more than 1 test at a time. So why the “kinda”? We’ve found about a dozen tests that fail consistently when run in a container that don’t fail when run in a VM. We’re not sure why yet and we’re still investigating.
When does it get rolled out?
We’re working through those failing tests now and have already fixed many ARI test failures that were actually the result of a reference leak in Asterisk. When the rest are addressed (over the next few sprints), we should be able to move the public Jenkins instance to the new architecture.
Once we’ve rolled out the new architecture, we’ll publish the details of how and what we did on the Asterisk Wiki.
I’ve been running Asterisk in production for over a year now. Getting Asterisk to run correctly in Docker is a huge learning curve. If you want to compare notes about those last few tests failing, maybe I can help.
I mean in Docker in production for over a year.
If it runs in a VM, but not in Docker, one possibility is that your Docker container is not set up right. Perhaps the needed resources from the OS have not been allocated properly. Given that it runs for a short time, I would also suspect a memory de-allocation issue in one of the modules within Asterisk. A full OS may handle the possible bug differently. I would monitor the resources used in the Docker container and look for any resource that is declining.
Thanks for the tips, Robert! To make sure everything is set up correctly, we use the exact same scripts to set up the images as we do to set up virtual machines, with the exception of things like hypervisor support packages. Normally we run VMs with 8 GB of memory, but the Docker hosts actually run with 32 GB. We’ll look closer though.
Thanks Michael! We’re pretty sure the issues are timing related in the tests themselves. Some of them have multiple state machines running at the same time and the coordination between them is not all that robust. It’s not uncommon for some of them to fail even on different VMs. We’re looking at them more this week.
Four months passed. Any updates???
We’ve been so busy over the summer that we had to put this on the back burner for a bit. Now that things have calmed down we’re picking it back up again. Sorry for the delay!
I would be delighted to hear about that! How can I reach you?
Another four months have passed.
Do you have good news by now?
We’re getting close. By the end of the month we’ll at least have made public the process we use to create the containers and the Jenkins scripts used to execute the tests.
Any status on this? We are trying to get Asterisk up and running in a container ourselves and so far we’re not having a lot of luck.
Any news on this? Support for dockerized Ubuntu?
Need to do a setup over Docker on Ubuntu, any news on this topic?