The Continuing Saga of Continuous Integration

George Joseph

or… “Watch This Space”

If you’re an Asterisk contributor, you’ve probably noticed that we’d been having issues with large numbers of Jenkins test failures during the “gate” phase of the Gerrit review process. Some tests were failing consistently and others seemed random. After a lot of head scratching, we finally figured out the major contributor to the failures. tl;dr: it was the /tmp filesystem. To understand how this was affecting the tests, you have to understand the intricacies of the Asterisk Testsuite and the virtualization environment the Testsuite runs in.

If you’ve had the pleasure of working with the Testsuite, you know that pretty much every test is timing dependent. SIP packets and AMI events have to be received in the expected order and within the expected time frames for a test to pass. The Testsuite is also more disk I/O intensive than most people realize: it’s constantly writing temporary config files and log files, starting and stopping Asterisk and sipp, and so on. For this reason, the availability of disk I/O bandwidth can have a big impact on ordering and timings.
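To get a rough feel for how much latency a given directory adds to this kind of small-file churn, a probe along the following lines can help. It is only a sketch: the function names, file counts, and sizes are made up for illustration, and it is not part of the Testsuite. It also calls fsync so that each write is pushed through the storage stack, which is an assumption and not necessarily what the Testsuite does.

```python
import os
import tempfile
import time


def small_file_write_latencies(directory, count=200, size=4096):
    """Create, sync, and delete `count` small files under `directory`.

    Returns the per-file elapsed time in seconds. The count and size
    defaults are arbitrary illustration values.
    """
    payload = b"x" * size
    latencies = []
    # Work in a private subdirectory so nothing else in `directory` is touched.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        for i in range(count):
            name = os.path.join(workdir, f"probe-{i}.tmp")
            start = time.perf_counter()
            with open(name, "wb") as fh:
                fh.write(payload)
                fh.flush()
                # Assumption: force the data down to the backing storage.
                os.fsync(fh.fileno())
            os.remove(name)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return the median and worst-case latency in milliseconds."""
    ordered = sorted(latencies)
    return {
        "median_ms": ordered[len(ordered) // 2] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    print(summarize(small_file_write_latencies(tempfile.gettempdir())))
```

Running it against /tmp and against a directory on a different filesystem gives a crude side-by-side comparison. For timing-sensitive tests, the worst-case figure is usually more telling than the median.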

While CPU and memory distribution is a snap to tune in most virtualized environments, disk I/O is one of the hardest things to tune. You can have the fastest SSDs on the planet, but if an application has to go through 27 layers to get to them, it won’t matter. In our case, this was the issue.

Here’s what we had…

Docker Container
- Asterisk Testsuite
  - /tmp on ext4
    - Docker Host (virtual machine) btrfs filesystem
      - oVirt/libvirt virtio-scsi driver
        - oVirt/libvirt VM host
          - QEMU QCOW2 disk
            - Gluster distributed filesystem
              - 20G dedicated storage network
                - VM Host 1
                  - Host xfs filesystem
                - VM Host 2
                  - Host xfs filesystem
                - VM Host 3
                  - Host xfs filesystem

OK, it’s not 27 layers but it’s still way too many. With that arrangement, we consistently had 15-20 test failures per gate.

Here’s what we have now…

Docker Container
- Asterisk Testsuite
  - /tmp on tmpfs (memory backed)

Surprise! The bulk of the test failures went away. In fact, about half of the gates now have no failures at all and are auto merging, and the ones that do fail usually have fewer than 5 test failures.
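The post doesn’t say how the tmpfs mount was set up. Docker’s `--tmpfs /tmp` option on `docker run` is one way to get a memory-backed /tmp inside a container. Whichever way it’s done, it’s worth confirming from inside the container that /tmp really is tmpfs. Here’s a minimal sketch that does so by parsing the Linux mount table. The function name and the sample mount lines are invented for illustration, and it ignores the octal escapes the kernel uses for spaces in mount paths.

```python
import os


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (device, mount point,
    filesystem type, options, ...). Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry
            # wins, because later mounts sit on top of earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Invented sample resembling a container whose /tmp is memory backed.
SAMPLE_MOUNTS = """\
overlay / overlay rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            print(filesystem_type("/tmp", fh.read()))
    else:
        print(filesystem_type("/tmp", SAMPLE_MOUNTS))
```

If the reported type for /tmp is anything other than `tmpfs`, temporary files are still going through whatever storage stack sits under the container.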

We’re not out of the woods yet. As mentioned, there are still some chronic test failures, but we’re taking a hard look at them to see whether they’re environmental or just temperamental and need to be rewritten to be more tolerant.
