If you’re running an Asterisk “farm” of virtualized PBXs and your virtualization environment is dedicated to it, you probably already have your environment pretty well tuned. You know the total resources available and the VMs are pretty much all the same. You may even have a nice formula for calculating the resources required to handle a specific number of seats or calls, how many VMs you can place on a single host, etc. If, however, you’re running your company’s corporate PBX in a virtualized environment that’s shared with the rest of the company, you have a much tougher job because you may know nothing about the environment or who you’re sharing it with.
Having said that, lots of folks run Asterisk successfully in a virtualized environment but there are some things to be aware of, mostly concerning resource allocation.
Oddly enough, CPU starvation is the least of the issues we run across but it does happen. Most hypervisors have at least two “knobs” to tune when it comes to allocating CPU resources among VMs:
- Virtual Cores: This is the most obvious tuning parameter but it’s also somewhat misunderstood. Assigning 10 cores to your Asterisk VM does NOT mean that the VM has full use of those 10 cores. For most hypervisors, it simply means that’s the number of cores the VM’s operating system is told it has. They’re, well, virtual. 🙂 The OS can schedule threads among those virtual cores but that doesn’t mean that they’ll actually get cycles to run.
- CPU Shares: This is the real tuning knob and most hypervisors have something similar. It determines how many cycles a VM’s virtual cores get compared to virtual cores of the other VMs running on the same host.
Let’s take a simple example (I’m going to try and use vendor-neutral terminology for this). If you have two VMs on a host (and no others), each with two virtual cores and a default setting of 1000 CPU shares per core, then each VM will get 2000 shares (1000 * 2) out of a total of 4000 (VM1 + VM2). If you increase VM2’s shares to 2000, then VM1 will still get only 2000 shares but now VM2 will get 4000 (2000 * 2) shares out of a total of 6000. If both VMs were originally running CPU-intensive tasks that used most of the available resources, increasing VM2’s shares could potentially starve VM1 of CPU cycles. If VM1 was the Asterisk VM, you can expect choppy-audio complaints at the very least. If you add a third VM with 2 virtual cores and 1000 shares, the Asterisk VM is now getting only 2000 shares out of 8000.

The bottom line is that a share count isn’t an absolute allocation, because the total pool of shares keeps growing as new VMs are added. If you don’t control the number of VMs on a host, the hypervisor admin could accidentally tank the PBX just by adding a new VM. Luckily, every hypervisor has resource allocation policy settings that can help make sure this doesn’t happen. Make sure you understand them.
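The arithmetic above is easy to sketch. Here’s a few lines of Python that model the per-core shares scheme described in the example (this is an illustration of the math, not any particular hypervisor’s API):

```python
def vm_shares(vcpus, shares_per_vcpu):
    """Total shares a VM holds under a per-vCPU shares model."""
    return vcpus * shares_per_vcpu

def cpu_fraction(vms, name):
    """Fraction of host CPU a named VM is entitled to,
    relative to the ever-growing total pool of shares."""
    total = sum(vm_shares(v, s) for v, s in vms.values())
    vcpus, shares = vms[name]
    return vm_shares(vcpus, shares) / total

# Two VMs, two vCPUs each, default 1000 shares per vCPU
vms = {"asterisk": (2, 1000), "vm2": (2, 1000)}
print(cpu_fraction(vms, "asterisk"))  # 0.5  (2000 of 4000 shares)

# Bump VM2 to 2000 shares per vCPU; the PBX's slice shrinks
vms["vm2"] = (2, 2000)
print(cpu_fraction(vms, "asterisk"))  # ~0.333  (2000 of 6000)

# Add a third VM and the PBX's slice shrinks again
vms["vm3"] = (2, 1000)
print(cpu_fraction(vms, "asterisk"))  # 0.25  (2000 of 8000)
```

Notice that the Asterisk VM’s settings never changed, yet its entitlement dropped from half the host to a quarter.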
There’s one other CPU-related issue that comes up every once in a while in both virtualized and bare-metal environments and that is … running Asterisk as a real-time or “nice” process. Let me be clear: It’s almost ALWAYS a bad idea! Conceptually, Asterisk is a real-time application in that it must process requests with very tight time constraints, but that doesn’t mean that the operating system should treat it as a real-time process. Simple scenario: You have Asterisk running as a RT process. You have a local caching name server on the same host (which is a good idea). Asterisk is running “hot” processing media and SIP traffic and makes a DNS query. The poor name server, which has a lower priority than Asterisk, gets starved of CPU cycles and can’t service queries as fast as Asterisk is sending them. Threads in Asterisk are now hanging waiting on DNS answers, but that doesn’t stop the requests from coming in, which generates even more DNS queries. Well, you get the idea. The same situation can occur for any other process on the host that Asterisk relies on… database server, ARI infrastructure, etc. In a virtualized environment, setting a higher priority for a process generally does NOT mean that your VM will get more CPU shares to handle it, so you’ll still be constrained to what’s available to the VM.
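If you’re not sure whether someone has put a process under a real-time scheduling policy, Linux will tell you. Here’s a small sketch using Python’s standard library (substitute the Asterisk PID for your own in practice; `os.sched_getscheduler` is Linux-specific):

```python
import os

def is_realtime(pid: int) -> bool:
    """True if the process runs under a real-time scheduling
    policy (SCHED_FIFO or SCHED_RR) rather than the normal
    time-sharing scheduler (SCHED_OTHER)."""
    policy = os.sched_getscheduler(pid)
    return policy in (os.SCHED_FIFO, os.SCHED_RR)

# Check the current process; point this at Asterisk's PID instead
print(is_realtime(os.getpid()))  # normally False
```

If this ever prints `True` for your Asterisk process and you didn’t deliberately configure that, go find out who did and why.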
Network contention is a little more straightforward. Each host in a cluster usually runs a virtual bridge/switch to aggregate the network traffic for all of its VMs. If your hosts are blade servers, then the blade chassis probably also has a switch, then there’s usually a switch or router for the rack, etc. If your company’s Asterisk PBX is running in a VM on the same host as a high-bandwidth application such as a video server, expect trouble. Judicious use of COS/TOS/DSCP can help here but only if your network elements respect the settings. The internet in general does not.
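If you’re using chan_pjsip, the QoS markings mentioned above can be set per endpoint in `pjsip.conf`. A minimal sketch (the endpoint name is made up, the values are common conventions, and none of it helps unless the switches and routers in the path actually honor the markings):

```
; pjsip.conf -- illustrative QoS settings on a chan_pjsip endpoint
[my-endpoint]
type = endpoint
tos_audio = ef       ; DSCP "Expedited Forwarding" for RTP audio
tos_video = af41     ; DSCP "Assured Forwarding" for RTP video
cos_audio = 5        ; 802.1p priority; only meaningful on VLAN trunks
```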
I mentioned above that running a local caching name server was a “good idea” and, properly configured, it still is. Slow responses to DNS queries can bring Asterisk to its knees. In fact, it can make Asterisk appear to be deadlocked as queries pile up.
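A minimal sketch of what that looks like on the host, assuming a caching resolver such as unbound or dnsmasq is listening on the loopback interface:

```
# /etc/resolv.conf -- point the host at the local caching resolver
nameserver 127.0.0.1
# fail over quickly instead of letting queries hang
options timeout:2 attempts:2
```

The `timeout` and `attempts` options keep a dead resolver from stalling lookups for the default five seconds per attempt.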
We encounter storage issues more than any other, mostly because there are so many different implementation possibilities.
- VM filesystem on a dedicated host-local block device
- VM filesystem on a shared host-local block device
- VM filesystem in a dedicated LVM local volume group
- VM filesystem on a remote iSCSI device
- VM filesystem on a SAN based volume
- etc, etc, etc.
If you’re using a database backend for Asterisk configuration there are even more combinations:
- Asterisk connects to a local database engine in the same VM whose backing store is on any of the above VM filesystem configurations.
- Asterisk connects to a remote database engine which could be a VM using any of the above configurations or bare-metal.
- The remote database engine could be dedicated to Asterisk or shared with other applications.
- etc, etc, etc.
With so many different possibilities, there isn’t a clear “best” configuration, just some things to consider:
- Slow filesystem/database access can make Asterisk appear deadlocked as requests pile up.
- Call recording and voicemail activity are obviously going to cause lots of filesystem writes.
- Each incoming SIP registration results in a database update, even if it’s just to the sqlite3 database in /var/lib/asterisk/astdb.sqlite3.
- If you’re using a database configuration backend, look at using sorcery caching.
- The default maximum number of open ODBC connections to a remote database is 1, but arbitrarily cranking it to 100 is probably not a good idea either.
- If the network is involved in either filesystem or database access, make sure you understand what you’re sharing it with. If it’s the same network as the video server I mentioned above, you can expect issues in terms of both poor call quality and call processing delays.
- The iotop and iostat utilities can give you a lot of information about filesystem I/O performance.
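To make the ODBC and sorcery points above concrete, here’s a sketch of the two config files involved. The section names and numbers are illustrative, not recommendations; tune them against your own database and call load:

```
; res_odbc.conf -- raise the connection pool above the default of 1
[asterisk]
enabled = yes
dsn = asterisk-connector     ; must match an entry in odbc.ini
max_connections = 20         ; example value, not a recommendation

; sorcery.conf -- cache PJSIP endpoint objects in memory so every
; call doesn't hit the database
[res_pjsip]
endpoint/cache=memory_cache,maximum_objects=1024,object_lifetime_maximum=300
```

With a sorcery memory cache in place, repeated lookups for the same endpoint are served from RAM and expire after the configured lifetime, which takes a lot of pressure off a shared or remote database.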
If you were looking for simple “Here’s how to do it” instructions, sorry! So much depends on your specific environment that it’s impossible to provide anything other than guidance but I do hope this helps. Please share your experiences with virtualizing Asterisk in the comments!