In Summary...
PiaF platform under test:
Low-end system: P4 2.4 GHz dual core, 512 MB RAM, 80 GB HD, 2x Digium TE410P 4-port T1 cards, Xorcom Astribank. Asterisk 1.6, CentOS 5.4, 100 Mbps Ethernet card. The system was idle with no active calls before testing.
Stress Test server:
DL380 G4, dual 3.4 GHz quad core Xeon processors, 2 GB RAM, 36 GB RAID (small drives...), CentOS 6, SIPp 3.2, Gigabit Ethernet card
Network:
Isolated for testing only, directly connected LAN with 100 Mbps Cisco 2900XL switch.
Testing:
Sequential SIP UAC test, no RTP payload. Each call consists of three steps: the call is initiated, it is answered on the PiaF test server, and then it is closed (hung up). The call rate was started at 10 calls per second (cps) and raised in increments of 10 cps until something broke (figuratively).
To put the cps rate in perspective, the starting rate of 10 cps is the equivalent of 600 calls per minute or 36,000 calls per hour, which is already the rate of an extremely large call center!
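For anyone who wants to redo the conversion for other rates, here is a minimal Python sketch. The function names are made up for illustration and are not part of SIPp or any other tool used in the test.

```python
# Convert a call rate in calls per second (cps) into per-minute and
# per-hour figures. Helper names are hypothetical, for illustration only.

def calls_per_minute(cps):
    """Calls per minute at a steady rate of `cps` calls per second."""
    return cps * 60


def calls_per_hour(cps):
    """Calls per hour at a steady rate of `cps` calls per second."""
    return cps * 3600


if __name__ == "__main__":
    # 10 cps is the starting rate used in the test.
    for cps in (10, 20, 30):
        print(f"{cps} cps = {calls_per_minute(cps)} calls/min "
              f"= {calls_per_hour(cps)} calls/hour")
```

These figures assume the rate is held constant for the whole minute or hour, which is what the stress test does but not what a real call center sees.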
The SIPp WebFrontEnd application allows us to monitor the call rate, the number of successful calls and the number of failed calls. The CPU performance of the PiaF test server was monitored using FreePBX's System Status and the Linux command line tool "top". CPU % Utilization was the performance reference monitored.
Results:
How valid is this testing? It is not representative of real world loads as no RTP payload was sent. At this point it was used to stress test the hardware and software to see what would happen. It did bring up some unexpected services that loaded down the system and subsequent discussions resulted in system mods that dramatically increased performance.
Things that affected performance at these high call rates:
- FOP - Flash Operator Panel version 1. It chews up a lot of resources. Not recommended. It was disabled for testing.
- FOP2 - Flash Operator Panel version 2. Much better than FOP but still chews up resources. Before we blame FOP2, there is more to discuss about how the Asterisk Manager Interface (AMI) works and its effect on performance. The AMI is the real culprit, not FOP2. This will be covered in another thread. FOP2 was disabled for testing.
- Asterisk Logging - This is the major source of poor performance. Two specific areas: log files and the asterisk CLI.
Solution to improve performance.
The biggest problem was the verbose messages being logged to the /var/log/asterisk/full log file and displayed in the asterisk CLI. At these call rates, this puts an incredible load on the system. It is compounded because fail2ban parses the full log file. If the log file is large, then fail2ban has to parse a large text file every few seconds, which adds even more load on the server.
The solution is to set "verbose = 0" in /etc/asterisk/asterisk.conf. This stops the verbose messages from appearing in the log file and on the asterisk CLI. It does not stop warning, debug, error and notice messages from appearing. Notices are what fail2ban uses, so fail2ban continues to work.
If verbose messages are required for troubleshooting, they can be enabled via the asterisk CLI by issuing the command "core set verbose 7". Then verbose messages are shown on the CLI and logged to the full log file.
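As a rough sketch of what the change looks like, the fragment below assumes the setting lives in the [options] section of asterisk.conf, which is where it sits in a stock Asterisk 1.6 install. Check your own file before editing, since other options in that section should be left alone.

```
; /etc/asterisk/asterisk.conf (fragment, other options omitted)
[options]
verbose = 0    ; no verbose messages in the full log or on the CLI
```

The "core set verbose 7" command only changes the running level, so the value in asterisk.conf is what applies again the next time Asterisk starts.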
The performance results:
- Base system with FOP enabled: 20 cps
- Base system with FOP disabled: 60 cps
- Base system with FOP disabled and fail2ban disabled: 100 cps max
- Base system with FOP disabled, verbose=0, fail2ban enabled: 150 cps max
The last test showed an amazing 150 cps max load (9,000 calls per minute, 540,000 calls per hour)! Now it becomes apparent how much something as simple as logging can affect performance.
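To see how much each change bought relative to the starting point, the four results above can be compared directly. This is just arithmetic on the numbers already listed, written as a small Python sketch. The dictionary keys are shortened labels of my own.

```python
# Maximum sustained call rate (cps) for each configuration, taken from
# the results list above. The labels are shortened for readability.
results = {
    "FOP enabled": 20,
    "FOP disabled": 60,
    "FOP disabled, fail2ban disabled": 100,
    "FOP disabled, verbose=0, fail2ban enabled": 150,
}

# Use the base system with FOP enabled as the reference point.
baseline = results["FOP enabled"]

# Speedup factor of each configuration relative to the baseline.
speedup = {name: cps / baseline for name, cps in results.items()}

if __name__ == "__main__":
    for name, factor in speedup.items():
        print(f"{name}: {results[name]} cps ({factor:.1f}x baseline)")
```

Keep in mind these are single runs on one low-end box with no RTP, so the ratios are only a rough guide.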
Recommendations:
Disable verbose messages on a production server unless they are needed for troubleshooting.
On a side note, several FreePBX modules were disabled to see if the modules would affect performance. None did.