TUTORIAL PIAF Redundancy Solution

imcdona

Guru
Joined
Mar 28, 2008
Messages
13
Reaction score
0
Hello,

I have been working on a few scripts to create a redundant pair of PIAF servers.

Here is how it works:

1. MySQL data is stored on a separate MySQL cluster

2. A cron job runs every 30 minutes and rsyncs the pertinent directories. In my example, it also copies Cepstral and Swift. The script also modifies amportal.conf to point to the MySQL cluster.
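For readers who want a starting point, the two steps above could look roughly like the sketch below. This is not the author's script: the function names, the directory list and the host names are made up for illustration, and it assumes the database host lives in the AMPDBHOST line of amportal.conf. By default sync_dirs only prints the rsync commands; set RUN=1 to actually run them.

```shell
#!/bin/sh
# Hypothetical sketch of the 30-minute sync job; not the author's script.

# Rewrite the AMPDBHOST line of an amportal.conf-style file so FreePBX
# talks to a remote MySQL host. $1 = config file, $2 = database host.
set_amp_db_host() {
    conf=$1
    host=$2
    tmp="$conf.tmp.$$"
    sed "s|^AMPDBHOST=.*|AMPDBHOST=$host|" "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Print (or, with RUN=1, execute) one rsync per directory to mirror.
# $1 = primary host, remaining arguments = directories to copy.
sync_dirs() {
    primary=$1
    shift
    for dir in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            rsync -az --delete "root@$primary:$dir/" "$dir/"
        else
            echo "rsync -az --delete root@$primary:$dir/ $dir/"
        fi
    done
}
```

On the backup box this might be called as `sync_dirs 192.168.0.50 /etc/asterisk /var/lib/asterisk/sounds` followed by `set_amp_db_host /etc/amportal.conf db.example.com`, with the directory list adjusted to whatever your install actually uses.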

As it stands now, I can bring up a fresh copy of PIAF, run the scripts and have the box up and running as a mirror of the primary in a matter of seconds.

You'll notice that I have a crude hack in place to remove the sip registrations from the backup server. The solution of course is to have my DID provider point to a SIP SRV record so DNS takes care of the issue. Currently VoicePulse only allows registrations.

It seems to work fine so far. I obviously need to add error checking to the scripts as well.

If anyone is interested please have a look at:

http://blog.voicebyip.com

What are your thoughts? Is there a better way to go about this? Can you see any potential problems with my solution?

Thanks,

Isaac
 

jroper

Guru
Joined
Oct 20, 2007
Messages
3,832
Reaction score
71
Hi

That's nice.

A MySQL cluster is a bit of overkill, as you mentioned, but your mention of replication sounds interesting, maybe coupled with a PHP script to check that the MySQL server on box A is still alive.

The other way I had considered would be to get FreePBX to do a mysqldump every time a reload is executed and pass it to a remote server, because the database only changes when the box's configuration changes.
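A rough sketch of that dump-and-ship step is below. It is only an illustration: the function names, the file naming scheme and the destination are invented here, and how it would be hooked into a FreePBX reload is left open. It assumes the configuration lives in the database named asterisk and that MySQL credentials come from ~/.my.cnf, since an unattended job cannot answer a password prompt.

```shell
#!/bin/sh
# Hypothetical sketch; names and paths are illustrative only.

# Name the dump after the host and a timestamp so the newest one sorts last.
# $1 = host name, $2 = timestamp (defaults to the current time).
dump_name() {
    stamp=${2:-$(date +%Y%m%d-%H%M%S)}
    echo "asterisk-$1-$stamp.sql.gz"
}

# Dump the asterisk database, compress it, and copy it off the box.
# $1 = remote destination in scp form, e.g. backup@host:/backups/
# Assumes MySQL credentials are supplied by ~/.my.cnf.
ship_config_dump() {
    file="/tmp/$(dump_name "$(hostname)")"
    mysqldump asterisk | gzip > "$file" && scp "$file" "$1"
}
```

Calling `ship_config_dump backup@192.168.0.51:/backups/` after each reload would leave a small timestamped dump on the remote box.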

From the mysql tables, you know what modules are loaded, so from a fresh install, you could download the correct modules from Freepbx.org and restore the box to its previous state.

The main difference between this and the standard FreePBX backup is that you are only shifting a relatively small amount of data, which can be passed over the broadband link without affecting voice quality every time you make a config change, and you always have the latest backup stored remotely. I think the asterisk database, when zipped up, is only a few KB.

The remaining stuff on the server can be approached in a different way.


This only leaves voicemails, which if you are emailing them out, don't need to be worried about, and CDR, which many companies can do without in an emergency.


Having said all this, from a commercial standpoint, wandering into a customer with 2 PBXs under your arm is not likely to engender confidence - Panasonic would not bring in 2 boxes. I think a system of rapid restore and/or cloning is a more appropriate strategy for the commercial world.

Joe
 

imcdona

Guru
Joined
Mar 28, 2008
Messages
13
Reaction score
0
Joe,

I appreciate your feedback. You bring up some good alternatives.

I think you are mistaken though in regards to having just one tricked out box as opposed to two boxes.

In the past if you walked into an office with two solid state telephony devices (not servers) I am sure it would raise a few eyebrows. The fact of the matter is, telephony is not about hardware anymore. Gone are the days of a line card for this, a card to record announcements on etc. The industry is changing.

I had the opportunity to manage an Avaya S8710 PAIR. And you know what? It was connected to a G710 with line cards that provided such "advanced" functions as the ability to record an announcement (A VAL board).

I am not kidding when I say I was confused. I didn't get the logic of it all. I remember asking my Avaya sales rep if he was certain there was no other way to record a sound file aside from buying a VAL board. The rep told me he was sure. I STILL didn't believe him. I was seriously convinced he was trying to pull a fast one on me to pad his commission.

It wasn't until I thoroughly researched it that I realized that yes, you actually have to buy a card that stores about 20 minutes worth of sound files. Bear in mind, the S8710 series systems are running CM on Linux. I was shocked that these two servers had gigs of free space and I had to purchase a "card" to store a sound file?

When my boss tasked me with deploying an international telephony solution utilizing our Avaya system I called my sales rep and got a quote. He sent a hardware list a mile long and a price tag that I could afford to live off of for a year or two. I called him back and told him to send me an Avaya SIP server.

I installed the SIP server and had all DIDs globally pointed to the SIP server. I then deployed naked IP phones at each office and called it a day. I saved on hardware and administrative overhead, not to mention the nightmare of managing phone bills from various providers throughout the world.

Avaya is not all bad though. They made an announcement a while back that they are changing their focus from a hardware-based company to a software-based company.

You tell me what is more ridiculous: walking into an office with 2 rack mount 1U servers to support 30+ offices globally, or unloading a semi worth of proprietary Avaya hardware that not only has to be installed and maintained but also purchased at outrageous prices. I personally prefer the former.
 

darmock

PIAF Developer
Joined
Oct 18, 2007
Messages
2,892
Reaction score
98
In fact I know what Joe means when you go in to the smb market with 2 servers under your arm.... alarms go off. The mindset is very different between a small client and a medium to large client. To solve this on the larger SMB I generally use a dual system 2U rackmount for redundancy and a proprietary hot swap solution that I developed for my own use. (Sorry it isn't going to become part of PIAF) This costs a bit more but the client only ever sees the 1 "computer" and seems to be happier in the long run.... A bit of jiggery pokery I know but it helps with sales.

In medium to large business I have found they won't even look at your response to their rfp if there is not a redundancy system in the quote.


Tom
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,206
Reaction score
5,229
You've raised some great points. We've had redundant PIAF servers for over a year now, but I've been reluctant to roll this out. However, now that you can buy two of the little guys below for under $700 TOTAL, I'm reconsidering.

The way ours works is to copy a one byte file from one device to the other every few minutes. We call it PMFT™ (poor man's fault tolerance). If it doesn't get the file, it assumes the server is dead. It then resets its own IP address to the master address, unblocks the outbound ports in its firewall, and reboots.

All of the SIP phones come right back up when the new server takes over. We, of course, plan to get rich off this design :lol: but I'll publish it on Nerd Vittles in a few weeks just to be sure the donations come pouring in. :lol::lol::lol:

Photo below is ACTUAL SIZE:

gPCmini.jpg
 

The Deacon

Guru
Joined
Jan 29, 2008
Messages
296
Reaction score
14
The company I work for does quite a bit of MySQL replication; consequently, I have been working on a how-to article to do MySQL replication for PIAF. If anyone is interested, I can post it later today.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,206
Reaction score
5,229
By all means, please post. Thanks! Maybe you'll get rich, too. :wink5:
 

jroper

Guru
Joined
Oct 20, 2007
Messages
3,832
Reaction score
71
Hi

"You tell me what is more ridiculous: walking into an office with 2 rack mount 1U servers to support 30+ offices globally, or unloading a semi worth of proprietary Avaya hardware that not only has to be installed and maintained but also purchased at outrageous prices. I personally prefer the former."

30+ offices - yes make that at least 2 servers if not 3. My mindset was more focused on the smaller installations.

Joe
 

imcdona

Guru
Joined
Mar 28, 2008
Messages
13
Reaction score
0
Deacon, please post! With any luck, between Joe, wardmundy, myself and you, we might actually come up with a solution worth including in the next PIAF release. That would be sweet!
 

The Deacon

Guru
Joined
Jan 29, 2008
Messages
296
Reaction score
14
MySQL Replication

There are several ways to implement replication under MySQL (Master -> Slave, Master -> Master, Master -> Relay -> Slave) in addition to just replicating a single, several or all databases between the hosts.

This how-to will walk you through the process of setting up MySQL database replication in a Master -> Slave environment that will replicate ALL databases (and their respective entries) from the Master to the Slave.

In an upcoming how-to, I'll walk you through getting everything back to normal after a server crash, but this how-to only focuses on getting MySQL replication working between the Master and Slave servers. Additionally, while this how-to is written specifically for MySQL 5.0.x running on CentOS 5.2 on a PBX In A Flash (PIAF) server, you could probably make this work on a server that isn't running PIAF. But there are no guarantees. With most things technical, there is no warranty expressed or implied. Use at your own risk. You have been warned. :biggrin5:

This document makes several assumptions, so let me get those out of the way first. To start, it is assumed that you already have two (2) functional PIAF boxes configured, running and communicating with each other (whether they are on the same subnet, or via hamachi - it makes no difference, so long as they can see each other). It also assumes that you know how to log into the MySQL client, use ssh as well as use some sort of editor (joe/vim/vi). If you don't have all of those prerequisites, stop right here until you have all the parts. I'll wait. :smile5:

Ok, since we have 2 servers, you need to decide which server is the main (or Master) server and which server will be the backup (or Slave) server. The Master server will keep a log of each and every transaction (add, delete, modify) that happens in that database. The Slave will look at the log on the Master server and whenever any changes happen on the Master, it will also make those changes happen on the Slave.

Since the Slave connects to the Master using a standard MySQL username/password, there must be an account on the Master server that the Slave can use to connect with. Any account can be used for replication, just be aware that the username/password used for replication will be stored in plain text within either the my.cnf or master.info file(s). Personally, I find it's easier (not to mention cleaner and safer) to create an account specifically for replication; that way if the account is compromised, the account only has the privilege of performing replication.

Given that, we're going to create a user (called "repl" with a password of "passw0rd") and grant the privileges required for replication, using the GRANT statement. Please feel free to change these to whatever you want.

Log into your MySQL client as the root user and issue the following commands:

Code:
GRANT REPLICATION SLAVE ON *.* to 'repl'@'%' IDENTIFIED BY 'passw0rd';
FLUSH PRIVILEGES;
QUIT
Now that the "repl" user has been created, let's configure the Master MySQL server to create/keep the log that stores the changes the Slave server will "feed" from.

Edit the /etc/my.cnf file and in the [mysqld] section add these two lines:

Code:
log-bin=mysql-bin
server-id=1
The first line is the "base name" of the log file that MySQL will use (mysql-bin.000001, mysql-bin.000002, etc)
The second line assigns the Master server an ID (used by replication) to distinguish it from the Slave server.

Now we need to dump the databases that we have on the Master server to import onto the Slave server. We dump the databases by doing:

Code:
cd /tmp
mysqldump -u root -p --all-databases --lock-all-tables > dbdump.sql
We now need to scp the dump file to the Slave server (replace the user and address below with those of your own Slave server):

Code:
cd /tmp
scp dbdump.sql user@slave-address:/tmp
Once the mysqldump is done, let's restart the mysqld service on the Master server:

Code:
/etc/init.d/mysqld restart
SSH into the Slave server and import the database with this command:

Code:
mysql -u root -p < /tmp/dbdump.sql
On the Slave server, edit the /etc/my.cnf file and in the [mysqld] section add these four lines (replace 111.222.333.444 with the IP address of the Master server):

Code:
master-host=111.222.333.444
master-user=repl
master-password=passw0rd
server-id=2
Once you're done, let's restart the mysqld service on the Slave server:

Code:
/etc/init.d/mysqld restart

Log into your MySQL client on the Slave server and issue the following commands:

Code:
START SLAVE;
SHOW SLAVE STATUS\G
You should see something similar to this:

Code:
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 111.222.333.444
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000001
        Read_Master_Log_Pos: 98
             Relay_Log_File: pbx-relay-bin.000010
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)
There are three fields that we are concerned about: the Slave_IO_Running, Slave_SQL_Running and the Read_Master_Log_Pos fields.

The Slave_IO_Running service is the service that maintains communications, as well as moving the log file entries between the Master and Slave. If the Slave_IO_Running service isn't running (or says "No"), the Slave can't see/communicate with the Master.

The Slave_SQL_Running service is the actual service that takes the log file entries that have been copied over to the Slave and applies the changes to the MySQL database. If this service isn't running, check the MySQL error log (on CentOS this is typically /var/log/mysqld.log) for more information.
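If you want to check those two fields from a script rather than by eye, something along these lines should work. It is only a sketch: slave_field and slave_threads_ok are made-up helper names, and the functions simply parse the text that SHOW SLAVE STATUS\G prints, read from standard input.

```shell
#!/bin/sh
# Hypothetical helpers for reading `SHOW SLAVE STATUS\G` output.

# Print the value of one field from the status output on stdin.
# $1 = field name, e.g. Slave_IO_Running
slave_field() {
    awk -F': ' -v key="$1" '
        { name = $1; gsub(/^ +/, "", name) }
        name == key { print $2; exit }
    '
}

# Succeed only when both replication threads report "Yes".
slave_threads_ok() {
    status=$(cat)
    io=$(printf '%s\n' "$status" | slave_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)
    [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}
```

A typical use would be `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_threads_ok || echo "replication is not running"`.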

The Read_Master_Log_Pos field is really nothing more than the position in the file that the Slave is current with. To see if the slave has caught up with the Master, do the following:

Log into your MySQL client on the Master server and issue the following commands:

Code:
SHOW MASTER STATUS;
You should see something very similar to this:

Code:
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
The Read_Master_Log_Pos on the Slave server should match the Position field on the Master server. If they differ, it means that the Slave has some catching up to do. Repeat the steps again in a minute or two; if you have a LARGE amount of data, it could take up to 5 minutes (or more) to make the Slave in sync with the Master.
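The comparison described above can be scripted. The sketch below is an illustration only (replication_lag is a made-up name); it takes the log file and position from each side as arguments and reports whether the Slave has read everything the Master has written. Keep in mind that Read_Master_Log_Pos shows how far the Slave has read the Master's log, while Exec_Master_Log_Pos shows how far it has actually applied it.

```shell
#!/bin/sh
# Hypothetical helper; compares values you have already collected.

# $1/$2 = Master_Log_File and Read_Master_Log_Pos from the Slave,
# $3/$4 = File and Position from SHOW MASTER STATUS on the Master.
replication_lag() {
    if [ "$1" != "$3" ]; then
        echo "behind: slave reading $1, master writing $3"
    elif [ "$2" -lt "$4" ]; then
        echo "behind: $(( $4 - $2 )) bytes"
    else
        echo "in sync"
    fi
}
```

With the sample output shown in this how-to, `replication_lag mysql-bin.000001 98 mysql-bin.000001 98` prints "in sync".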

Congratulations, you have just set up database replication between the Master and the Slave.

To really see the replication in action, log into your MySQL client on the Master server and issue the following commands:

Code:
CREATE DATABASE DELETE_ME1;
SHOW DATABASES;
Now, log into your MySQL client on the Slave server and issue the following commands:
Code:
SHOW DATABASES;
You should see the database called DELETE_ME1 listed as one of the databases on the Slave server.

Back in the MySQL client on the Master server issue the following command:

Code:
DROP DATABASE DELETE_ME1;
Back in the MySQL client on the Slave server issue the following command:

Code:
SHOW DATABASES;
The DELETE_ME1 database should be gone.

I hope this tutorial was helpful. If you have questions, please post them below and I will do my best to answer them.

-Rick

Orgasmatron 5.x / Incredible PBX UPDATE: Chances are, you will need to add these two lines to your /etc/sysconfig/iptables file on both the master and the slave (this will allow the two boxes to see each other via the MySQL replication port):

Code:
# Allow connections to our MySQL server
-A INPUT -p tcp -m tcp --dport 3306 -j ACCEPT
Once those lines are added, you'll need to restart iptables:

Code:
/etc/init.d/iptables restart
 

gomonn

Member
Joined
Nov 16, 2007
Messages
56
Reaction score
0
Thank You so much for sharing guys .... That is the type of thing I have been waiting for (don't know much about programming ...)
 

imcdona

Guru
Joined
Mar 28, 2008
Messages
13
Reaction score
0
Thanks to everyone who took some time to provide feedback.

I'm going to make a few changes to my process based on all the feedback. I'll post the new and improved redundancy process shortly.
 

unison

New Member
Joined
Mar 19, 2009
Messages
5
Reaction score
0
Any more developments on this front?

We are currently looking at how to implement redundant PIAF servers
 

jroper

Guru
Joined
Oct 20, 2007
Messages
3,832
Reaction score
71
I've been looking at a different approach to replicating databases, which I'm guessing is the same as Ward's.

See these links:-

http://www.linux-ha.org/DRBD
http://www.drbd.org/
http://support.red-fone.com/downloads/elastix/Elastix_HA_Cluster.pdf
http://www.voip-info.org/wiki/view/TrixBox+High+Availability+cluster+using+drbd

This is an interesting approach, as the failover is seamless, taking place in a few seconds, and one person I spoke to about the solution said that the call is not even dropped, just a 3 second silence.

The essence is that a disk partition is constantly replicated over an ethernet cable (but could be a USB or Serial Cable) via a separate NIC. That is the DRBD part of the equation.

The second part is the HA, or heartbeat. When the primary fails, then the secondary starts the services, such as asterisk etc, and moves a floating IP address which usually points at the primary, and points it to the secondary, and everything continues as normal.

The issue is in the failback - e.g. getting the primary working properly again.

Auto failback would not seem to be desirable, because with an intermittent fault on the primary, the two systems would flip-flop.

Manual failback is not that easy: it requires a good degree of technical knowledge, and other nasty things can happen, such as split-brain, where the two systems are right out of sync.

So my belief is that this is suitable for larger installations with onsite tech support, or support not far away, to check that the primary has not failed (this is so good you don't notice a failure) and to sort out issues as they occur.

For smaller installs, it may be difficult to manage.

The problem I ran into was reliably failing back when I induced various faults, or worse the secondary promoting itself to primary and not being able to demote it.

But in any case, the reference materials I used are as above, except for the last one, which is fairly new.

Joe
 

unison

New Member
Joined
Mar 19, 2009
Messages
5
Reaction score
0
Would be good to know what Ward is thinking...

With most modern devices you really just need to ensure the config is kept in sync between the two servers, then use DNS service (SRV) records and leave the server selection up to the phone. It will connect to your primary server, but if that isn't available it will connect to the backup server...
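To illustrate how that selection works: an SRV lookup returns one line per server with a priority, weight, port and target, and a client tries the lowest priority value first. The sketch below orders such answers the way a phone would be expected to try them. It is a simplified illustration (srv_order is a made-up name, the host names are examples, and weights among equal priorities are ignored), reading lines in the "priority weight port target" form that `dig +short SRV` prints.

```shell
#!/bin/sh
# Hypothetical helper; reads "priority weight port target" lines on stdin.

# Order SRV answers by priority, lowest first, and print target:port.
# Weights among equal priorities are ignored in this simplified sketch.
srv_order() {
    sort -n -k1,1 | awk '{ print $4 ":" $3 }'
}
```

For example, `dig +short SRV _sip._udp.example.com | srv_order` would list the primary server first and the backup second if the primary's record carries the lower priority number.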

Taking that a little further... there could even be an option to use DUNDi between the two servers, then having both servers active.
 

gregpadgett

Member
Joined
Feb 15, 2008
Messages
36
Reaction score
0
PMFT

Ward,

I can't find anything about this on Nerd Vittles, did you ever do a tutorial on it? I am in need of a failover solution.

Thanks!
 

gregpadgett

Member
Joined
Feb 15, 2008
Messages
36
Reaction score
0
I cannot find the PMFT on NerdVittles - did you ever have a chance to post it?
 

gregpadgett

Member
Joined
Feb 15, 2008
Messages
36
Reaction score
0
Did you get this process to work like you wanted? I am very interested in doing this also.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
19,206
Reaction score
5,229
We obviously haven't written this up yet, and we're out of pocket for several more weeks. But let me sketch out the theory, and some of you can take it from there. This is not as elaborate as Joe's approach which requires a secondary network interconnecting the two servers. So here goes...

First, your two servers both need fixed IP addresses. Build the duplicate server as a mirror image of the first one, shut down the first one, bring up #2 and change its IP address. I'm going to use 192.168.0.50 and 192.168.0.51 in this example where 50 is the primary and 51 is the backup server.

Second, you need to set up keys on both servers so that they can log into each other with SSH without using a password. There's a NV article explaining this.

Third, the idea here is that you use IPtables to block outgoing traffic on ports 5060 and 4569 (SIP and IAX) on server #2 while server #1 is functioning. This keeps server #2 from trying to register with your providers and telephone instruments.

Code:
-A OUTPUT -p udp -m udp -o eth0 --sport 5060 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --sport 4569 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --dport 5060 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --dport 4569 -j DROP

Fourth, we use a cron job on server #2 to SCP a simple file such as /etc/hosts from server #1 every few minutes. Then we test to see if we got the file. This is the Poor Man's heartbeat system. If we got the file, server #1 is working. If we didn't, server #1 is dead.

Code:
*/10 * * * * root /root/heartbeat/heartbeat.sh > /dev/null

Fifth, when we detect that server #1 has died, we reconfigure IP tables on server #2 to allow traffic on 5060 and 4569, change the IP address of server #2 to the server #1 IP address, and then reboot server #2 which now functions as server #1.
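For anyone building their own version, the detection part of the fourth and fifth steps could be sketched as below. This is not the content of the attached scripts. The variable and function names are made up, and it adds one assumption that is not in the description above: it waits for MAX_MISSES consecutive failed probes before declaring the primary dead, so a single dropped copy does not trigger a takeover. The actual takeover (opening the ports, changing the IP address, rebooting) is left out; the function only prints what it decided.

```shell
#!/bin/sh
# Hypothetical heartbeat sketch; not the scripts attached to this post.
PRIMARY=${PRIMARY:-192.168.0.50}
PROBE=${PROBE:-/tmp/heartbeat.probe}       # where the fetched file lands
STATE=${STATE:-/tmp/heartbeat.misses}      # consecutive-miss counter
MAX_MISSES=${MAX_MISSES:-3}                # assumption: tolerate brief blips
FETCH_CMD=${FETCH_CMD:-"scp -q -o ConnectTimeout=10 root@$PRIMARY:/etc/hosts $PROBE"}

# Fetch a small file from the primary; succeed only if it arrived non-empty.
primary_alive() {
    rm -f "$PROBE"
    $FETCH_CMD >/dev/null 2>&1
    [ -s "$PROBE" ]
}

# Run one probe and print "ok", "waiting" or "failover".
heartbeat_step() {
    if primary_alive; then
        echo 0 > "$STATE"
        echo ok
        return
    fi
    misses=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
    echo "$misses" > "$STATE"
    if [ "$misses" -ge "$MAX_MISSES" ]; then
        echo failover
    else
        echo waiting
    fi
}
```

A cron-driven wrapper would call heartbeat_step and run the takeover steps only when it prints "failover". Setting MAX_MISSES=1 gives the behaviour described above, where one missed file is enough.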

You obviously need scripts to reconfigure server #2 back to its original state once server #1 is ready to function again. And the trick here is to do this from the server #2 CLI and reboot server #2 back as .51 before bringing up server #1 as .50 again.

If you're using your server to host DHCP, then that obviously needs to be adjusted on the two boxes as well.

I've zipped up the scripts which should be placed in /root/heartbeat, and they're attached...
 

Attachments

  • heartbeat.zip
    5.9 KB · Views: 43
