TUTORIAL PIAF Redundancy Solution

imcdona · Feb 11, 2009

Hello,

I have been working on a few scripts to create a redundant pair of PIAF servers.

Here is how it works:

1. MySQL data is stored on a separate MySQL cluster

2. A cron job runs every 30 minutes and rsyncs the pertinent directories. In my example, it also copies Cepstral and Swift. The script also modifies amportal.conf to point to the MySQL cluster.
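For anyone who wants a starting point, the two steps above can be sketched roughly as below. This is a minimal sketch, not the script from the blog: the addresses, the directory list and the script path are placeholder assumptions, and it assumes amportal.conf carries an AMPDBHOST line and that SSH keys are already set up between the boxes.

```shell
#!/bin/bash
# Hypothetical mirror job for the backup box; all values are placeholders.
PRIMARY="192.0.2.10"          # assumed address of the primary PIAF server
DB_HOST="192.0.2.20"          # assumed address of the MySQL cluster
SYNC_DIRS="/etc/asterisk /var/lib/asterisk /var/www/html"   # assumed list

# Rewrite the AMPDBHOST line in amportal.conf so it points at the cluster.
set_db_host() {
    local conf="$1" host="$2"
    sed "s/^AMPDBHOST=.*/AMPDBHOST=${host}/" "$conf" > "${conf}.new" &&
        mv "${conf}.new" "$conf"
}

# Pull each directory from the primary over SSH, deleting stale files.
mirror_dirs() {
    local dir
    for dir in $SYNC_DIRS; do
        rsync -a --delete -e ssh "root@${PRIMARY}:${dir}/" "${dir}/" || return 1
    done
}

# Example run, commented out so the sketch has no side effects:
# mirror_dirs && set_db_host /etc/amportal.conf "$DB_HOST"
#
# Example /etc/crontab entry for a 30 minute schedule (path is hypothetical):
# */30 * * * * root /root/mirror.sh > /dev/null
```

Rewriting the config after the rsync matters because the copy from the primary would otherwise overwrite the database host on every run.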

As it stands now, I can bring up a fresh copy of PIAF, run the scripts and have the box up and running as a mirror of the primary in a matter of seconds.

You'll notice that I have a crude hack in place to remove the SIP registrations from the backup server. The solution, of course, is to have my DID provider point to a SIP SRV record so DNS takes care of the issue. Currently VoicePulse only allows registrations.

It seems to work fine so far. I obviously need to implement error checking in the scripts as well.

If anyone is interested please have a look at:

http://blog.voicebyip.com

What are your thoughts? Is there a better way to go about this? Can you see any potential problems with my solution?

Thanks,

Isaac

jroper · Feb 11, 2009

Hi

That's nice.

A MySQL cluster server is a bit of overkill, as you mentioned, but your mention of replication sounds interesting, maybe coupled with a PHP script to check that the MySQL server on box A is still alive.

The other way I had considered would be to get FreePBX to do a mysqldump every time a reload is executed, and pass it to a remote server, because the database only changes when the box's configuration changes.

From the mysql tables, you know what modules are loaded, so from a fresh install, you could download the correct modules from Freepbx.org and restore the box to its previous state.

The main difference between this and the standard FreePBX backup is that you are only shifting a relatively small amount of data, which can be passed over the broadband link without affecting voice quality every time you make a config change, so you always have the latest backup stored remotely. I think the asterisk database, when zipped up, is only a few KB.
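A rough sketch of the dump-and-ship idea, under a few assumptions: the database is called asterisk, the remote target in REMOTE is a made-up placeholder, and mysqldump can read its credentials from ~/.my.cnf. How this would be triggered from a FreePBX reload is left open here.

```shell
#!/bin/bash
# Hypothetical sketch: dump the config database and copy it to a remote box.
DB_NAME="asterisk"                          # assumed database name
REMOTE="backup@192.0.2.30:/srv/pbx-dumps"   # placeholder destination

# Build a timestamped file name from a database name and a time stamp.
dump_name() {
    local db="$1" stamp="$2"
    printf '%s-%s.sql.gz\n' "$db" "$stamp"
}

# Dump one database, compress it, copy it off-box, then clean up.
push_dump() {
    local gz raw
    gz="/tmp/$(dump_name "$DB_NAME" "$(date +%Y%m%d-%H%M%S)")"
    raw="${gz%.gz}"
    mysqldump "$DB_NAME" > "$raw" || return 1
    gzip -f "$raw" || return 1
    scp -q "$gz" "$REMOTE" && rm -f "$gz"
}

# Example run, commented out so nothing is dumped or copied by accident:
# push_dump
```

Keeping the time stamp in the file name means the remote side holds a history of config states rather than a single overwritten file.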

The remaining stuff on the server can be approached in a different way.

This only leaves voicemails, which if you are emailing them out, don't need to be worried about, and CDR, which many companies can do without in an emergency.

Having said all this, from a commercial standpoint, wandering into a customer with 2 PBXs under your arm is not likely to engender confidence. Panasonic would not bring in 2 boxes. I think a system of rapid restore and/or cloning is a more appropriate strategy for the commercial world.

Joe

imcdona · Feb 11, 2009

Joe,

I appreciate your feedback. You bring up some good alternatives.

I think you are mistaken though in regards to having just one tricked out box as opposed to two boxes.

In the past if you walked into an office with two solid state telephony devices (not servers) I am sure it would raise a few eyebrows. The fact of the matter is, telephony is not about hardware anymore. Gone are the days of a line card for this, a card to record announcements on etc. The industry is changing.

I had the opportunity to manage an Avaya S8710 PAIR. And you know what? It was connected to a G710 with line cards that provided such "advanced" functions as the ability to record an announcement (A VAL board).

I am not kidding when I say I was confused. I didn't get the logic of it all. I remember asking my Avaya sales rep if he was certain there was no other way to record a sound file aside from buying a VAL board. The rep told me he was sure. I STILL didn't believe him. I was seriously convinced he was trying to pull a fast one on me to pad his commission.

It wasn't until I thoroughly researched it that I realized that yes, you actually have to buy a card that stores about 20 minutes worth of sound files. Bear in mind, the S8710 series systems are running CM on Linux. I was shocked that these two servers had gigs of free space and I had to purchase a "card" to store a sound file?

When my boss tasked me with deploying an international telephony solution utilizing our Avaya system I called my sales rep and got a quote. He sent a hardware list a mile long and a price tag that I could afford to live off of for a year or two. I called him back and told him to send me an Avaya SIP server.

I installed the SIP server and had all DIDs globally pointed to it. I then deployed naked IP phones at each office and called it a day. I saved on hardware and administrative overhead, not to mention the nightmare of managing phone bills from various providers throughout the world.

Avaya is not all bad though. They made an announcement a while back that they are changing their focus from a hardware-based company to a software-based company.

You tell me what is more ridiculous: walking into an office with 2 rack mount 1U servers to support 30+ offices globally, or unloading a semi's worth of proprietary Avaya hardware that not only has to be installed and maintained but also purchased at outrageous prices. I personally prefer the former.

darmock · Feb 11, 2009

In fact I know what Joe means: when you go into the SMB market with 2 servers under your arm, alarms go off. The mindset is very different between a small client and a medium to large client. To solve this for the larger SMB I generally use a dual system 2U rackmount for redundancy and a proprietary hot swap solution that I developed for my own use. (Sorry, it isn't going to become part of PIAF.) This costs a bit more, but the client only ever sees the 1 "computer" and seems to be happier in the long run. A bit of jiggery pokery, I know, but it helps with sales.

In medium to large business I have found they won't even look at your response to their RFP if there is not a redundancy system in the quote.

Tom

wardmundy · Feb 11, 2009

You've raised some great points. We've had redundant PIAF servers for over a year now, but I've been reluctant to roll this out. However, now that you can buy two of the little guys below for under $700 TOTAL, I'm reconsidering.

The way ours works is to copy a one byte file from one device to the other every few minutes. We call it PMFT™ (poor man's fault tolerance). If it doesn't get the file, it assumes the server is dead. It then resets its own IP address to the master address, unblocks the outbound ports in its firewall, and reboots.

All of the SIP phones come right back up when the new server takes over. We, of course, plan to get rich off this design :lol: but I'll publish it on Nerd Vittles in a few weeks just to be sure the donations come pouring in. :lol:

Photo below is ACTUAL SIZE:

The Deacon · Feb 11, 2009

The company I work for does quite a bit of MySQL replication; consequently, I have been working on a how-to article on MySQL replication for PIAF. If anyone is interested, I can post it later today.

wardmundy · Feb 11, 2009

By all means, please post. Thanks! Maybe you'll get rich, too. :wink5:

jroper · Feb 11, 2009

Hi

"You tell me what is more ridiculous: walking into an office with 2 rack mount 1U servers to support 30+ offices globally, or unloading a semi's worth of proprietary Avaya hardware that not only has to be installed and maintained but also purchased at outrageous prices. I personally prefer the former."

30+ offices - yes make that at least 2 servers if not 3. My mindset was more focused on the smaller installations.

Joe

imcdona · Feb 11, 2009

Deacon, please post! With any luck, between Joe, wardmundy, myself and you, we might actually come up with a solution worth including in the next PIAF release. That would be sweet!

The Deacon · Feb 12, 2009

MySQL Replication

There are several ways to implement replication under MySQL (Master -> Slave, Master -> Master, Master -> Relay -> Slave), and you can replicate a single database, several databases or all databases between the hosts.

This how-to will walk you through the process of setting up MySQL database replication in a Master -> Slave environment that will replicate ALL databases (and their respective entries) from the Master to the Slave.

In an upcoming how-to, I'll walk you through getting everything back to normal after a server crash, but this how-to only focuses on getting MySQL replication working between the Master and Slave servers. Additionally, while this how-to is written specifically for MySQL 5.0.x running on CentOS 5.2 on a PBX In A Flash (PIAF) server, you could probably make this work on a server that isn't running PIAF. But there are no guarantees. With most things technical, there is no warranty expressed or implied. Use at your own risk. You have been warned. :biggrin5:

This document makes several assumptions, so let me get those out of the way first. To start, it is assumed that you already have two (2) functional PIAF boxes configured, running and communicating with each other (whether they are on the same subnet, or via hamachi - it makes no difference, so long as they can see each other). It also assumes that you know how to log into the MySQL client, use ssh as well as use some sort of editor (joe/vim/vi). If you don't have all of those prerequisites, stop right here until you have all the parts. I'll wait. :smile5:

Ok, since we have 2 servers, you need to decide which server is the main (or Master) server and which server will be the backup (or Slave) server. The Master server will keep a log of each and every transaction (add, delete, modify) that happens in that database. The Slave will look at the log on the Master server and whenever any changes happen on the Master, it will also make those changes happen on the Slave.

Since the Slave connects to the Master using a standard MySQL username/password, there must be an account on the Master server that the Slave can use to connect with. Any account can be used for replication, just be aware that the username/password used for replication will be stored in plain text within either the my.cnf or master.info file(s). Personally, I find it's easier (not to mention cleaner and safer) to create an account specifically for replication; that way if the account is compromised, the account only has the privilege of performing replication.

Given that, we're going to create a user (called "repl" with a password of "passw0rd") and grant the privileges required for replication, using the GRANT statement. Please feel free to change these to whatever you want.

Log into your MySQL client as the root user and issue the following commands:

Code:

GRANT REPLICATION SLAVE ON *.* to 'repl'@'%' IDENTIFIED BY 'passw0rd';
FLUSH PRIVILEGES;
QUIT

Now that the "repl" user has been created, let's configure the Master MySQL server to create and keep the log that stores the changes the Slave server will "feed" from.

Edit the /etc/my.cnf file and in the [mysqld] section add these two lines:

Code:

log-bin=mysql-bin
server-id=1

The first line is the "base name" of the log file that MySQL will use (mysql-bin.000001, mysql-bin.000002, etc)
The second line assigns the Master server an ID (used by replication) to distinguish it from the Slave server.

Now we need to dump the databases that we have on the Master server to import onto the Slave server. We dump the databases by doing:

Code:

cd /tmp
mysqldump -u root -p --all-databases --lock-all-tables > dbdump.sql

We now need to scp the dump file to the Slave server:

Code:

cd /tmp
scp dbdump.sql user@slave-server:/tmp (replace user@slave-server with your login and the address of the Slave server)

Once the mysqldump is done and copied over, let's restart the mysqld service on the Master server:

Code:

/etc/init.d/mysqld restart

SSH into the Slave server and import the database with this command:

Code:

mysql -u root -p < /tmp/dbdump.sql

On the Slave server, edit the /etc/my.cnf file and in the [mysqld] section add these four lines:

Code:

master-host=111.222.333.444 (replace this with the IP address of the Master server)
master-user=repl
master-password=passw0rd
server-id=2

Once you're done, let's restart the mysqld service on the Slave server:

Code:

/etc/init.d/mysqld restart

Log into your MySQL client on the Slave server and issue the following commands:

Code:

START SLAVE;
SHOW SLAVE STATUS\G

You should see something similar to this:

Code:

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 111.222.333.444
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000001
        Read_Master_Log_Pos: 98
             Relay_Log_File: pbx-relay-bin.000010
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

There are three fields that we are concerned about: the Slave_IO_Running, Slave_SQL_Running and the Read_Master_Log_Pos fields.

The Slave_IO_Running field reports on the I/O thread, which maintains communication with the Master and copies the log file entries from the Master to the Slave. If Slave_IO_Running says "No", the Slave can't see/communicate with the Master.

The Slave_SQL_Running field reports on the SQL thread, which takes the log file entries that have been copied over to the Slave and applies the changes to the MySQL database. If it says "No", check the /var/log/mysql.err log for more information.
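If you want to check those two fields from a script rather than by eye, something like the following could work. It is only a sketch: the function names are made up for illustration, and it simply parses the text that SHOW SLAVE STATUS\G prints.

```shell
#!/bin/bash
# Sketch: judge replication health from the text of SHOW SLAVE STATUS\G.

# Print the value of one field (first match) from status text on stdin.
slave_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^ +/, "", line)
            if (!found && index(line, key ":") == 1) {
                val = substr(line, length(key) + 2)
                sub(/^ /, "", val)
                print val
                found = 1
            }
        }'
}

# Succeed only when both replication threads report "Yes".
replication_ok() {
    local status="$1" io sql
    io=$(printf '%s\n' "$status" | slave_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)
    [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}

# Example use on the Slave (commented out; prompts for the root password):
# replication_ok "$(mysql -u root -p -e 'SHOW SLAVE STATUS\G')" || echo "replication broken"
```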

The Read_Master_Log_Pos field is really nothing more than the position in the file that the Slave is current with. To see if the slave has caught up with the Master, do the following:

Log into your MySQL client on the Master server and issue the following commands:

Code:

SHOW MASTER STATUS;

You should see something very similar to this:

Code:

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

The Read_Master_Log_Pos on the Slave server should match the Position field on the Master server. If they differ, it means that the Slave has some catching up to do. Repeat the steps in a minute or two; if you have a LARGE amount of data, it could take 5 minutes (or more) to bring the Slave in sync with the Master.
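The comparison can also be written down as a small helper. This is a sketch with a made-up function name; it assumes you pass in the File and Position values from the Master and the Master_Log_File and Read_Master_Log_Pos values from the Slave. Comparing the file name as well as the position avoids a false match once the Master has rotated to a new log file.

```shell
#!/bin/bash
# Sketch: compare the Master's binary log coordinates with what the Slave has read.
# Arguments: master_file master_pos slave_file slave_pos
binlog_lag() {
    local m_file="$1" m_pos="$2" s_file="$3" s_pos="$4"
    if [ "$m_file" != "$s_file" ]; then
        echo "slave is reading $s_file, master is writing $m_file"
        return 1
    fi
    if [ "$s_pos" -lt "$m_pos" ]; then
        echo "behind by $((m_pos - s_pos)) bytes"
        return 1
    fi
    echo "in sync"
}

# Example with the values shown above:
# binlog_lag mysql-bin.000001 98 mysql-bin.000001 98
```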

Congratulations, you have just set up database replication between the Master and the Slave.

To really see the replication in action, log into your MySQL client on the Master server and issue the following commands:

Code:

CREATE DATABASE DELETE_ME1;
SHOW DATABASES;

Now, log into your MySQL client on the Slave server and issue the following commands:

Code:

SHOW DATABASES;

You should see the database called DELETE_ME1 listed as one of the databases on the Slave server.

Back in the MySQL client on the Master server issue the following command:

Code:

DROP DATABASE DELETE_ME1;

Back in the MySQL client on the Slave server issue the following command:

Code:

SHOW DATABASES;

The DELETE_ME1 database should be gone.

I hope this tutorial was helpful. If you have questions, please post them below and I will do my best to answer them.

-Rick

Orgasmatron 5.x / Incredible PBX UPDATE: Chances are, you will need to add these two lines to your /etc/sysconfig/iptables file on both the master and the slave (this will allow the two boxes to see each other via the MySQL replication port):

Code:

# Allow connections to our MySQL server
-A INPUT -p tcp -m tcp --dport 3306 -j ACCEPT

Once those lines are added, you'll need to restart iptables:

Code:

/etc/init.d/iptables restart

wardmundy · Feb 12, 2009

Niiiice.

gomonn · Feb 12, 2009

Thank You so much for sharing guys .... That is the type of thing I have been waiting for (don't know much about programming ...)

imcdona · Feb 13, 2009

Thanks to everyone who took some time to provide feedback.

I'm going to make a few changes to my process based on all the feedback. I'll post the new and improved redundancy process shortly.

unison · Apr 1, 2009

Any more developments on this front?

We are currently looking at how to implement redundant PIAF servers

jroper · Apr 2, 2009

I've been looking at a different approach to replicating databases, which I'm guessing is the same as Ward's.

See these links:-

http://www.linux-ha.org/DRBD
http://www.drbd.org/
http://support.red-fone.com/downloads/elastix/Elastix_HA_Cluster.pdf
http://www.voip-info.org/wiki/view/TrixBox+High+Availability+cluster+using+drbd

This is an interesting approach, as the failover is seamless, taking place in a few seconds, and one person I spoke to about the solution said that the call is not even dropped, just a 3 second silence.

The essence is that a disk partition is constantly replicated over an ethernet cable (but could be a USB or Serial Cable) via a separate NIC. That is the DRBD part of the equation.

The second part is the HA, or heartbeat. When the primary fails, then the secondary starts the services, such as asterisk etc, and moves a floating IP address which usually points at the primary, and points it to the secondary, and everything continues as normal.

The issue is in the failback - e.g. getting the primary working properly again.

Auto failback would not seem to be desirable, because with an intermittent fault on the primary, the two systems would flip-flop.

Manual failback is not that easy: it requires a good degree of technical knowledge, and other nasty things can happen, such as split-brain, where the two systems are right out of sync.

So my belief is that this is suitable for larger installations with onsite tech support, or support not far away, to check that the primary has not failed (this is so good you don't notice a failure) and to sort out issues as they occur.

For smaller installs, it may be difficult to manage.

The problem I ran into was reliably failing back when I induced various faults, or worse the secondary promoting itself to primary and not being able to demote it.

But in any case, the reference materials I used are as above, except for the last one, which is fairly new.

Joe

unison · Apr 4, 2009

Would be good to know what Ward is thinking...

With most modern devices you really just need to ensure the config is kept in sync between the two servers, then use DNS service (SRV) records and leave the server selection up to the phone. It will connect to your primary server, but if that isn't available it will connect to the backup server.
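To make the SRV idea concrete, here is a small sketch of the selection a phone would make: take the SRV answers and prefer the lowest priority value. The function name and example.com names are placeholders, the input format is what dig +short SRV prints (priority, weight, port, target), and weighting between records of equal priority is ignored.

```shell
#!/bin/bash
# Sketch: pick the preferred SIP server from SRV answers on stdin.
# Each input line: "priority weight port target"
pick_srv_target() {
    sort -n -k1,1 | awk 'NR == 1 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Example use (commented out; needs real SRV records in DNS):
# dig +short SRV _sip._udp.example.com | pick_srv_target
```

A phone that cannot reach the first target would move on to the next priority, which is where the backup server would sit.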

Taking that a little further, there could even be an option to use DUNDi between the two servers, then having both servers active.

gregpadgett · May 25, 2009

PMFT

Ward,

I can't find anything about this on Nerd Vittles, did you ever do a tutorial on it? I am in need of a failover solution.

Thanks!

gregpadgett · May 27, 2009

I cannot find the PMFT on NerdVittles - did you ever have a chance to post it?

gregpadgett · Jun 5, 2009

Did you get this process to work like you wanted? I am very interested in doing this also.

wardmundy · Jun 6, 2009

We obviously haven't written this up yet, and we're out of pocket for several more weeks. But let me sketch out the theory, and some of you can take it from there. This is not as elaborate as Joe's approach which requires a secondary network interconnecting the two servers. So here goes...

First, your two servers both need fixed IP addresses. Build the duplicate server as a mirror image of the first one, shut down the first one, bring up #2 and change its IP address. I'm going to use 192.168.0.50 and 192.168.0.51 in this example where 50 is the primary and 51 is the backup server.

Second, you need to set up keys on both servers so that they can log into each other with SSH without using a password. There's a NV article explaining this.

Third, the idea here is that you use iptables to block outgoing traffic on ports 5060 and 4569 (SIP and IAX) on server #2 while server #1 is functioning. This keeps server #2 from trying to register with your providers and telephone instruments.

Code:

-A OUTPUT -p udp -m udp -o eth0 --sport 5060 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --sport 4569 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --dport 5060 -j DROP
-A OUTPUT -p udp -m udp -o eth0 --dport 4569 -j DROP

Fourth, we use a cron job on server #2 to SCP a simple file such as /etc/hosts from server #1 every few minutes. Then we test to see if we got the file. This is the Poor Man's heartbeat system. If we got the file, server #1 is working. If we didn't, server #1 is dead.

Code:

*/10 * * * * root /root/heartbeat/heartbeat.sh > /dev/null

Fifth, when we detect that server #1 has died, we reconfigure iptables on server #2 to allow traffic on 5060 and 4569, change the IP address of server #2 to the server #1 IP address, and then reboot server #2 which now functions as server #1.
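For those who want to roll their own before looking at the attachment, the detection step might be sketched like this. It is not the attached script: the file paths, the takeover.sh name and the miss counter are assumptions made for illustration. The counter is an addition that waits for several failed copies in a row before acting, whereas the design described above treats a single missing file as a dead server.

```shell
#!/bin/bash
# Hypothetical heartbeat check for server #2; values are placeholders.
PRIMARY="192.168.0.50"          # server #1 from the example above
PROBE="/tmp/heartbeat.probe"    # where the copied file lands (assumed path)
MISSES="/tmp/heartbeat.misses"  # counter file (assumed path)
MAX_MISSES=3                    # assumed threshold

# Try to copy a small file from the primary; succeed only if it arrived.
primary_alive() {
    rm -f "$PROBE"
    scp -q -o ConnectTimeout=10 -o BatchMode=yes \
        "root@${PRIMARY}:/etc/hosts" "$PROBE" 2>/dev/null
    [ -s "$PROBE" ]
}

# Update the count of consecutive misses and print the new value.
record_result() {
    local ok="$1" count=0
    if [ -f "$MISSES" ]; then count=$(cat "$MISSES"); fi
    if [ "$ok" = "yes" ]; then count=0; else count=$((count + 1)); fi
    echo "$count" > "$MISSES"
    echo "$count"
}

# Example cron-driven main, commented out (takeover.sh is hypothetical):
# if primary_alive; then
#     record_result yes > /dev/null
# elif [ "$(record_result no)" -ge "$MAX_MISSES" ]; then
#     /root/heartbeat/takeover.sh
# fi
```

With the 10 minute cron entry above, a threshold of 3 would delay takeover by up to half an hour, so the schedule and threshold need to be chosen together.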

You obviously need scripts to reconfigure server #2 back to its original state once server #1 is ready to function again. And the trick here is to do this from the server #2 CLI and reboot server #2 back as .51 before bringing up server #1 as .50 again.

If you're using your server to host DHCP, then that obviously needs to be adjusted on the two boxes as well.

I've zipped up the scripts which should be placed in /root/heartbeat, and they're attached...
