QUESTION Poor man's PBX auto failover solution.

Are you interested in this solution?

  • Yes

    Votes: 6 66.7%
  • No

    Votes: 3 33.3%

  • Total voters
    9

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
Here is an idea that we have been playing around with to provide automatic PBX failover to a secondary. If this is of interest to the community, we would be happy to share more details about our implementation. This is a work in progress, and we would welcome contributions from all of you.

The demo is best watched at the 1080p setting; at any other setting the text will be hard to read. In the demo, we set up 2 PBXs. The one on the left is on the US West coast and the one on the right is on the US East coast. The call comes in to the primary. If the primary is down, the secondary will take over the call. The primary will take over the trunk. The basic concept is to use the SIP SRV DNS capability. You will need a SIP client/phone that supports SIP SRV DNS.

Thanks
RentPBX Team
 

mbellot

Active Member
Joined
Dec 15, 2008
Messages
404
Reaction score
185
Could be useful for SOHO, but I'm pretty happy with the ultra poor man solution provided by VOIP.ms already.

Anything happens to the connection between them and my (home) server results in fail over to my cell.
 

krzykat

Telecom Strategist
Joined
Aug 2, 2008
Messages
3,145
Reaction score
1,235
I like it and would definitely want to look into this. I use RentPBX as a backup to an in-house solution now. When the onsite goes out - I have to scramble to change everything to the backup server.
 

ou812

Guru
Joined
Oct 18, 2007
Messages
479
Reaction score
79
I also think this would be a great feature to have if we can implement it in a PIAF master/slave set-up. I noticed that the video shows Freeswitch?

gary
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
Basic Idea.
The idea is to provide some kind of fail-over capability for a PIAF PBX. Just to set your expectations, this is not a full-blown High Availability solution. We are here to present a concept where you can use the SIP SRV DNS capability to automatically fail over to a secondary PBX when the primary PBX is not functional.

The trick is to use a local dnsmasq server on your PBX to create SIP SRV entries with a fail-over setup.

On the primary PBX, you set up dnsmasq with the primary and alternative SIP gateways of your SIP trunk provider as the SIP SRV entries. You also create a DUMMY SIP account on the primary PBX to allow the secondary PBX to register.

On the secondary PBX, you set up dnsmasq with the highest priority pointing to your primary PBX. You add your SIP trunk provider's SIP gateway as the next entry in the SIP SRV list.

If your primary PBX goes down, the secondary PBX will lose its registration to the primary PBX. It will then register to the next in line in the SIP SRV list, taking over the registration to your trunk provider.
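As a rough illustration of the secondary PBX side, here is a minimal sketch that renders dnsmasq `srv-host` lines (service name, target, port, priority, weight). The host names `primary-pbx.example.com` and `gateway.provider.example.com` and the priority values 10 and 20 are placeholders for illustration only, not our actual setup.

```python
from typing import List, NamedTuple


class SrvTarget(NamedTuple):
    host: str
    port: int
    priority: int  # lower value is tried first
    weight: int = 0


def srv_host_lines(service: str, targets: List[SrvTarget]) -> List[str]:
    """Render dnsmasq srv-host lines, most preferred target first."""
    ordered = sorted(targets, key=lambda t: t.priority)
    return [
        f"srv-host={service},{t.host},{t.port},{t.priority},{t.weight}"
        for t in ordered
    ]


# Hypothetical names: the secondary PBX prefers the primary PBX and only
# falls back to the trunk provider's gateway when the primary is unreachable.
SECONDARY_TARGETS = [
    SrvTarget("gateway.provider.example.com", 5060, 20),
    SrvTarget("primary-pbx.example.com", 5060, 10),
]

if __name__ == "__main__":
    for line in srv_host_lines("_sip._udp.pbxlocal.local", SECONDARY_TARGETS):
        print(line)
```

The printed lines would go into the dnsmasq configuration on the secondary PBX; the primary PBX would get a similar list containing only the provider's gateways.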

piaf.png

Some Detail.
Asterisk PBX (at least up to Asterisk 11) does not have full support for SIP SRV. To overcome this limitation, we use a local Freeswitch as a proxy to connect to our provider. This is the same kind of Freeswitch setup as the Skype addon that we have seen here. Our Asterisk server only connects to the local Freeswitch server for its trunk. I am not sure whether Asterisk 12 has full support for SIP SRV. If it does, we can get away without using Freeswitch.

One thing that I avoided mentioning above is how to handle synchronization of the FreePBX configuration. Some of you may run a phone radio system and so on. You may want to do maintenance on your primary server. Your secondary and primary servers do not necessarily need to have the same content. In this case, the secondary server may not have the same setup.

Many of us intend to use the secondary PBX as a standby that is able to take over when the primary PBX is down. In this case, we propose using the FreePBX backup and restore module. In the current version, FreePBX 2.11, the backup and restore module has become very powerful. You can schedule when you would like to synchronize your configuration, and all of this can be done automatically. In this scenario, you will need to set up SIP SRV for your primary and secondary PBX, so that your compatible SIP phone can connect to either the primary or the secondary server. Please note that some phones may not fully support SIP SRV.

Work in progress.
On the internet, the connection between the primary and secondary PBX will not be perfect. The route between the PBXs may be interrupted. This is a weakness of a redundant system with two nodes: they are not able to resolve a "split-brain" condition. This is a difficult problem. There are solutions that use a third system and develop some sort of voting system. Or, one can set up some fencing rules. This can get really complicated. This is an area that we are still working on, and we welcome input and suggestions. Currently, we just monitor the secondary and primary PBX. When we detect something that does not make sense, we manually override them (shut down the primary). This issue can be annoying. However, based on our observation, it occurs reasonably rarely and for short intervals.
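To make the "third system" idea a bit more concrete, here is a minimal sketch of a witness that hands a single vote to the most preferred PBX it can currently reach. This is not something we have implemented; the node names and the rule that a PBX may serve calls only while it holds the witness's vote are assumptions for illustration.

```python
from typing import List, Optional, Set

# Hypothetical node names, most preferred first.
PREFERENCE = ["primary", "secondary"]


def witness_vote(preference: List[str], witness_reachable: Set[str]) -> Optional[str]:
    """Return the most preferred node the witness can reach, or None."""
    for node in preference:
        if node in witness_reachable:
            return node
    return None


def should_serve(node: str, preference: List[str], witness_reachable: Set[str]) -> bool:
    """A node serves calls only while it holds the witness's single vote."""
    return witness_vote(preference, witness_reachable) == node
```

Because the witness only ever votes for one node, a broken route between the two PBXs alone cannot make both of them active: as long as the witness still sees the primary, the secondary stays passive.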

One idea to solve the above problem is to develop a script that monitors the primary PBX's connection to a few known IP addresses. It would shut itself down when it detects a less than ideal network connection. Any ideas are welcome.
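A minimal sketch of such a monitoring script could look like the following. The probe (a plain TCP connect), the port, the threshold and the example addresses (taken from the documentation-only 192.0.2.0/24 range) are all assumptions; the actual shutdown step is deliberately left to the caller.

```python
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_shut_down(
    targets: Iterable[str],
    probe: Callable[[str], bool],
    min_reachable: int,
) -> bool:
    """Decide to shut down when fewer than min_reachable targets answer."""
    reachable = sum(1 for target in targets if probe(target))
    return reachable < min_reachable


# Placeholder addresses; replace with hosts that matter for your network.
KNOWN_TARGETS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]

if __name__ == "__main__":
    if should_shut_down(KNOWN_TARGETS, tcp_probe, min_reachable=2):
        print("network looks degraded: the primary PBX should stop here")
```

Passing the probe in as a function keeps the decision logic testable without touching the network, and lets you swap in a different check (ICMP, SIP OPTIONS) later.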

Thanks for the interest. I hope that the idea can be a starting point to help anyone who needs a fail-over system.

Br
RentPBX Team
 

atsak

Guru
Joined
Sep 7, 2009
Messages
2,381
Reaction score
436
I think what would be best is some kind of automated, well documented procedure (video above notwithstanding) which would sync the configuration between two servers - IVR, recordings, voicemails for end users, ring groups and so on between two systems. In other words two identical systems with everything the same except the IP.

System 1 goes down, change the IP on system 2 or change DNS on the outside. The SRV records and so on will require a witness or additional networking, and honestly for the frequency systems fail I think you can accept it. What you really need is a 100% swap to a new system in 10 minutes or so. With very, very few exceptions people can live with 10 or 20 minutes of outage every couple years. If they can't I would argue this isn't the right platform for them.
 
Joined
May 23, 2013
Messages
223
Reaction score
28
If you use a DNS provider like DNS Made Easy and use the failover records, this can be done very easily. Just set up two PBXs and rsync the config between the two; how closely the data needs to be synced will determine the rsync interval. If the primary server goes down, failover will change your IP to the backup server and you are back in business. A short TTL of, say, 180 seconds should be enough for most people, and you would have less than 3 minutes of downtime even if a device had just registered before your primary server went down. Then when DNS Made Easy detects your primary is back up, the records are changed back to your primary IP and devices will roll back as the DNS entry expires.
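The arithmetic behind the TTL figure can be sketched as below. The `detection_delay` parameter (how long the DNS provider's monitor takes to notice the outage) and the assumption that cached records are at a uniformly random age are mine, added for illustration.

```python
def worst_case_switch_seconds(ttl: int, detection_delay: int = 0) -> int:
    """Longest wait: the client cached the old record just before it changed."""
    return detection_delay + ttl


def expected_switch_seconds(ttl: int, detection_delay: int = 0) -> float:
    """Average wait, assuming cached records are at a uniformly random age."""
    return detection_delay + ttl / 2
```

With a TTL of 180 seconds and instant detection, the worst case is 180 seconds, and the average under the uniform-age assumption is about half of that.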
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
atsak, if you need to sync your PBX configuration, you can use the FreePBX backup/restore module. The latest functionality that we tried can sync configuration and data automatically at scheduled intervals. It is not a user-attended manual backup and restore. If this is what users typically need, great. They can remove the trunk registration trickery in the picture. If they lose the primary PBX, they can manually enable the trunk on the second PBX.

chris, using DNS with a low TTL setup is a good way to switch your phones over between PBXs. In fact, we played around with it quite a bit on the phone side. However, if your trunk registrations need to fail over to the second PBX, would you need to manually enable your trunk? Some trunk providers do not need registration. That would work if they can send calls over to the backup PBX.

In any case, SIP SRV (in fact the DNS SRV record for other protocols too) is a standard that is designed with failover in mind. It seems that it is rarely used. We would like to put the idea of using it on the table. If this is something that has merit, I hope the community can build on it. This concept/idea is to explore the SIP SRV capability and provide the "automatic" process in the failover.

Thanks for the input.
 
Joined
May 23, 2013
Messages
223
Reaction score
28
chris, using DNS with a low TTL setup is a good way to switch your phones over between PBXs. In fact, we played around with it quite a bit on the phone side. However, if your trunk registrations need to fail over to the second PBX, would you need to manually enable your trunk? Some trunk providers do not need registration. That would work if they can send calls over to the backup PBX.


If you are using a SIP provider that requires registration, that would be an issue, I agree, but any good (IMHO) provider does not rely on registration from your PBX. Though many can support that if you don't have static IPs, it's doubtful, again IMHO, that anyone who doesn't have static IPs needs this type of failover. That is also still the problem with DNS SRV records, as you need static IPs to support using them. As far as I know there is no DDNS client that will update DNS SRV records. So if you have static IPs and a good provider that doesn't require registration, DNS with low TTL failover is the easiest solution for anyone looking for a "poor man's auto failover". Just easier, less to mess up, and less to fail. Just my two cents on it. :)
 

ou812

Guru
Joined
Oct 18, 2007
Messages
479
Reaction score
79
We have many clients that use large telcos as their SIP provider that do not require registration, but those same systems also have other trunks, for cheaper long distance and failover, that do require registration, so the ability to use both registered and non-registered trunks is needed. I do agree that if a customer does not have a static IP they most likely don't need auto fail-over.

gary
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
A SIP SRV record does not seem to imply support or non-support of dynamic IPs. As an example, one can set up SRV records that look like

_sip._udp.pbxlocal.local. 86400 IN SRV 10 30 5060 asterisklocal.dyndns.com. ; backup behind dynamic ip
_sip._udp.pbxlocal.local. 86400 IN SRV 10 20 5060 asteriskprimary.staticip.com. ; primary somewhere with static ip

We could use some help to verify whether this would work. There is no mandatory specification saying that asterisklocal.dyndns.com must be an IP or a static IP address. If this works, it will solve the phone-to-PBX side. However, your trunk will need either a registration mechanism or a SIP trunk that supports SIP SRV. Instead of giving them an IP address to send the call to, you can tell them to send it to your SIP SRV name. I wonder which SIP providers can provide this.
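One thing worth checking in records like these is how a client will order them. As far as RFC 2782 goes, a client tries the lowest priority value first and uses the weight only to spread load among records that share a priority. The sketch below parses the record lines and groups them that way; the helper names are made up for illustration, and the "strict failover" variant with priorities 10 and 20 is an assumption about what a primary/backup setup would look like, not something we have tested.

```python
from typing import Dict, List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse '<name> <ttl> IN SRV <priority> <weight> <port> <target>'."""
    fields = line.split(";")[0].split()  # drop any trailing comment
    idx = fields.index("SRV")
    priority, weight, port, target = fields[idx + 1: idx + 5]
    return SrvRecord(int(priority), int(weight), int(port), target)


def group_by_priority(records: List[SrvRecord]) -> List[List[SrvRecord]]:
    """Return groups of records, lowest priority value (tried first) first."""
    groups: Dict[int, List[SrvRecord]] = {}
    for record in records:
        groups.setdefault(record.priority, []).append(record)
    return [groups[p] for p in sorted(groups)]


def weight_share(group: List[SrvRecord]) -> Dict[str, float]:
    """Approximate share of requests each target in one priority group gets."""
    total = sum(r.weight for r in group)
    if total == 0:
        return {r.target: 1 / len(group) for r in group}
    return {r.target: r.weight / total for r in group}
```

Run against the two records as written (both priority 10, weights 30 and 20), this yields a single group in which the dyndns target gets roughly 60% of requests, so the pair behaves as load sharing. For strict primary/backup behaviour, the primary would need the lower priority value, for example 10 for the primary and 20 for the backup.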

As for phones that support SIP SRV, here is one of the popular SIP clients that supports it: http://www.acrobits.cz/28/acrobits-softphone-for-android. There are many hard phones that claim to support SIP SRV. Google search is your friend.
 

billsimon

Well-Known Member
Joined
Jan 2, 2011
Messages
1,534
Reaction score
727
Do I understand from your video that it takes 2 minutes for a failover event to occur?
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
The failover typically happens within 5 minutes or faster in real life. It is not in the order of tens of minutes. In theory, the SIP SRV specification says something like: on every transaction the SIP agent needs to resolve the SIP SRV record. A SIP INVITE is a SIP transaction. If your primary PBX is down, your SIP SRV compatible phone should immediately fail over on the next SIP command such as REGISTER or INVITE. When the PBX is back online, the SIP SRV phone should connect to the primary PBX on the next REGISTER, the next INVITE, or any future SIP packet.
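The per-transaction behaviour described above can be sketched roughly as follows. This is only a model of the idea, not how any particular phone implements it; `send` stands in for "send this SIP request to that target and report whether it was answered", and the target names are placeholders.

```python
from typing import Callable, Optional, Sequence


def send_with_failover(
    targets: Sequence[str],
    send: Callable[[str], bool],
) -> Optional[str]:
    """Try each target in SRV order; return the first one that answers."""
    for target in targets:
        if send(target):
            return target
    return None  # every target failed
```

Because the ordered list is walked again for every new transaction, the phone moves to the secondary as soon as the primary stops answering, and returns to the primary on the first transaction after it comes back.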

You can apply the failover to more than hardware failure. Any type of failure on a PBX can be set up so that the client switches over to the secondary PBX. For example, in the case of a DDoS, you can write a simple script on your primary PBX that monitors network quality and automatically shuts the PBX down. This will trigger a failover.
 
Joined
May 23, 2013
Messages
223
Reaction score
28
A SIP SRV record does not seem to imply support or non-support of dynamic IPs. As an example, one can set up SRV records that look like

_sip._udp.pbxlocal.local. 86400 IN SRV 10 30 5060 asterisklocal.dyndns.com. ; backup behind dynamic ip
_sip._udp.pbxlocal.local. 86400 IN SRV 10 20 5060 asteriskprimary.staticip.com. ; primary somewhere with static ip


You are correct, you can use SRV records in this manner, but I still fail to see why you would complicate things and use this approach over DNS failover. Two or more servers, two or more IPs and one DNS failover solution does this without anything extra needing to be done to the server, and without requiring apps and phones that support SRV records. It just seems like an overly complex idea for a simple solution.
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
SIP SRV (DNS SRV record) is designed for failover in general. There are no moving parts. The 2 or 3 records are static and readily synchronized. You can set them with a high TTL value. It is just like a reasonably static table that lists all your servers (primary and secondary).

As you mentioned, when switching a DNS name to a different IP, the DNS TTL comes into play. DNS propagation comes into play. If you set your TTL to a low value, there will be a lot more hits on your DNS server even when the record does not change.

The way I look at it, in the case of failover something has to switch the mapping of a "name" to an "IP". That something can be the DNS record. Or, with SIP SRV, the SIP agent can make the switch. From an architecture point of view, would it not be cleaner for the switch to be taken care of by the agent that needs, or is aware of the need, to fail over? Your DNS server is not designed to be aware of the SIP protocol. A SIP agent can learn more from a SIP transaction about what kind of failure occurred and decide whether to fail over or not.
 
Joined
May 23, 2013
Messages
223
Reaction score
28
But now you have configs in both PBXs that have to be there and be right, you have the local dnsmasq server, and you still don't have failover UNLESS the phone or softphone you are using supports SRV records. DNS failover is just much cleaner. As for DNS hits, sure, but unless your server is running huge sites, the extra lookups for a low TTL aren't going to break anything. Better yet, if you use a service such as DNS Made Easy as I stated earlier, it's not a problem at all. As for propagation, that is nothing at all if you are running your own server, and a couple of seconds max on a service like DNS Made Easy. With a TTL of 180 you have AT MOST 3 minutes for a client/phone to switch to the backup server, assuming it did a fresh lookup right as the server went down. In the real world it's going to be much less time. I have tax clients for whom we do this in the busy months of the year and lower the TTL to 60 seconds; when a server goes down for any reason, the only person who notices is someone on an active call, because the phones register so quickly after the DNS lookup. We use provider failover to send the SIP calls to the next server, which leaves out any manual config or changes. Then when the main server is back up, phones move back over on their own and everything is dandy. Just saying, there are too many issues with your idea for SRV records, the big one being that they are not universally supported by client software and phones.
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
BTW, let me make clear that our intention is not to change or convert anyone's failover strategy. If you have one and it works for you, by all means don't change.

I would like to thank chris for presenting an alternative failover strategy and providing some critique. This allows us to dig into and discuss the advantages and disadvantages of one strategy or another.

In terms of technology, DNS is not aware of protocols such as SIP. We have run quite a few versions of PIAF. Has anyone seen their Asterisk crash with a segmentation fault? Has anyone seen a kernel out-of-memory error on their box? A PBX becomes unavailable not just because your box is dead. The TCP/IP stack may still work while your SIP-level stack does not. If you try to solve that using DNS technology, it will not work. If your SIP agent makes the decision when to switch, it can make a better decision about when and how to switch. This is what I meant by cleaner technology.
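To illustrate the difference between "the box answers" and "the SIP stack answers", here is a minimal sketch of a SIP-level probe that sends an OPTIONS request over UDP and treats any SIP response as a sign of life. The header values (user `probe`, the tag and branch generation) and the rule that any reply counts as alive are assumptions for illustration; a real PBX may require authentication or reject unknown sources.

```python
import socket
import uuid


def build_options(target_host: str, local_host: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request addressed to target_host."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 magic cookie prefix
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def is_sip_reply(data: bytes) -> bool:
    """Any SIP status line means the SIP stack itself answered."""
    return data.startswith(b"SIP/2.0 ")


def sip_alive(host: str, port: int = 5060, timeout: float = 2.0) -> bool:
    """Return True if host answers a SIP OPTIONS request within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            local_host, local_port = sock.getsockname()
            sock.send(build_options(host, local_host, local_port))
            return is_sip_reply(sock.recv(4096))
        except OSError:  # timeout, refused, unreachable, DNS failure
            return False
```

A ping or TCP check would still succeed while Asterisk itself is hung; a probe like this fails in that case, which is the kind of signal a SIP agent has and a generic DNS monitor does not.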

Again, if you have a working failover strategy, I fully understand your opinion of it. For those who don't have one, we are just presenting an idea. Who knows, it may lead to something better.
 

Hyksos

Guru
Joined
May 28, 2011
Messages
474
Reaction score
70
//just saw RentPBX reply, sorry if I sound like a parrot.

There are valid points to both sides.
Chris, your suggestion works fine and can be an excellent way to do it, but it has some potential problems.
It's completely customized and dependent on a proprietary service offered by DNS Made Easy.
Their monitoring agent is not a SIP agent, and it will only fail over when it detects a failure scenario that DNS Made Easy coded into their tool (with the customization they allow), a kind of nmap+pinger sort of thing.
If the machine is up and everything is dandy except that Asterisk is somehow borked, DNS Made Easy will not trigger the failover.
Also, although it's rare, it appears you have never faced a recursive DNS server that ignores a low TTL and caches records for longer than the TTL specifies. This can be a pain if you have remote clients and whatnot, since you don't control the DNS server that will be serving your potentially un-failed-over records. Rare, but possible in the wild.

All that is easy to work around or simply ignore if you don't face those problems. Your solution is a good way to do that with the help of a pretty cheap DNS service.
It won't detect every failure scenario and might require manual intervention to trigger the failover if the DNS Made Easy agent considers the server up while the phones are seemingly not working...

The plus side of DNS SRV records is that they are the standard for doing stuff like that. It's why they exist. If you are using software that supports them correctly (a big if in some cases, you're right, but some are in this scenario), the solution is cleaner and more standard.
That's what RentPBX means by the SIP agent's ability to detect failure on its own terms. DNS Made Easy could consider your server up while Asterisk is completely hung for 24 hours.
A SIP endpoint with proper DNS SRV support will fail over; it's not just pinging or checking sockets...

So there is potential in both solutions, it depends.

Anyway, let's be clear, VOIP is a pretty hard thing to failover cleanly and automatically...
What if sound is choppy or a bit robotic between my pbx and the provider's POP, or between a remote endpoint(just this one) and the PBX.
You'll get crappy sound all day long unless you have a method to manually failover something to something else.
Either the phone registers on another PBX, or your PBX needs to start receiving and sending calls through a different POP or a different provider entirely.
There is no fully automatic solution to plenty of VOIP failover problems unless you do a lot of specialized coding around asterisk/freeswitch and possibly the endpoint itself...

So I can see why Chris is defending his setup: it's a good way to do it, it has a somewhat automatic feel to it for some failure scenarios, and it's easy to manually fail over if it doesn't do so itself.
But in both scenarios :) there is plenty of room for problems that won't trigger failover and where even a manual failover won't do a thing to fix the issue impacting quality.
 

nightstryke

Member
Joined
May 28, 2013
Messages
85
Reaction score
8
As far as choppy, robotic sound goes, sometimes that's entirely dependent on the internet connection of the PBX itself. In the extreme IDEAL situation you'd want the VoIP PBX to be set up with its OWN dedicated internet connection, separate from the main network. That way no users could tax the network doing anything they should or shouldn't be doing in a work environment.
 
