FOOD FOR THOUGHT When Your Cloud Box Dies

MacNix

Guru
Joined
Jun 21, 2011
Messages
198
Reaction score
30
Location
right here....
So we've got a handful of clients on RentPBX (recommended by this forum), and of course today the server their accounts are on went down and took ALL of them with it. ALL of them.

not fun.

Anybody have experience in trying to create a backup server FROM SCRATCH or from a RentPBX backup of the server?
 

AndyInNYC

Active Member
Joined
May 23, 2013
Messages
624
Reaction score
84
MacNix,

I've got 3 accounts on RentPBX - no problems here today (or basically ever). If you've missed a bill, they will cut you off after a 3rd notice (we had a billing issue once <g>).

Andrew
 

MacNix

Guru
Joined
Jun 21, 2011
Messages
198
Reaction score
30
Location
right here....
It wasn't a billing issue... They lost a server ("Delilah") in Atlanta... What started as a 1hr maintenance downtime at 11pm on Sun night became a blackout from 22hrs on the 15th till 17:30 the 16th.....

We just "won" on all accounts-had a bunch of clients on that server..

Fortunately, we were able to go to Vitelity (where we keep the numbers managed), point them to cell phones, and the clients at least had SOME service.. but it wasn't pretty...

My bigger question is whether it's feasible to somehow copy/back up a RentPBX PIAF server to a file, and quickly duplicate that PIAF server onto another RentPBX instance (in a different location)?

I'm not particularly pissed AT RentPBX (sh*** happens, and while it's a PITA, it is just one of those things), but I have to have something in place to prevent/mitigate this for the future.
 

AndyInNYC

Active Member
Joined
May 23, 2013
Messages
624
Reaction score
84
I knew there was a reason to stay away from Atlanta <g>.


Andrew
 

kenn10

A lesser geek
Joined
Dec 16, 2007
Messages
1,013
Reaction score
208
I'm not sold on cloud-based PIAF solutions either. I moved my system and the provider has gone down or had network issues. Since I never removed the local server, I can repoint the DNS at the local server in a pinch but wish I didn't have to.....
 

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
One of our DCs in Atlanta experienced a major incident after what should have been low-impact maintenance. An email explaining the timeline of the incident has already been sent to affected customers. The outage appears to have been caused by a network issue, possibly due to either human error or poor execution of maintenance on the DC infrastructure.

Yes, this type of event is not fun. We apologize for the inconvenience.

We are taking this incident seriously. We will be asking the DC hard questions (a lot of them) to determine whether they will be adequate for our needs in the future. We will be doing further evaluation of this particular DC. This DC has hosted a lot of our long-time customers for years, and this is the only major event we have experienced with them. However, we will be looking into this thoroughly and reevaluating our working relationship with them.

We would also like to point out that this type of event is rare. We do scout a DC before we deploy a node. We ask questions about their redundancy and we double-check their reputation. We did the same with this particular DC years ago, and they passed all our tests. The reason we mention this is to highlight the fact that regardless of all our efforts to work with a reputable DC, there is always an event or two that can negate all of this work.

In short, downtime unfortunately happens regardless of your hosting strategy. It is important to have a backup configuration locally or on a standby PBX. A good backup strategy will reduce the pain.

MacNix, you can always open a ticket with us. We will be happy to give you some ideas on how you can replicate your PBX.

All customers who were impacted will be credited with 1 month's worth of service. It is a small token to show our sincere apology. We take full responsibility for this event regardless of where the fault lies.
 
  • Like
Reactions: wardmundy

MacNix

Guru
Joined
Jun 21, 2011
Messages
198
Reaction score
30
Location
right here....
thanks for the note. I did pop a ticket (will do again), but haven't heard back yet. My primary concern is replicating/backing it to a file that I can reconfig quickly on a standby..

Again, I'm not at all mad at RentPBX - sh** happens. and this is the first time we've experienced it.

In general, VOIP service seems to be bulletproof (ONCE you get it working).. it's just those occasional 'burps' that really get you....
 
Joined
Nov 14, 2008
Messages
1,401
Reaction score
319
Location
Warwick, NY
I would disagree. VOIP isn't even close to bulletproof. It will fail, the question is when and for how long. There are too many points of failure! Rentpbx and their network may be fine but your customer could experience their own internet failure or routes to Rentpbx from a given area could go out.

I was with HP for 25 years; my organization handled major rollouts and updates all the time. "Low impact maintenance" that caused a major outage with no quick rollback plan is pure incompetence!

Many of these providers are flying by the seat of their pants and are too dependent on others. Many don't have their own backup plans! You are the one who always has to plan for the worst.

The traditional phone network is much more reliable and requires very little backup planning because failures are so infrequent.
Since VOIP is dependent on a giant spaghetti network with a million points of failure you need to maintain some connection to the past with traditional pots line(s) and cellular service that can be reconfigured within minutes. POTS for a published, main business number is a good practice. Call forward or publish both the main number and a toll free number. People will try both.
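To put a rough number on the "points of failure" argument: when a call only completes if every link in a chain is up, the availabilities of the links multiply. A minimal Python sketch follows. The component names and percentages are made-up illustrative figures, not measurements from RentPBX or any other provider, and the calculation assumes the components fail independently.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def series_availability(availabilities):
    """Availability of a chain in which every component must be up at once."""
    return prod(availabilities)


# Hypothetical figures, for illustration only.
chain = {
    "office internet link": 0.995,
    "route to hosting provider": 0.999,
    "data center network": 0.999,
    "hosted PBX instance": 0.999,
    "upstream SIP carrier": 0.999,
}

overall = series_availability(chain.values())
downtime_hours = (1 - overall) * HOURS_PER_YEAR

print(f"overall availability: {overall:.4%}")
print(f"expected downtime: {downtime_hours:.1f} hours/year")
```

With these made-up inputs the chain comes out at roughly 99.1%, or about 79 hours a year, which is worse than any single link in it.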

If it's a very small business you should run with local hardware and POTS Line(s) for at least the main number, maybe all.

As it gets larger redundancy becomes very important and it's not cheap. Since every installation will eventually have an outage you stand to lose every customer in the long run unless you safeguard their communication with a well defined plan.

You should have a documented disaster plan. If customers are willing to do a tradeoff on the level of service provided during an unscheduled outage then costs can be reduced.

There are liability issues to consider too. Acts of God are one thing, but poor design or planning on your part, compared to industry standards (like a Cisco or Avaya), can get you in trouble. Example: 911 service should be through a local POTS number if possible.

https://www.google.com/search?q=voip+reliability&ie=utf-8&oe=utf-8
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
15,247
Reaction score
2,670
Perhaps this will motivate rentpbx to offer some sort of off-site, low-cost backup image which could quickly be imported to a new server in the event of a catastrophic failure. This forum, for example, now has a hot standby in Canada. It's not real time but it's close enough.
 
  • Like
Reactions: briankelly63

billsimon

Experienced in Asterisk, FreePBX, and SIP
Joined
Jan 2, 2011
Messages
995
Reaction score
330
I think if you're going to go with the cloud model, you have to go all the way. "VPS" hosting is just putting your single point of failure out on the internet where it's harder to recover.

Building on a platform like Amazon with multiple instances, redundant shared database and storage, and so on, is like a VoIP RAID 5. It's still not bullet-proof but you can withstand some points of failure and get them recovered before there's a real problem.
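As a back-of-the-envelope illustration of why redundant instances help, here is a small Python sketch. It assumes the copies fail independently and that any single surviving copy can carry the load. Both are simplifying assumptions: copies that share a data center or a network path, as in the outage discussed in this thread, do not fail independently. The 0.99 figure is a placeholder, not a measured value.

```python
def parallel_availability(single, copies):
    """Availability of N independent redundant copies where any one suffices."""
    return 1 - (1 - single) ** copies


# 0.99 is a placeholder availability for one instance.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under those assumptions each extra copy multiplies the chance of a total outage by the failure rate of one copy, so 0.99 becomes 0.9999 with two copies and 0.999999 with three.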
 
  • Like
Reactions: wardmundy

atsak

Guru
Joined
Sep 7, 2009
Messages
1,825
Reaction score
187
I build my PBX platforms primarily using Hyper-V replication, with one server in Toronto and one in Montreal. Hyper-V replication keeps everything backed up to within 5 minutes and fails over very nicely; a couple of updates to the IPs (all mine are behind a firewall NAT) and everything's up and running again. Works great.
 
  • Like
Reactions: wardmundy

rentpbx

Guru
Joined
Nov 2, 2010
Messages
109
Reaction score
16
Just to clarify the issue specific to this incident: we did not lose the node. For those who were impacted, the affected PBX instances were not even rebooted. None of the PBXes were restored from backups.

We appreciate what we can learn and observe from everyone here about redundancy, multiple points of failure in VOIP, and much else. We apply a lot of the technology mentioned here in our platform. We are big fans and students of this information. However, we would like to clarify the issue that we faced. This was not a case of lost or destroyed data. It was not a case of a cut fiber connection. It was not a case of dead hardware or a dead router. This was a case of a bad maintenance process on networking equipment. On top of that, a lot of bad or incorrect information was passed from our DC to us over a long period, which led to bad decisions on our part.

We have not connected all the dots yet. However, incompetence was mentioned here. That cannot be fixed with hardware or software redundancy, local or hosted technology, virtual technology A or B, etc. As we have mentioned before, if we find the DC cannot maintain or assure us of their competence to run a secure and stable operation, we will make the best decision on behalf of our clients.

Thank you all.
 

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
15,247
Reaction score
2,670
As this thread documents, networking issues can be just as catastrophic as failed hardware. I would encourage everyone to take a careful look at Incredible Backup and Restore. Especially on the CentOS platform, it can provide a really easy and quick way to bring up a new server when disaster strikes. Because it is script-based, it's extensible just by adding directories to the list of backup directories. All you have to do is duplicate your Asterisk and GUI platform on a second site. A service such as CloudAtCost gives you a dirt-cheap way to keep hot standbys with zero recurring costs. Or you can use VirtualBox images in much the same way and also without cost. Just turn the server on if you ever need it and restore from the latest backup. In a couple minutes, you're back in business by simply changing a DNS entry. The performance may not be quite the same, but you won't be out of operation either.
 

kenn10

A lesser geek
Joined
Dec 16, 2007
Messages
1,013
Reaction score
208
Living in the Atlanta metro area and understanding the poor quality of the employment pool, I would never select Atlanta for a cloud provider. Yes, it is a communications hub for ILEC and CLEC providers as well as backbone internet services, but there is little effort to strive for excellence amongst many workers. Low wages and "don't give a crap" attitudes abound. My experience and advice is to use due caution if utilizing technology centered in Atlanta.
 
  • Like
Reactions: wardmundy

wardmundy

Nerd Uno
Joined
Oct 12, 2007
Messages
15,247
Reaction score
2,670
In a former life in Atlanta, we had a group of attorneys colocated in the major ILEC/CLEC tower on Peachtree downtown. We connected them back to our main facility using a microwave antenna on the roof (one of dozens up there). Then one day the network and phones died in the remote offices. It turned out the "security guard" in the building had escorted a "repairman" up to the roof to "fix" our antenna. Since we hadn't requested repairs, we were curious to see what improvements had been made. Turned out the complete antenna and all the mounting hardware had vanished.

Not sure it's always "don't give a :001 9898: ." More often than not, it's probably just "dumb as :001 9898: ."
 
  • Like
Reactions: billsimon

TheMole

Guru
Joined
Aug 28, 2008
Messages
96
Reaction score
9
i hate to say this, but:

i find it pretty scary and irresponsible that people are selling commercial (and potentially life saving) services without a good backup plan. as a buyer, that would lead me to steer clear of smaller shops for critical business needs.

and shame on the customers for not doing their proper due diligence in advance of purchase.

(hate-on me if you wish, but this is the truth).
 
  • Like
Reactions: briankelly63

kenn10

A lesser geek
Joined
Dec 16, 2007
Messages
1,013
Reaction score
208
My opinion of the issue is that if it's in the Cloud, it (a) can never be really secure and (b) can never provide 5-nines of reliability. I have a sandbox server in the Cloud that I experiment with, but I have had too many "gotchas" to consider moving my hardware-based system out there. In my case, on the plus side, the cloud server certainly doesn't suffer from poor internet connectivity like my buggy Comcast service, so the trunks in and out of it sound great. The minus side is that the server in the cloud is overloaded by other virtual servers running alongside it, and it still gives crappy performance and slow processing of calls. But heck, what do I expect for $35 for life?
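For scale, "5-nines" can be turned into a yearly downtime budget and compared with the outage described earlier in this thread. A small Python sketch, reading "22hrs on the 15th till 17:30 the 16th" as 22:00 to 17:30 the next day. The year and month in the code are placeholders, since only the duration matters, and the yearly figure assumes that one outage was the only downtime all year.

```python
from datetime import datetime

MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability with `nines` nines (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


def availability_after_outage(outage_minutes):
    """Yearly availability if this single outage were the only downtime."""
    return 1 - outage_minutes / MINUTES_PER_YEAR


# Placeholder year and month; the times are taken from the thread.
start = datetime(2000, 1, 15, 22, 0)
end = datetime(2000, 1, 16, 17, 30)
outage = (end - start).total_seconds() / 60

print(f"5-nines budget: {allowed_downtime_minutes(5):.2f} min/year")
print(f"outage: {outage:.0f} min -> {availability_after_outage(outage):.3%} for the year")
```

The 5-nines budget works out to a little over 5 minutes a year, while a single 19.5-hour outage leaves the year at roughly 99.78%, so between two and three nines.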

I'm sure some of the more expensive cloud instances have extremely high reliability but people want everything cheap or free (since they've grown accustomed to all the freebies from the Federal government.) In the real world, you get what you pay for and reliability is not always cheap.

I'll just keep a local box running, thank you very much. :gunsmilie:
 

voip_user

Member
Joined
Feb 7, 2015
Messages
53
Reaction score
24
Location
Baltimore
My 2 cents. I worked at a "hosted/cloud" provider when I first got into VoIP. We offered the hosted/cloud solution. We also had a Cisco practice that provided support for people who wanted on-prem equipment. We had our own fiber MPLS solution to ensure that once customers got on our network they would be able to reach the PSTN somehow.

Now with that said, how many of our 400+ customers went down each day? Lots!!! Last-mile fiber cuts, general fiber cuts, network outages for our customers who were outside our network. These are just network issues. I haven't even gotten into PSTN problems with upstream carriers or power.

Now what about those on-prem solutions? They were a tad more stable, but when they had an outage it was pretty bad: usually loss of voicemail, or a call manager box, or contact center boxes, due to any number of things that could happen. It was almost a pick-your-own-poison.


My point is that there is no way to ensure your system is going to be up 100 percent of the time unless you spend for the planning and resources. If that is a business concern, you should be ensuring that you buy the proper systems and have the proper staff to make that happen. In the Cisco world you can cluster 8 or 9 servers across the WAN (I forget the actual number) and the publisher server will constantly make reads/writes to the other servers in the cluster to ensure they have up-to-date info.

In the open source world I've seen a few companies cluster Asterisk boxes across the WAN as well, but they have developed their own provisioning and front-end systems.


For those of us who are using the software available here and other open source products, we have to understand the limitations and have good workarounds in place. When using cloud systems we should always have an idea of how to route the numbers in case of failure. Flowroute and some of the other carriers offer the chance to route numbers to another location when they can't reach the system. That should be mandatory in your design. So even when RentPBX went down, the plan should already have been in place for where to deliver the calls. Also keep in mind RentPBX is charging 20 bucks a month; that is dirt cheap for what you get, and I go back to the idea above: if you want 100 percent uptime, someone who is charging 20 bucks a month should not be your 1st pick.
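One way to make "have the plan in place" concrete is a small watchdog that decides when the primary PBX counts as down. The Python sketch below is only an illustration. `FailoverMonitor` and `tcp_reachable` are hypothetical helper names, the three-strikes threshold is an arbitrary choice, and the actual rerouting step (carrier failover setting, DNS change, or a phone call) is deliberately left out because it differs per provider. The demo uses simulated probe results so it runs without touching the network.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Declare the primary down only after several consecutive failed probes."""

    def __init__(self, probe, threshold=3):
        self.probe = probe          # callable returning True when the PBX answers
        self.threshold = threshold  # consecutive failures needed before failing over
        self.failures = 0

    def check(self):
        """Run one probe; return True when failover should be triggered."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.threshold


# Simulated probe results: two blips, a recovery, then a sustained outage.
results = iter([False, False, True, False, False, False])
monitor = FailoverMonitor(lambda: next(results), threshold=3)
decisions = [monitor.check() for _ in range(6)]
print(decisions)
```

In real use the probe would be something like `lambda: tcp_reachable("pbx.example.com", 443)`, where the host and port are placeholders for whatever TCP service your own PBX exposes. Requiring several failures in a row keeps a single dropped probe from triggering a reroute.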



Ok end of my rant.
 

rchalk

Member
Joined
Feb 19, 2010
Messages
283
Reaction score
21
Location
N.E. GA, USA
I have been using RentPBX for 3 years, and have only experienced one outage. That was explained to me as a failed power-changeover switch at the Dallas DC, and the same kind of failure could occur anywhere, which is why I believe the wire-line carriers still have battery backup, at least in major centers.

That being said, I have two servers in two different cities with identical setups, and all terminals use an FQDN to point to the server. Changing over involves changing an entry at DynDNS, rebooting the phones, and redirecting the DIDs to the proper subaccount at VOIP.MS.

For backup, I run the FreePBX backup every couple of days, and download the backup file onto my local PC, and then upload it to the second server, and from there do a restore. Change the trunk settings for the subaccounts at VOIP.MS to avoid duplication of connections, and the servers match. The only thing at risk is voicemail or changes I might have made to configuration since the last restore.
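The download-and-upload routine above could be partly scripted. A minimal Python sketch, under several assumptions: the backups land in one local directory, they match a `*.tar.gz` pattern (check what your own backup module actually produces), and the standby accepts `scp`. The destination `backup@standby.example.com:/tmp/` is a placeholder, and the function only builds the `scp` command without running it. The restore on the second server is still done from the FreePBX GUI, as described.

```python
import os
import tempfile
import time
from pathlib import Path


def newest_backup(directory, pattern="*.tar.gz"):
    """Return the most recently modified file matching pattern, or None."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def is_stale(path, max_age_days=2, now=None):
    """True if the file was last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) > max_age_days * 86400


def scp_command(path, destination):
    """Build (but do not run) an scp command that copies the backup to the standby."""
    return ["scp", str(path), destination]


# Demo with two empty files in a temporary directory, one 5 days old, one 1 hour old.
with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    old = Path(tmp, "backup-old.tar.gz")
    new = Path(tmp, "backup-new.tar.gz")
    for path, age in ((old, 5 * 86400), (new, 3600)):
        path.write_bytes(b"")
        os.utime(path, (now - age, now - age))

    latest = newest_backup(tmp)
    stale = is_stale(latest, now=now)
    command = scp_command(latest, "backup@standby.example.com:/tmp/")
    print(latest.name, "stale" if stale else "fresh")
    print(command)
```

Run from cron on the local PC, the staleness check gives an early warning when the "every couple of days" backup has quietly stopped happening.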

I know it's not perfect, but it works and is inexpensive, for a customer who has 60 phones in 9 different cities. No opportunity for a local server, because if the internet goes down at that one location, they would lose their entire phone system.

One other thing.. set the VOIP.MS failover to a cell phone at each office.

And one more - use follow-me to route calls to a cell phone if the individual office loses internet. If the server doesn't see the extension, the call will forward immediately.
 
  • Like
Reactions: wardmundy
