What is High Availability (HA)?
In the simplest of terms, high availability is the reduction of downtime in favor of overall uptime. You want your infrastructure—whether that’s web apps, an eCommerce store, end-user systems, inter-company communications, or archiving, among a lot of other things—to be up and running (without interruption) as much as possible. A more complex definition of high availability involves various systems that comprise an IT infrastructure (hardware, software, and personnel) that are put in place to maximize uptime by minimizing, mitigating, and recovering from failures. This includes physical redundancies, virtual redundancies, data replication, automatic failovers, load balancing, disaster prevention, disaster recovery, data security, and an IT department (or a third-party host) with the expertise to implement and run it all.
Why Is It Important?
Any time you have downtime, it’s going to be a bad time. Whether it’s an interruption in the workflow, an impediment to your data, a degradation of the end-user experience, or a total stoppage of productivity, downtime is going to eat into your enterprise’s mission. Seriously.
How Do You Measure It?
Have you heard of the 9s? It’s a relatively common way of measuring availability. Simply put, the more 9s you have—from 90% uptime (one 9) to 99.9999999% (nine 9s)—the less downtime you’re experiencing. Obviously.
“The gold standard is five 9s, or 99.999% uptime, which adds up to 5.26 minutes of downtime a year, 25.9 seconds of downtime per month, and 6.05 seconds of downtime a week.”
While five 9s are incredibly efficient, a more reasonable goal is three 9s, or 99.9% uptime, which adds up to 8.76 hours of downtime per year, 43.8 minutes of downtime per month, or 10.1 minutes of downtime per week. This falls right in line with the amount of downtime 81% of businesses said they could tolerate, according to Information Technology and Intelligence Corp.
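If you want to check the math for your own target, the arithmetic is simple enough to sketch in a few lines of Python. The function name below is made up for illustration, and the sketch assumes a 365-day year. It sticks to yearly and weekly figures because the monthly number shifts depending on how long you decide a month is.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float, period_seconds: int) -> float:
    """Return the downtime budget, in seconds, for an uptime percentage over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # Whatever share of the period is not uptime is the downtime budget.
    return period_seconds * (100 - availability_percent) / 100


if __name__ == "__main__":
    for label, pct in [("three 9s", 99.9), ("five 9s", 99.999)]:
        per_year = allowed_downtime_seconds(pct, SECONDS_PER_YEAR)
        per_week = allowed_downtime_seconds(pct, SECONDS_PER_WEEK)
        print(f"{label} ({pct}%): {per_year / 60:.2f} min/year, {per_week:.2f} s/week")
```

Under those assumptions, five 9s works out to roughly 5.26 minutes a year and three 9s to roughly 8.76 hours a year, matching the figures above.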
How High Availability Will Save You Time and Money
Ok, so, you’re on board for high availability (HA). It’s a no-brainer, right? I mean, we’ve covered the topic more than once. So, you know that less downtime equals more money. It sounds great in theory, but there’s more to it. You ultimately have to weigh the price of high availability against the revenue its implementation safeguards. The good news is that the cost of the component parts in a high availability system is steadily declining, making the investment in high availability increasingly worth it. And if you are hosting your website or database on your own servers and they are your only point of contact with your clients, downtime will be disastrous. In this case, you absolutely need to invest in HA. It’s imperative that you understand your needs, your tolerances, your pain points, and your IT infrastructure. Once you do, you’ll be able to make smarter (more economical) decisions about the implementation of HA.
What High Availability Options Are Available?
Even the simplest of infrastructures have a lot of working parts and points of failure. The goal of an HA infrastructure is to eliminate single points of failure and create subsystems, routines, and procedures that reduce downtime in the inevitable (no system is ever 100% available) event of a failure. To that end, there is a vast array of components (hardware, software, and human) that make up a high availability system.
“These include physical redundancies, virtual redundancies, automatic data replication, automatic failover systems, load balancing, disaster prevention, disaster recovery, data security, and personnel.”
Physical & Virtual Redundancies
Sure, everyone knows you need redundant hardware, software, and storage (servers, firewalls, switches, routers, etc.), but what about redundant power? Without the proper power backups—including separate paths for separate feeds; battery backups; uninterruptible power supplies (UPS); and in some cases, generators—you’re creating a single point of failure.
You may even consider creating redundant systems in separate geographical locations in the event of a disaster of some kind. This is especially true for natural disaster-prone regions such as the Atlantic Coast, the Gulf Coast, and Hawaii (hurricanes); the Great Plains and the Midwest (tornadoes); and the Pacific Coast (earthquakes, mudslides, and fires). Having redundant physical locations hosting your redundant systems might just keep your enterprise from experiencing significant downtime when Mother Nature knocks down your door.
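To see why redundancy pays off, and why a single point of failure hurts, here is a rough sketch of the standard availability arithmetic. It leans on one big assumption: components fail independently of each other, which won't hold if they share a power feed, a network path, or a building. The function names are made up for illustration.

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant components where any one survivor is enough.

    Assumes failures are independent of one another.
    """
    # The group is down only when every member is down at the same time.
    return 1 - prod(1 - a for a in availabilities)


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


if __name__ == "__main__":
    print(f"two redundant 99% servers: {parallel_availability([0.99, 0.99]):.4%}")
    print(f"99.9% server behind a 99.9% switch: {serial_availability([0.999, 0.999]):.4%}")
```

Under the independence assumption, two redundant servers at 99% each come out to about 99.99% together, while chaining two 99.9% components in series drags the total below 99.9%.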
Automatic Data Replication
At the very heart of high availability is data replication: the same data stored on multiple devices. An HA structure with data replication will write to multiple instances in real time and will protect databases, websites, cPanel configurations, etc., ensuring consistency between each device in the infrastructure. A typical configuration will have two identical primary volumes that are backed up by two more physical volumes, which in turn are backed up by two more virtual volumes. Each volume will (again, typically) be a Distributed Replicated Block Device (DRBD) volume that performs selective, synchronous data replication. DRBD volumes rewrite and back up only the blocks of data that have changed, as opposed to rewriting and backing up entire volumes. This saves a tremendous amount of time and money.
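The idea of copying only the changed blocks can be sketched in a few lines. To be clear, this is a toy illustration of the concept and not how DRBD itself is implemented. The block size and function name are hypothetical, and both "volumes" are just in-memory byte buffers of equal size.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


def replicate_changed_blocks(primary: bytes, replica: bytearray,
                             block_size: int = BLOCK_SIZE) -> int:
    """Copy only the blocks that differ from primary into replica.

    Returns the number of blocks that had to be copied.
    """
    if len(primary) != len(replica):
        raise ValueError("volumes must be the same size")
    copied = 0
    for start in range(0, len(primary), block_size):
        end = start + block_size
        block = primary[start:end]
        if block != replica[start:end]:
            replica[start:end] = block  # rewrite just this block
            copied += 1
    return copied


if __name__ == "__main__":
    primary = bytearray(4 * BLOCK_SIZE)
    replica = bytearray(4 * BLOCK_SIZE)
    primary[BLOCK_SIZE + 10] = 0xFF  # change a single byte in the second block
    copied = replicate_changed_blocks(bytes(primary), replica)
    print(f"copied {copied} of 4 blocks")
```

One changed byte means one block is rewritten instead of the whole volume, which is where the time savings come from.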
Automatic Failover
An automatic failover is a system procedure that monitors the health of your infrastructure and will, in the event of a system failure (network, hardware, software, power, etc.), perform an automated role reversal of the primary and secondary nodes. Any time a piece of equipment stops operating, or even begins to perform below its expected values, a failover will be triggered. An automatic failover system is ultimately where the rubber meets the road. Once you’ve set up your redundancies, crossed your T’s, dotted your I’s, and prepared for the worst, the only thing left to do is wait. If you’ve done everything right and failure does occur, your automatic failover procedure will kick in, and there will be a seamless, no-downtime transition to a redundant system.
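Here is a minimal sketch of that role reversal, assuming a two-node setup and a health probe you supply yourself. The class names, the `is_healthy` callback, and the three-strikes threshold are all hypothetical. Real failover tooling also has to deal with things this sketch ignores, such as both nodes believing they are primary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe supplied by the caller


class FailoverPair:
    """Tracks which of two nodes is primary and swaps roles on repeated failure."""

    def __init__(self, primary: Node, secondary: Node, failure_threshold: int = 3):
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self._consecutive_failures = 0

    def check(self) -> bool:
        """Run one health check; return True if a failover happened."""
        if self.primary.is_healthy():
            self._consecutive_failures = 0
            return False
        self._consecutive_failures += 1
        if self._consecutive_failures < self.failure_threshold:
            return False  # tolerate brief blips before switching
        if not self.secondary.is_healthy():
            return False  # no healthy node to promote
        # Role reversal: the secondary becomes primary and vice versa.
        self.primary, self.secondary = self.secondary, self.primary
        self._consecutive_failures = 0
        return True
```

In practice you would call `check()` on a timer. Requiring several failed checks in a row is a design choice that trades a slightly slower reaction for fewer false alarms.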
Load Balancing
A load balancer is a device (hardware, software, or a combination of the two) that evenly distributes website traffic, stabilizing performance and preventing crashes. A load balancer can also act as an automatic failover device, making it an essential component in a high availability system. Typically, load balancing works by way of an algorithm that distributes users between servers.
“There are 9 common algorithms/methods—the round robin method; the least connections method; weighted least connections; source IP hash; URL hash; the least response time method; the bandwidth and packets method; custom load; and least pending requests (LPR).”
This is not a complete list. It’s possible that you could work with your IT or hosting service to come up with another method that best suits your enterprise.
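To make two of those methods concrete, here is a minimal sketch of round robin and least connections. The class names and the server labels are made up for illustration, and a production load balancer does a great deal more (health checks, weights, timeouts).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one request at a time."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self) -> str:
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest active connections."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        # Ties go to whichever server was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to the server closes."""
        if self.active[server] > 0:
            self.active[server] -= 1


if __name__ == "__main__":
    rr = RoundRobinBalancer(["web1", "web2", "web3"])
    print([rr.pick() for _ in range(4)])
```

Round robin ignores how busy each server is, which is fine when requests are roughly equal in cost. Least connections adapts better when some requests run much longer than others.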
Disaster Prevention
Way back in 2003, IT Pro Today (specifically Kalen Delaney) took an interesting (and somewhat unique) approach to considering disaster prevention. Delaney wrote, “…while I was planning for this article and trying to determine which activities constitute disaster prevention and which constitute disaster recovery, I found that the line between the two isn’t a neat one. I also realized that to distinguish between disaster prevention and disaster recovery, you need a clear definition of ‘disaster’ for your organization.” Despite the fact that Y2K was nipping at the heels of this article, the reasoning is still sound today. As such, this blog (since it’s all about HA) is considering a disaster anything that creates downtime.
“Downtime, after all, is to high availability as Thanos is to the Avengers.”
This means anything that helps facilitate uptime is a key component to disaster recovery in an HA infrastructure. This includes everything covered up to this point—redundancies, data replication, failover systems, and load balancing.
Disaster Recovery
Disaster recovery is not just about technology and data, but people too. A very real, very human plan needs to be put in place in the event of a catastrophic failure. So, what’s your plan? While it has its own set of problems, the very biggest companies run redundant, parallel, geographically separate data centers so that if one goes down the other can come up with little to no interruption in service. However, that’s an ideal situation and hardly a sound strategy for the majority of businesses. What’s reasonable is having a conception of what your business can shoulder in terms of downtime among your various IT systems, understanding the various scenarios that could compromise those systems, and having a plan in place to reinitialize those systems in a reasonable time frame.
“Your IT infrastructure is comprised of 5 points of vulnerability: housing (server/computer room, climate control, and electrical supply), hardware (networks, servers, computers, and peripherals), ISP (fiber, cable, wireless), software (productivity, communication, website, etc.), and data.”
You, as a company, have to decide what you are going to do in case one of these five pain points goes down. Do you have another space for your equipment if the first space is compromised? What about your servers? Do you have unused backups or have a plan in place to order emergency equipment? Who is your ISP? Do they have a plan in place in case your service goes out? Disaster recovery is about working with everyone (personnel, vendors, technicians, etc.) to formulate a plan of action, one that makes priorities crystal clear—if your website is down, you want to prioritize getting it back up, not restoring your tertiary power supply. That can wait. Oh, and you want to make sure this disaster plan is organized, disseminated, and stored in multiple locations; some of which should be offsite.
Data Security
HA can be completely compromised by external (sometimes internal) attacks on your infrastructure: malware, viruses, DDoS attacks, etc. The best, and maybe most obvious, safeguard against these kinds of breaches is a well-designed, well-managed, redundant high availability system that can be switched over quickly. However, there are preventive security measures that can be put in place to defend against attack; anti-DDoS routers, for example. Working with your vendors, personnel, ISP, and engineers to make sure your data is adequately encrypted and your systems are protected can go a long way toward maintaining HA.
What High Availability Options Are Right for Your Enterprise?
What’s right for your company ultimately comes down to the cost of more 9s versus the losses taken as a result of downtime. If getting another 9 is going to cost you $90,000 annually, but will only prevent $30,000 in losses annually, the cost is probably not worth the savings. It’s better to simply write off the loss.
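That comparison is easy to run for your own numbers. The sketch below is deliberately simple: the function names are made up, and it assumes you already have honest annual estimates for both the cost of the extra 9 and the losses it would prevent, which is the hard part.

```python
def net_annual_benefit(annual_ha_cost: float, annual_losses_prevented: float) -> float:
    """Losses prevented minus what the extra availability costs, per year."""
    return annual_losses_prevented - annual_ha_cost


def worth_the_investment(annual_ha_cost: float, annual_losses_prevented: float) -> bool:
    """True only when the extra 9 prevents more loss than it costs."""
    return net_annual_benefit(annual_ha_cost, annual_losses_prevented) > 0


if __name__ == "__main__":
    # The figures from the example above: $90,000 in cost vs. $30,000 in prevented losses.
    net = net_annual_benefit(90_000, 30_000)
    print(f"net annual benefit: ${net:,.0f}")
    print("worth it" if worth_the_investment(90_000, 30_000) else "write off the loss")
```

With the figures from the example, the extra 9 leaves you $60,000 worse off each year, so the check comes back negative.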
You also have to consider whether or not your workflow actually has any systems that truly impact your productivity. For example, if your workflow is entirely third-party (Facebook, G Suite, Slack, etc.) you have no need for high availability implementation because it’s their responsibility. Or, maybe you use your own email server for inter-office communication, but everyone is also on Slack. If the email server goes down it will be inconvenient, but not crippling—everyone in the office will just have to talk to each other on Slack until the email server is fixed. All of this gets particularly complicated if you’re designing, buying, and implementing your own infrastructure. With complex tax codes, unanticipated hardware and systems failures, and your changing needs, pinning down high availability expense versus high availability revenue benefits becomes a moving target.
What’s right for you is a company question, one that is going to require staff-wide input and consideration. Technicians, engineers, management, accounting, and other personnel are all going to have to weigh in if you want a complete picture of your operation’s needs and tolerances. However, using a vendor that specializes in high availability systems to manage your infrastructure takes a lot of the guesswork out of deciding whether or not HA systems implementations are worth the investment. While your operating costs may increase, a managed hosting provider that has its eye toward high availability will ultimately lower your capital expenses while offloading much of the legwork it takes to keep an HA infrastructure at peak performance. A quality managed host will have a number of options that will weigh your company’s unique needs against cost, will perform consistent maintenance and upgrades on all hardware and software, and will insure your company against revenue losses as a result of downtime.
“For small to midsize companies that need to make high availability a priority, a managed host is an excellent option.”
Take Action, Win the Day
In today’s always-on, 24/7/365 economy you can’t afford downtime. Your company can’t afford it. You need those 9s. So, while high availability infrastructure may sound complex (with all that talk about pain points, tolerances, 9s, algorithms, 5515-X firewalls, and Y2K), it boils down to one simple ethos: create a system that stays on, and if it can’t, make sure there’s a backup to take over.
The best way to make sure you’ve got an almost-always-on system and an infrastructure with plenty of redundancies is to know your configuration. Know where your power, connectivity, and data are coming from and know where they’re going. The better picture you have of your infrastructure, the better equipped you are to make sure it’s operating optimally, to make sure it’s HA.