Downtime In 2017
As a general rule, uptime continues to improve as cloud providers gain proficiency and develop better tools for operating the largest and most advanced server clusters ever built.
For that reason, the very notion of a catastrophic cloud outage seemed almost an anachronism going into 2017. All providers still suffer bouts of downtime that restrict specific services, or short bursts of regional unavailability, but many believed that massive failures of the kind seen in the industry's early days had surely gone the way of the dodo.
But near the end of February, the world was reminded that even the most experienced operators, equipped with the most advanced automation tooling, were vulnerable, and the blast radius of the failure was unprecedented.
That Amazon Web Services outage shook the industry and diminished the confidence of enterprise customers warming to cloud adoption, because of the sheer number of business services that became unavailable that day. GitHub, Slack, Zendesk, Heroku, Twilio, Mailchimp, Citrix and Expedia were just a few of the casualties. Confidence waned further when the cloud leader revealed the cause was human error: essentially an incorrect one-line command typed by a technician.
That memorable outage, and to a lesser extent the nine others on the list below, reminds a rapidly maturing industry that the stakes of operational excellence are higher than ever.
Get more of CRN's 2017 tech year in review.