The 10 Biggest Cloud Outages of 2016

Cloud Outages: Less Common, More Damaging

Cloud outages are becoming less frequent and shorter as providers gain operational experience and their technologies mature.

That's the good news. But the flip side is that enterprises, and the population at large, are increasingly vulnerable to downtime. As modern applications and data sources become more and more distributed, and our reliance on them to manage nearly every aspect of our lives grows, the potential for harm, or at least intense frustration, is greater than ever.

A high-profile outage near the end of the year, the DDoS attack on DNS provider Dyn, revealed serious vulnerabilities that can paralyze an increasingly connected world.

Verizon: January 14

A power outage at a Verizon data center impacted JetBlue Airways operations on Jan. 14, delaying flights and sending many passengers scrambling to rebook.

Verizon did not say which data center suffered the outage.

New York-based JetBlue wrote in a blog post that the airline experienced network issues because of a Verizon data center power outage that impacted customer support systems, including jetblue.com, mobile apps and a toll-free phone number, as well as check-in, airport counter and gate systems.

Microsoft Office 365: January 18 and February 22

Some Office 365 users were painfully separated from their cloud-based email accounts for many days, starting on Jan. 18.

Microsoft blamed a buggy software update, but its first attempt at a fix didn’t hold -- another salvo of email failures riled customers five days after the initial outage was reported. Problems with the cloud productivity suite’s email services persisted for longer than a week in some cases. While not all Office 365 users suffered downtime, the customers affected had a large number of users, Microsoft confirmed.

About a month later, some Microsoft customers in Europe had a rough time accessing their email from their mobile phones, or had to endure delays as they tried to log in to Office 365 services through the web portal.

Microsoft said both outages occurred because infrastructure components became degraded due to heavy resource demand from users.

Salesforce: March 3

Some Salesforce customers in Europe had to cope with a CRM disruption, caused by a storage problem, that lasted up to 10 hours.

Even after the storage tier was reconnected, some features still weren’t working properly, and the cloud software giant continued reporting degraded performance on its EU2 instance.

Google Cloud Platform: April 11

An outage took down Google Cloud Platform services for 18 minutes on the evening of April 11, affecting Compute Engine instances and VPN service in all its regions.

Google offered affected customers service credits for 10 percent of their monthly Google Compute Engine charges, and 25 percent of their monthly VPN charges.

Salesforce: May 10

A persistent Salesforce.com outage wiped out four hours of data customers entered into their CRMs on May 10 and took days to fully remediate.

While CEO Marc Benioff personally apologized to one customer on Twitter, Salesforce wouldn't comment on how widespread the outage was, or what regions or services were affected by the database failure linked to NA14 – one of 45 Salesforce cloud instances in North America.

Salesforce's system status web page said the performance degradation started at 8:41 a.m. Eastern, followed by a "service disruption" less than an hour later, at 9:31 a.m.

Apple: June 2

Apple's cloud experienced a widespread outage on June 2, taking offline some of the tech giant's popular retail and backup services.

The outage started around 12:30 p.m. Pacific, making it impossible for some customers to access multiple iCloud and App Store services.

The App Store, Apple TV App Store, Mac App Store, iTunes and Apple's cloud-based photo service all experienced disruptions.

Amazon Web Services: June 4

As storms pummeled Sydney, Australia, on June 4, an Amazon Web Services region in the area lost power, and a number of EC2 instances and EBS volumes hosting critical workloads for name-brand companies subsequently failed.

Websites and online services went down across the Australian AWS availability zone for roughly 10 hours that weekend, disrupting everything from banking services to pizza deliveries.

Affected enterprise customers pointed their fingers at the world’s largest cloud provider as it worked to restore service.

Google Nest: August 22

During a heatwave in several parts of the country, Google's Nest thermostats experienced connectivity errors that left many customers unable to remotely control their air conditioning systems.

While the AC could still be controlled manually, the widespread outage called attention to potentially troubling vulnerabilities of smart-home technologies. Nest also sells smoke detectors and Dropcams for home and child monitoring.

Microsoft Azure: September 15

Multiple Microsoft Azure services, including SQL Database, were degraded on September 15 during a global DNS outage that impacted users in all regions.

Microsoft reported the problem on its Azure status page at 9 a.m. Eastern, with a message noting that engineers had identified a possible underlying cause of the problem and were determining mitigation options. By 11 a.m., the provider reported most downed services had been brought back online.

A week earlier, on Sept. 9, European Azure customers endured a multi-hour outage.

Dyn: October 21

On Oct. 21, Internet performance management company Dyn was hit by a cyber attack, prompting widespread outages that affected some cloud providers, including Amazon Web Services, which had to reroute traffic to alternate DNS providers.

Dyn, based in Manchester, N.H., said its server infrastructure was the target of a distributed denial-of-service (DDoS) attack that specifically impacted its Managed DNS (Domain Name System) customers. The attack was unique in that it originated from millions of Internet of Things devices – like connected cameras and printers – that had been taken over by malicious software.

A slew of popular websites that rely on Dyn's traffic management and optimization services were either down or experiencing issues, including Twitter, Spotify and GitHub.