The 10 Biggest Cloud Outages Of 2018

This year's largest cloud outages were dominated by the three titans in the market: Amazon Web Services, Microsoft Azure and Google Cloud Platform.


2018: The Year In Downtime


Outages, no matter their cause or scope, shake enterprise confidence in the public cloud.

While no cloud is perfect, and to some degree downtime is inevitable, the larger providers leading the industry should be held to higher standards. That's why our year-end list almost exclusively features events that embarrassed the three dominant hyperscalers: Amazon Web Services, Microsoft Azure and Google Cloud Platform.


While those public cloud giants all saw multiple service disruptions in 2018, some more problematic than others, none of the outages lasted for an excessively long stretch or was so widespread that it affected a large percentage of customers.

Cloud services will continue to see disruptions, but catastrophic failures seem to be a relic of the industry's early, formative years as providers master the art of uptime.

Google Cloud, Feb. 15

A database glitch affecting Google's application development platform caused headaches for some high-profile Google Cloud customers on Feb. 15.

Problems with Google Cloud Datastore, a NoSQL database designed for scale, started appearing just before noon PT.

Users of Google App Engine, a Platform-as-a-Service that provides access to Cloud Datastore, saw errors and high latency for more than an hour.

Gamers were particularly annoyed, as many popular online games take advantage of those Google services. Pokemon Go and Snapchat were among the applications affected.

Amazon Web Services, March 2

A widespread Amazon Web Services outage on March 2 silenced Amazon's Alexa personal assistant for many customers and disrupted popular internet services, including Atlassian, Slack and Twilio.

Amazon later said it experienced a networking error that morning at multiple Virginia-based data centers engulfed in a powerful nor'easter.

The storm disabled Direct Connect dedicated links from the AWS Northern Virginia region to two large colocation operators on the East Coast: Equinix and CoreSite.

Microsoft Office 365, April 6

Microsoft customers in Europe, Asia and the U.S. were locked out of their email accounts on April 6.

The U.K. was hit especially hard by the Office 365 outage, with some businesses not able to send emails or access Skype for much of the day.

Some users reported that they could only log in to the popular office productivity suite using Single Sign-On.

The failure came a day after Microsoft introduced new security features to protect Office 365 users.

Amazon Web Services, May 31

The cloud leader experienced connectivity problems on May 31 because of a hardware failure in a data center inside its Northern Virginia region.

Affected customers were down about 30 minutes, and there were reports that not all data could be restored because of total hardware failures.

The outage "was a result of a power event impacting a small percentage of the physical servers in that datacenter as well as some of the networking devices," AWS said in a post-mortem.

AWS' core EC2 service was impacted, along with its Relational Database Service, WorkSpaces virtual desktop service and Redshift data warehousing service.

Microsoft Azure, June 17

Storage and networking disruptions prompted by a heat wave kept many Microsoft cloud customers in Europe separated from their data for more than five hours between June 17 and 18.

Microsoft said a control system that was supposed to keep the heat down at its Ireland data center malfunctioned during a particularly hot stretch of Irish summer.

The overnight outage affected mostly Northern European businesses using multiple Azure storage and database services.

Google Cloud, July 17

Google Cloud suffered an outage that slowed down or stopped several popular services, including Spotify and Snapchat, on the afternoon of July 17.

Google's cloud status dashboard reported the provider became aware of a networking issue impacting its load balancers just after noon PT that day. Disruption first impacted the development platform App Engine, Cloud Networking and Stackdriver, a service that provides performance and diagnostics data to public cloud users.

Google later posted an update informing users that the 502 errors resulting from the global load balancer issues were resolved as of 1:05 p.m.

Amazon Prime Day, July 16

Amazon has created the biggest shopping day of the year.

But widespread glitches paralyzed sales on July 16, just minutes after the fourth annual Amazon Prime Day shopping extravaganza kicked off. The promotion for premium Amazon.com members drives more sales than Black Friday.

Those problems were not related to Amazon Web Services, an AWS spokesperson told CRN.

But the failure was a bad look for the world's largest e-commerce site—hosted on the world's leading cloud. Instead of deals, many eager shoppers encountered pictures of dogs accompanied by captions notifying them of the outage.

While shopper frustrations continued throughout the day, and the outage effectively shaved six hours off the 36-hour event, Prime Day sales still managed to break records.

Microsoft, Sept. 5

Microsoft found itself scrambling on two fronts the first week of September to address problems delivering its cloud software services.

The first issue saw users around the world unable to access their Office 365 Outlook or Skype for Business messages for part of Sept. 5. Users reported that when they tried to log in to Microsoft, they got an error message that said "throttled."

Microsoft blamed the outage on a botched update to Azure's back-end authentication systems.

Meanwhile, on Sept. 4 and Sept. 5, Microsoft grappled with a server and network system shutdown at its data center in San Antonio following a lightning strike. The shutdown interrupted Azure and Office 365 services for customers in Microsoft's South Central U.S. cloud region.

Facebook, Nov. 12 and Nov. 20

November was a bad month for the social networking giant as two outages aggravated users, including some customers of its enterprise collaboration product.

Facebook, including the Workplace collaboration tool, suffered an outage on the afternoon of Nov. 12 that racked up thousands of complaints on DownDetector.com before services were restored.

Within that short period, the hashtag #FacebookDOWN became a top trending topic on Twitter.

Just over a week later, on Nov. 20, came another outage, the third major downtime incident for Facebook since August.

Three-quarters of users who reported issues on the site that day said they were experiencing either a total blackout or login problems from 8 a.m. ET until early afternoon.

Microsoft, Nov. 18

Microsoft disclosed that on Nov. 18, some users were unable to access Azure and Office 365 services.

The outage affected customers required to sign into those cloud services using multifactor authentication, according to the industry's second-largest cloud provider.

The outage spanned the Europe, Asia-Pacific and Americas regions, impacting both Azure and Office 365 services beginning at 11:39 p.m. ET that Sunday.