The 10 Biggest Cloud Outages Of 2018 (So Far)

No Broken Internet

The first half of 2018 didn't see any cloud outages severe enough to be described as breaking the Internet. As can always be expected, however, there were several operational failures that interrupted the business of major enterprises, froze application services providers in their tracks, and annoyed everyday employees and consumers around the world.

Such outages inevitably shake public confidence in the infrastructure that now largely powers the modern enterprise -- from workplace collaboration to logistics to customer engagement.

But given the rapid pace of cloud adoption and data center buildout in recent years, and breakneck innovation that's delivering complex new cloud services, the lack of catastrophic failures in the first half of the year should be taken as a good sign that major providers have mastered the art of uptime.

(For more on the biggest news of 2018, check out "CRN's Tech Midyear In Review.")

Google Cloud

A database glitch affecting Google's application development platform caused headaches for some high-profile Google Cloud customers on Feb. 15.

Problems with Google Cloud Datastore, a NoSQL database designed for scale, started appearing just before noon PT.

Users of Google App Engine, a Platform-as-a-Service that provides access to Cloud Datastore, saw errors and high latency for more than an hour.

Gamers were particularly annoyed, as many popular online games take advantage of those Google services. Pokemon Go and Snapchat were among the applications affected.

Equinix

Data center powerhouse Equinix lost power at its Ashburn, Va., campus during a nor'easter that slammed the area on March 2.

The power outage resulted in a networking failure that interrupted service for two roughly 10-minute intervals that morning for customers that co-locate on Equinix infrastructure.

Confusing matters, the AWS peering facility's connectivity failures happened around the same time a nearby AWS region (engulfed in the same storm) lost its Direct Connect interlink to nine data centers on the Equinix Ashburn campus and two CoreSite facilities near Reston, Va.

The dual failures made it unclear who was to blame for a series of AWS service disruptions affecting customers with hybrid infrastructure.

Atlassian, Twilio, and Capital One all struggled with downtime that day.

Amazon Web Services (March)

A widespread Amazon Web Services outage on March 2 silenced Amazon's Alexa personal assistant for many customers and degraded popular Internet services including Atlassian, Slack, and Twilio.

Amazon later said it experienced a networking error that morning at multiple Virginia-based data centers engulfed in a powerful Nor'easter.

The storm disabled Direct Connect dedicated links from the AWS Northern Virginia region to two large co-location operators on the East Coast -- Equinix and CoreSite.

iomart

The Scottish IT services provider that bills itself as "The Original Cloud Company" partly blamed a farmer for several hours of downtime on March 29.

The outage resulted from two closely occurring events.

First, a hardware failure shut down one leg of the company's network. That would have been survivable if not for a farmer in Yorkshire who cut through another fiber bundle later that day while digging a drainage system.

Together, the two network disruptions took out connectivity that afternoon between three data centers in Glasgow, Edinburgh and Manchester.

The problems affected private and public-sector customers in the northern U.K., including those of subsidiary Melbourne Server Hosting.

Microsoft Office 365

Microsoft customers in Europe, Asia and the U.S. were locked out of their email accounts on April 6.

The U.K. was hit especially hard by the Office 365 outage, with some businesses not able to send emails or access Skype for much of the day.

Some users reported that they could log in to the popular office productivity suite only by using single sign-on.

The failure came a day after Microsoft introduced new security features to protect Office 365 users.

Amazon Web Services (May)

The cloud leader experienced connectivity problems May 31 because of a hardware failure in a data center inside its Northern Virginia region.

Affected customers were down about 30 minutes, and there were reports that not all data could be restored because of total hardware failures.

The outage "was a result of a power event impacting a small percentage of the physical servers in that data center as well as some of the networking devices," AWS said in a post-mortem.

AWS' core EC2 service, as well as its Relational Database Service, Workspaces virtual desktop service, and Redshift data warehousing service, were all impacted.

Visa

The payment card network experienced a hardware failure in a data center that affected customers in Europe on June 1.

That afternoon's outage prevented transactions using credit card-embedded chips and PINs, but not ATM withdrawals.

Visa quickly confirmed the disruption wasn't the work of hackers. By the next day, Visa cards were working at "close to normal levels."

Microsoft Azure

Storage and networking disruptions prompted by a heat wave kept many Microsoft cloud customers in Europe away from their data for more than five hours between June 17 and 18.

Microsoft said a control system that was supposed to keep the heat down at its Ireland data center malfunctioned during a particularly hot stretch of Irish summer.

The overnight outage affected mostly Northern European businesses using multiple Azure storage and database services.

Slack

Slack, the popular enterprise communications and collaboration application, suffered a service outage the morning of June 27 that lasted for more than three hours.

The company blamed "connectivity issues" and tried to keep users informed of progress through a steady stream of tweets.

The outage demonstrated how dependent some people have become on Slack to facilitate their communications in the office. In a wave of tweets, people complained that without the service, they had no alternative but to talk directly with co-workers.

Google Home, Chromecast

Owners of Google Home and Google Chromecast struggled on June 27 to engage with the cloud services powering those devices.

Google Home, the AI-enabled personal assistant, and Chromecast, a video streaming stick, stopped working just past 6 a.m. PT.

It took Google roughly eight hours to develop a fix for the cloud services, and six more to roll it out globally.

Once the patch was implemented, users just had to reboot their devices.