The 10 Biggest Cloud Outages Of 2014

These Clouds Washed Out

Benjamin Franklin once said that the only things that are certain in this world are death and taxes. But had the great statesman and inventor lived in our modern world, server outages might have made that list as well.

No matter how good the underlying technology, no matter how competent the hosting provider, clouds sometimes fail.

Here's the list that companies don't want to make: CRN's top 10 cloud outages of 2014 in chronological order.

Dropbox, Jan. 10, 2014

The cloud storage company underwent a global outage starting about 8:30 p.m. EST.

In a postmortem, Dropbox said it was upgrading the OS on some of its machines that store databases used for features like photo album sharing and camera uploads, but not its core business of file storage. A subtle bug in the upgrade script tried to reinstall an OS on an active machine, and the system went haywire.

Dropbox's website returned server error messages, and desktop and mobile clients wouldn't sync files.

Most of the service was recovered from backups within three hours, but core service was not fully restored for two days.

Samsung, April 21

A fire erupted at a data center in Gwacheon, South Korea, and for the next several hours, Samsung smartphones and tablets around the world were painfully separated from their data.

The fourth-floor inferno also caused problems with credit card services, Samsung’s Smart TV and other devices that use Samsung servers.

Experts wondered why so many servers were centralized in one location, and why no redundancy with other sites had been built into the system.

Internap, May 16

An uninterruptible power supply system failed at Internap’s New York data center after a utility power outage blacked out the region. The cloud service provider went down at 3 in the morning, impacting customers using colocation and IP connectivity services.

Internap remained down for seven hours.

The outage took out streaming video platform Livestream and the Stack Exchange network of sites that are popular among developers.

Microsoft Lync, June 23; Microsoft Exchange, June 24

Lync, Microsoft's instant messaging and VoIP service, part of the Office 365 suite of cloud-based business products, went down on June 23 in much of North America before noon EST.

For some users, the outage, which Microsoft said was caused by "external network failures," lasted up to eight hours.

The very next day, with the Lync outage fresh in the minds of Office 365 users, Microsoft's hosted email service, Exchange Online, suffered a similar fate, keeping some customers out of their email for up to nine hours.

Microsoft said the back-to-back Office 365 failures were unrelated.

Verizon Wireless, June 27

Verizon Wireless suffered a widespread outage that brought down parts of its billing system, preventing customers from being able to access their online accounts, pay bills or, in some cases, upgrade their phones.

The systemwide outage, which began early on a Friday, lasted about a day, impacting not only customers who use the My Verizon online portal but also Verizon’s own retail stores.

No-IP.com Seizure Outages, June 30

While probably not as significant an outage when measured by lost economic productivity, this one was more infuriating because it was caused by an intentional act.

Microsoft, citing cybercrime perpetrated against its users, seized 23 domains from No-IP.com, a Reno, Nev.-based provider of free dynamic DNS services. In so doing, the software giant also took out service for 1.8 million legitimate No-IP.com customers for more than two days.

Among them was SonicWall, a network security vendor acquired by Dell in 2012, which said hundreds of its customers, including buildings that run security surveillance cameras using No-IP.com's dynamic DNS service to relay video feeds, were offline.

A federal court transferred DNS authority over the domains to Microsoft, which argued they were launching pads for malware attacks.

Microsoft Azure, Aug. 18

The Azure cloud went dark for some users for as long as five hours after a security patch for Windows 8.1, issued as part of a monthly Patch Tuesday release, caused technical problems.

Microsoft reported that Azure services such as Virtual Machines, Websites, Automation, Backup and Site Recovery were down in multiple regions.

Some analysts have complained that the software giant has yet to offer a complete postmortem of what went wrong.

Microsoft Azure, Nov. 18

We don't want to seem like we're beating up on Microsoft here with its fourth appearance on this list (including its role in causing the No-IP outages), but Microsoft's stature in the industry certainly warrants the scrutiny.

And as Gartner Distinguished Analyst Lydia Leong complained after this crash in November, "Microsoft's disastrous inability to keep Azure outages confined to a single region is a major red flag for enterprises considering Azure."

The Nov. 18 outage, which affected customers around the world using a variety of Azure services, was caused by a glitch in a performance update to its cloud storage service.

Microsoft ultimately determined human error was the culprit.

Amazon Web Services CloudFront DNS server, Nov. 26

Amazon Web Services' CloudFront DNS server went down for nearly two hours, starting at 7:15 p.m. EST. The DNS server was back up just after 9 p.m.

Some websites and cloud services were knocked offline as the content delivery network failed to fulfill DNS requests during the outage. Nothing major, but worthy of this list because it involved the world's biggest and longest-running cloud.

Xen Vulnerability Reboots: AWS, Rackspace, IBM SoftLayer; late November

While these weren't exactly cloud outages, several major public clouds were forced to perform emergency reboots near the end of November, disrupting operations for many of their customers.

The reboots were prompted by a security vulnerability discovered in the popular Xen open-source hypervisor.

AWS, Rackspace and SoftLayer all gave customers little warning -- in some cases just hours -- that their instances would be temporarily taken offline and that they would then need to relaunch their cloud services.

An advisory was released to the general public after all the cloud providers had completed installing patches.