10 Biggest Cloud Outages Of 2010 (So Far)
11:00 AM EST Tue. Jul. 06, 2010
It doesn't happen often, but when the cloud goes down it sure is a big deal. Whether it's a Web outage that cuts off access to data or a server failure that renders services useless, cloud outages can hurt. And as businesses rely more and more on the Web, the consequences of an outage grow more severe. So far this year, several high-profile cloud outages have raised the ire of users and prompted questions about whether the Web can be trusted to host applications and essential data. Here we take a look at 10 big ones that rocked the cloud world.
Intuit's online services, including TurboTax Online, QuickBooks Online, Quicken and QuickBase, suffered a massive outage that began Tuesday, June 15, leaving thousands of SMB customers unable to process credit card payments and other transactions.
The overnight outage knocked out Intuit's cloud-based tax and accounting services and was reportedly caused by a power failure that occurred during routine maintenance. It hit both Intuit's primary and backup systems, taking down Web sites and services used by as many as 300,000 small and midsize businesses.
Twitter suffered intermittent outages through much of June, and the spates of downtime could continue into July.
The cloud-based micro-blogging giant blames the outages on events such as the 2010 World Cup, which is driving a massive increase in activity that the site just can't handle.
Trouble started on June 11, when Twitter suffered poor site performance and a host of errors due to heavy load. Issues persisted on Monday, June 14, with several hours of ups and downs. Twitter has said periodic outages could continue through the beginning of July, but that it is making internal network adjustments in hopes of avoiding future problems.
A connectivity loss and outage at Hosting.com's New Jersey data center took down Hosting.com's cloud for just under two hours on June 1. During the outage, which lasted from 6:45 p.m. to 8:29 p.m. Eastern time, access to systems in the Newark data center was severely degraded and intermittently unavailable, disrupting business services and connectivity for customers using Hosting.com's cloud services, according to an alert from Apparent Networks, which measures the performance of leading cloud providers in its Cloud Performance Center. In posts on its operations Twitter feed, Hosting.com acknowledged that the connectivity loss was due to failures in both its primary and backup Cisco 6509 switches. The cause of the failure was traced to a software bug in the switches.
Terremark's vCloud Express services suffered an outage after a loss of connectivity in its Miami data center on March 17. The outage caused intermittent connectivity with high packet loss starting at 11:54 a.m. Eastern time and lasting more than seven hours, ending at 7:05 p.m. Eastern time. According to Apparent Networks' Cloud Performance Center, during the outage access to systems in Terremark's Miami data center was severely degraded and often unavailable, affecting many businesses using Terremark's vCloud Express services.
On April 26, NetSuite suffered a service outage that rendered its cloud-based applications inaccessible to customers worldwide. According to NetSuite, the cloud apps were down for 30 minutes, and some customers experienced sluggish performance long afterward. NetSuite blamed a network issue for the downtime but has not divulged the exact cause of the outage or how many customers overall were affected.
A pair of network outages knocked The Planet offline for about 90 minutes on May 2 and caused disruptions for another 90 minutes the following morning. The outages affected a number of customers hosted in The Planet's Houston and Dallas data centers. The first outage was traced to a faulty router in one of the company's two Houston data centers. Then, the next morning, a separate, unrelated outage rocked The Planet, again disrupting customers in Houston and Dallas. A circuit between Dallas and Houston was found to be the cause of that disruption.
Sage North America was hit by instability and outages in its online systems in June, including a 22-hour outage that knocked the company's Website, email, order-entry system and online CRM out of commission, according to a report on TheProgressiveAccountant.com. Sage told the site that the issue was isolated to Sage's storage area network and that the company's third-party storage provider was working with technical teams to resolve the problem. Throughout the outage, the Website of Sage's parent, Sage Group, was also down, and some Sage resellers reported that the only North American Sage site they could access was the one for its X-3 ERP manufacturing package. Sage's Web-based CRM system, AccpacOnline, was also affected by the June 1-2 outage.
Microsoft's free Windows Live services, including its Hotmail Internet email offering and Windows Live Messenger, were offline on February 16 due to a server failure. According to Microsoft, an issue with the Windows Live ID service caused log-ins to fail for some customers, which increased the load on the remaining servers. Things were back to normal after about an hour, Microsoft said.
On Feb. 17, EMC's Atmos Online was unavailable for an unknown length of time. Atmos Online is the cloud-based storage offering that is part of EMC's Atmos Infrastructure. Users attempting to log on to the Web-based EMC Atmos Online service were met with this greeting: "EMC Atmos Online Temporarily Down For Maintenance: The Website is currently unavailable and will be back up shortly. We apologize for any inconvenience and thank you for your patience."
In a statement, EMC said the Atmos Online outage was caused by maintenance issues, but did not elaborate.
On January 28, Microsoft Online Services users in North America experienced intermittent access to services, including Microsoft Business Productivity Online Standard Suite (BPOS). According to Microsoft, some users served by a North American data center were affected. Here's what went down, according to a blog post from Microsoft: monitoring alerted Microsoft to a possible issue, and troubleshooting found a problem with network infrastructure that resulted in intermittent access for customers.
In response to the incident, Microsoft said it found the root cause and took the steps necessary to remediate the issue. Microsoft also reached out to business customers who were affected and offered them a credit. Microsoft did not say how long the intermittent access lasted or exactly how many customers were affected.