New Amazon Cloud Outage Takes Down Netflix, Foursquare


A brief but widespread Amazon Web Services (AWS) cloud outage took out a host of popular Web sites late Monday night, including major AWS customers like Netflix, Quora, Reddit and Foursquare.

The outage, including some of the Web sites it took down, closely mirrors a similar cloud outage Amazon suffered in April, although Monday's downtime was much shorter.

According to Amazon's AWS Service Health Dashboard, the cloud giant began investigating connectivity issues with its Elastic Compute Cloud (EC2) at its Northern Virginia data center, its East Coast hub. Amazon verified connectivity issues between its US-East-1 region and the Internet 11 minutes later.

Amazon's Relational Database Service (RDS) also suffered connectivity issues between Amazon RDS database instances and the Internet, which Amazon verified at 10:57 p.m. ET.

According to Amazon, EC2 was restored to full connectivity and was operating normally as of 11:03 p.m. ET, while Amazon RDS was operating normally with restored service at 11:08 p.m. ET.

While Amazon quickly righted the cloud outage, it wasn't in time to keep some customer Web sites from going down.

"We're currently being affected by the Amazon EC2 outage," Foursquare wrote on its status site at 10:14 p.m. ET. "We'll update when everything's back up and running. Thanks for your patience."

At 11:06 p.m. ET, Foursquare said it was back and "keeping a close eye on things."

Meanwhile, Reddit tweeted: "At this point, it appears that Amazon's US-EAST data centers have all dropped off the Internet." The company later said the site was back up and running.

And Netflix, one of Amazon Web Services' largest cloud customers, said its streaming service suffered disruption during the downtime.

"We're aware that some members are experiencing issues streaming movies and TV shows. We’re working to resolve the problem," Netflix support wrote on Twitter.

This is the second major outage for Amazon this year. In April, the cloud service suffered a massive outage that took out a host of customer Web sites, some for several days. During the downtime, customers were angered by Amazon's lack of communication. More than a week later, Amazon apologized to customers and offered them a cloud credit.

Amazon said the April outage was caused by a network traffic shift that was "executed incorrectly": instead of being routed to the other router on the primary network, traffic was shifted to the lower-capacity redundant Elastic Block Store (EBS) network. Amazon said the issue caused EBS volumes in the Northern Virginia Availability Zone to become "stuck" in a "re-mirroring storm." That made the volumes unavailable and created latency and outages.

Monday night's outage also comes as Amazon works to right its cloud infrastructure in Dublin, Ireland, which was knocked offline over the weekend by a lightning strike that caused an explosion and massive power outage in that area. As of early Tuesday morning Amazon was still working to bring EC2 and RDS in Ireland back to full operation, according to the AWS Service Health Dashboard.

Microsoft's Business Productivity Online Suite (BPOS) was also affected by the Irish lightning strike, but was returned to service within a few hours. Microsoft BPOS had already suffered several cloud outages this year.

The recent rash of cloud outages has renewed concerns over the availability and reliability of cloud computing services. But many experts and solution providers agree that cloud outages aren't reason enough to avoid cloud services.
