

Amazon Breaks Cloud Outage Silence With Apology, Credit

By Andrew R Hickey
April 29, 2011    9:20 AM ET

Page 2 of 2

To prevent similar cloud outages in the future, Amazon Web Services said it "will audit our change process and increase the automation." Additionally, Amazon outlined three separate protections to avoid a repeat: It has increased its capacity buffer; it will modify the retry logic in the EBS server nodes to prevent clusters from getting into a re-mirroring storm; and it is testing a fix intended to avoid EBS node failure.
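Amazon did not publish the retry logic itself, so the following is only an illustrative sketch of one common way to keep many nodes from retrying in lockstep: capped exponential backoff with random jitter. The function names, the parameters and the use of `ConnectionError` are assumptions made for this example, not Amazon's implementation.

```python
import random
import time


def backoff_ceiling(attempt, base=0.5, cap=30.0):
    """Upper bound on the wait before the next retry: doubles per attempt, up to cap."""
    return min(cap, base * (2 ** attempt))


def retry_with_backoff(operation, max_attempts=5, base=0.5, cap=30.0,
                       sleep=time.sleep, rng=random):
    """Call operation(); on ConnectionError, wait a random, growing delay and retry.

    The random delay ("jitter") spreads retries out so that many nodes
    failing at the same moment do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(rng.uniform(0, backoff_ceiling(attempt, base, cap)))
```

A caller would wrap the flaky step, for example `retry_with_backoff(find_replica_space)`, where `find_replica_space` is a hypothetical function that raises `ConnectionError` when no peer has capacity.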

Amazon said it will also invest in increasing its visibility, control and automation to recover volumes in an EBS cluster, which would have saved significant time in the recovery process and would have enabled customers to more easily recover their applications in other Availability Zones in the Region.

Amazon said it also intends to make it easier for users to leverage multiple Availability Zones to avoid future issues. Many of the customers affected most in the Amazon cloud outage ran in only a single Availability Zone in the Northern Virginia Region and did not have failover into another zone. Amazon is putting measures in place to make it easier to use multiple Availability Zones and will host a number of free Webinars to offer customers and partners tips and best practices for architecting in the cloud.

Amazon also addressed its lack of communication during the cloud outage, a source of contention for users.

"In addition to the technical insights and improvements that will result from this event, we also identified improvements that need to be made in our customer communications," Amazon said. "We would like our communications to be more frequent and contain more information. We understand that during an outage, customers want to know as many details as possible about what's going on, how long it will take to fix, and what we are doing so that it doesn't happen again."

Amazon said most of the AWS team, including its entire senior leadership team, was directly involved in resolving the outage. Amazon said it felt "focusing our efforts on a solution and not the problem" was the best way to go. Amazon said it updated customers when it had information that was accurate.

"That said, we think we can improve in this area," Amazon said. "We switched to more regular updates part of the way through this event and plan to continue with similar frequency of updates in the future. In addition, we are already working on how we can staff our developer support team more expansively in an event such as this, and organize to provide early and meaningful information, while still avoiding speculation."

Amazon continued: "We also can do a better job of making it easier for customers to tell if their resources have been impacted, and we are developing tools to allow you to see via the APIs if your instances are impaired."


