Cloud Solution Providers: Amazon Cloud Outage A 'Cautionary Tale'


Amazon's cloud outage, which lasted about four days and crippled a number of Web sites, is a black eye for the emerging cloud computing market. But the failure is also a cautionary tale that shines a spotlight on the need for a solid cloud plan, and it could create a host of new opportunities for cloud solution providers.

Amazon Web Services' Elastic Compute Cloud (EC2) and Relational Database Service (RDS) suffered service interruptions and downtime starting early Thursday morning last week, knocking many customer Web sites offline or causing poor performance. The Amazon cloud outage persisted through the weekend.

By Monday, the vast majority of Amazon customers had returned to full service. Amazon traced the issue to its Elastic Block Store (EBS) in its Northern Virginia data center, saying that data was getting "stuck," but the company was still probing the root cause of the outage.

Amazon has not responded to requests for additional comment.

And while Amazon's outage casts a dark shadow on cloud computing, cloud solution providers said the incident won't send the industry back to on-premises environments. Instead, it could drive opportunities their way as companies seek to avoid downtime and the pitfalls of the cloud that Amazon's outage brought to the forefront.

"It will simply make people plan their cloud architecture and design more, and will create more opportunity for services providers who actually know how to design, build and support such environments," said Tony Safoian, CEO of SADA Systems, a North Hollywood cloud solution provider.

Safoian said the outage drew attention to an issue that the industry has known existed, but some have chosen to ignore. "Behind the cloud there are still people, processes, and systems which may fail. We have to plan and design accordingly," he said.

Additionally, cloud trust comes into question as companies decide which cloud provider to trust with their data and where they'll get the most support. "SLAs will be more important, and trust will still be there, but it won't be in the form of blind trust," Safoian said.

Many cloud early adopters have succumbed to blind trust, and have put all faith in their cloud providers. Paul Burns, president of cloud analyst firm Neovise, said Amazon's outage could be an end to that era.

"I hope it helps move customers away from blind trust. If anyone wants to run mission critical applications in the cloud or even in their own datacenters, they need to take some additional steps to ensure availability," Burns said. "It isn't always easy and it isn't always cheap, but customers with mission critical apps need to design in a bit of redundancy."
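Burns' point about designing in redundancy can be put in rough numbers. The sketch below is illustrative only: the function names are hypothetical, the 99 percent figure is made up rather than taken from any provider's SLA, and the arithmetic assumes that copies fail independently of one another, which an outage like this one shows is not always the case.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that stays up as long as at least one
    of `replicas` copies is up.

    Assumption: copies fail independently. Correlated failures (for
    example, a shared storage layer) make the real figure lower.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)  # 0.99 is an illustrative figure
        print(f"{n} copies: {a:.6f} available, "
              f"{expected_downtime_hours(a):.2f} hours down per year")
```

Under those assumptions, one copy that is up 99 percent of the time is down roughly 87.6 hours a year, while two such copies are down together a bit under one hour a year. The extra copy is the cost Burns refers to.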

Burns said an outage like this was bound to rattle the industry, as new technologies often face trials before they become mainstream.

"This outage will be remembered for a long, long time. I think it will have some negative impact in the short term and medium term, but longer term it will probably make cloud computing even stronger," he said. "New technologies tend to go through cycles. Everyone has seen cloud get overhyped for the last couple years. It makes sense for the pendulum to swing the other way a bit. At some point it will probably swing too far to the negative side. There will be more failures for both public and private clouds so I wouldn't be surprised to see a period of negativity."

Amazon Outage Highlights Need To Prepare For Failure

Joseph Coyle, CTO for North America at Capgemini, said the outage will force cloud clients to step back and educate themselves on exactly what they're purchasing from their cloud vendors.

"The exposure here is that when leveraging the cloud, the buyer needs to fully understand the technology and the SLAs that each cloud provider offers," Coyle said. "High availability and data center failover are offered at different levels. Clients need to fully understand what they are signing up for, but also what their tolerance is for each system or environment that is being migrated to the cloud."

For cloud solution providers, giving clients that understanding is key and will require a new level of transparency in the cloud.

"There will need to be more transparency on what technologies are running on the back end. Not so much vendor hardware but more around the software -- hypervisors, redundancy, etc. -- in place," Coyle said. "I think there should also be more focus on calling out the levels of redundancy that are included in the contract versus additional levels available for purchase. Finally, although cloud vendors do not support announcing where their data centers are located, I do believe they now need to at least provide details on local data center redundancies -- in Amazon's case known as Availability Zones."

Jim Damoulakis, CTO of GlassHouse Technologies, a Framingham, Mass.-based solution provider, said the Amazon downtime is a reminder that moving applications into the cloud and leveraging cloud services doesn't remove the need to plan and ensure that the cloud meets the needs that it's set out to meet.

"It's kind of a déjà vu all over again scenario," he said, adding that the Amazon outage shares similarities with the push for disaster recovery of a few years back.

Damoulakis said the outage came as a surprise because Amazon has run cloud environments for more than four years with only a few hiccups, which led users to expect it to just work. Those high expectations, he said, made the outage that much more jarring.

"The fact that they have run it so well and at such a large scale for so long, some can get lulled into a sense of complacency," Damoulakis said, noting that "the cloud environment can be as reliable and as unreliable as you make it."

For companies like Stratalux, a Los Angeles-based solution provider that offers cloud management, Amazon going down in some areas opens the door to more management opportunities. It also brings to the surface that cloud environments are managed differently than on-premises environments and shouldn't be treated the same.

"If people are taking their existing IT paradigm and moving it to the cloud, they're going to run into trouble," Stratalux CEO Jeremy Przygode said.

Designing and planning for failure is imperative, Przygode said.

"The cloud will fail, but what you need to do is focus your attention on rebuilding and reprovisioning. You need to plan for that failure. If you design the infrastructure correctly, you shouldn't see any failure," Przygode said.

Michael Kirven, co-founder and principal of New York-based cloud solution provider Bluewolf, agreed and said, "just because it's easy to deploy an application to AWS, don't fall victim to [not] planning for failover." Kirven said once applications are in the cloud, they don't exist in a vacuum; there are still management requirements.

"Just because I can move it into the cloud, that doesn't mean I can ignore it," he said. "It still needs to be managed. It still needs to be maintained."

Cloud Outages Are Part Of The Game, Shine Light On Support

Przygode said he sees a potential short-term impact on cloud adoption caused by Amazon's fumble, but the industry will right itself.

"I don't think it's going to go backward," he said. "But some people are going to run for the hills."

Bob Shinn, senior managing partner of cloud strategy at Grayslake, Ill.-based cloud consulting firm Cloud Silver Lining, said the outage will have some immediate impact on cloud deployments, but the misstep will not lead companies to abandon cloud computing or Amazon. Instead, Shinn said, Amazon's outage will enable solution providers to illustrate how best to approach the cloud.

"Unfortunately the outage will impact adoption. We do not see an exodus from cloud or Amazon. We do see some basic changes that would have eliminated business impact," he said. "Design for failure from the start. This includes going across AWS availability zones. This is one small example."
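As a rough illustration of the kind of design Shinn describes, the sketch below picks the first healthy zone from an ordered list instead of depending on a single one. It is a simplified stand-in, not Amazon's API or any solution provider's method: the zone names, `pick_zone` and the health check are all hypothetical, and a real deployment would also have to replicate data across zones.

```python
from typing import Callable, Iterable, Optional


def pick_zone(zones: Iterable[str],
              is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first zone whose health check passes, or None.

    `zones` is ordered by preference. `is_healthy` is a caller-supplied
    check (hypothetical here); a check that raises is treated the same
    as a failed check so one bad zone cannot block failover.
    """
    for zone in zones:
        try:
            if is_healthy(zone):
                return zone
        except Exception:
            # Treat an erroring health check as "zone unavailable".
            continue
    return None


if __name__ == "__main__":
    # Hypothetical zone names and a simulated outage in the first zone.
    down = {"zone-a"}
    chosen = pick_zone(["zone-a", "zone-b", "zone-c"],
                       lambda z: z not in down)
    print(chosen)
```

With the first zone marked as down, the call falls through to the second zone in the list. If every zone fails its check, the function returns `None`, and the caller has to decide what happens next, such as failing over to another region or provider.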

David Hoff, vice president of technology with Atlanta-based cloud solution provider Cloud Sherpas, said he's dealt firsthand with a major cloud outage: Google's in September 2009. One of the major responses to that outage was that customers became more aware of their terms of service and reviewed the conditions of their cloud services more closely.

"Something like this brings those obligations to the forefront," he said.

Brian Fino, managing partner for New York-based Fino Consulting, said consultants will be able to take a bigger role in helping clients select which cloud vendors fit best in their environments. The outage will also teach users to ask cloud support questions up front.

"They'll want to know what happens. Who do I call if there's an outage?" Fino said.

Kirven added that Amazon's outage raises the issue of cloud support. He said Bluewolf's phones started ringing as soon as the outage was reported, not because Bluewolf clients were impacted, but because they wanted to ask if they could be affected.

"They need to partner with a firm they can get on the phone and talk through issues with," Kirven said, adding that "if you're going to stake your infrastructure on it, you need a throat to choke."

One thing solution providers agreed on was that outages are the name of the game. They're going to happen, but it's being prepared for them that makes a big difference. "Outages, for better or worse, are part of the IT industry," Hoff said.

Hoff said he doesn't see the Amazon outage prompting a cloud mass exodus. "It doesn't drive people away from the cloud, but causes them to question what they're getting for their commitments," he said.

And GlassHouse's Damoulakis said the Amazon outage is not an indication that cloud computing is the wrong approach. Rather, it's a wakeup call that what companies need and what they buy should be in sync, which will open service opportunities for VARs to ensure that their clients' cloud needs and cloud deployments are a strong match. "You can't just simply write a check and your problems go away," he said.

Overall, Hoff said, the outage is a reminder that cloud computing is still in its early stages and issues are bound to arise.

"Buyer beware. Know what you're getting into," he said. "There are going to be outages at least for the foreseeable future, because the technology is so immature."

For Bluewolf's Kirven, the Amazon cloud outage is a "cautionary tale," but good will come out of it as companies pay closer attention to planning, deploying and supporting cloud environments, where Bluewolf will offer guidance. "The benefit still outweighs the risk," he said.