Solution providers say the Amazon Web Services (AWS) S3 outage earlier this week points to a need for more aggressive independent cloud design and architecture.
"Cloud does not replace the need for good strong consulting and vision on how to actually architect for continuous delivery of business applications," said Jamie Shepard, senior vice president for health care and strategy at Lumenate, No. 152 on the 2016 CRN SP500. "When we architect systems we know that if there is a failover, there is enough full capacity to handle that workload. That is how we design infrastructure. That is why our hybrid cloud design business is accelerating. Relying on cloud vendors as architects is a mistake. You cannot rely on cloud vendors as architects or to be advocates for the customer."
Lumenate does not resell AWS cloud services, but it does leverage AWS and other public clouds, including Google and Azure, as part of complete software-defined hybrid cloud platforms, said Shepard.
For Shepard, one of the most telling takeaways from the four-hour outage on Tuesday was that AWS' own customer-facing Service Health Dashboard was not available during the crisis. "That's bad architecture," he said. "They basically leveraged one region and did not factor any redundancy into that system. AWS had to tweet the outage. That is how customers found out. That is unacceptable. No dashboard to look at. No one to call. You mean to tell me [if I'm an AWS customer whose] business is down and my users are asking me what is happening and I don't have a dashboard to tell me what is going on?"
AWS, for its part, apologized Thursday for the outage -- which was sparked by an AWS team member entering a bad command during the debugging of an S3 billing system -- and disclosed "several changes," including a move to run the Service Health Dashboard (SHD) across multiple AWS regions. "We understand the SHD provides important visibility to our customers during operational events and we have changed the SHD administration console to run across multiple AWS regions," the company said.
CRN reached out to AWS for further comment but did not receive a response at press time.
It is also telling, Shepard said, that AWS parent Amazon.com – the world's largest online retailer – remained up and running because it was architected to be geo-redundant while many other major retail websites took a hit. "The question is: did those customers know they did not have a geo-redundant system?" asked Shepard. "A lot of our customers right now are in a battle internally where people are telling them to leverage more public cloud. This is another proof point in the case for hybrid IT. It demonstrates the need for strong data center architects that understand the customers' business case and the challenges and pitfalls of public cloud enablement."
Apica, a website testing, optimization, and monitoring provider, said that 54 of the top 100 internet retailers were affected by the outage, including three sites that went down completely – Express, Lululemon and One Kings Lane. Apica said S3 is Amazon's largest service and is used by more than half of its one million-plus customers, with 3-4 trillion pieces of data in it.
Douglas Grosfield, the founder and CEO of Five Nines IT Solutions, a Kitchener, Ontario-based strategic service provider that provides high-level cloud consulting centered on public and private clouds, said far too many customers are blindly ceding their data to public cloud providers.
"The fox is watching the hen house when you have any major cloud provider like AWS acting as your sole source of advice, design and integration of your IT environment," said Grosfield. "I don't want to say the inmates are running the asylum, but that is clearly not best practice for cloud design, architecture and delivery. A basic tenet of ours is: you don't design a cloud architecture with a single point of failure."