Microsoft Under Microscope Over Azure Outage

Microsoft's Windows Azure suffered its first major downtime over the weekend as users were unable to access the development platform for cloud-based applications for about 22 hours.

Although Azure isn't even in beta yet (Microsoft offers it to testers as a Community Technology Preview), the software giant's response to the outage is being closely watched, both by Microsoft partners who are staking future business on Azure and by competitors eager for any shred of evidence that Microsoft isn't ready for the rigors of cloud computing.

On Friday night, Azure users began reporting that applications had become unreachable and were entering "stopped" or "initializing" states for extended periods. Steve Marx, a member of Microsoft's Azure team, acknowledged the outage in a post to the Windows Azure forums on MSDN and said Microsoft was working on a fix.

Around 3 p.m. Saturday, Marx said Microsoft had identified a recovery process and was applying it throughout the cloud, but noted that the fix would require five hours to take effect. Azure was back online by 8:30 p.m. Saturday, but some partners feel Microsoft should have done more than communicate the issue through an online forum posting.


Michele Leroux Bustamante, a Microsoft MVP for Connected Systems and chief architect with IDesign, a Los Gatos, Calif.-based software consulting firm, says that while it isn't fair to measure the Azure CTP against production services such as Amazon's, a production service that went down for 22 hours without reasonable notification would have suffered a major outage.

"I think they made a mistake by not notifying the community in advance, which seems like a fairly easy thing to do and does not inspire confidence," Leroux Bustamante said.

Microsoft certainly isn't alone in suffering cloud service outages (Amazon, among others, has had its fair share), but in an industry where "five nines" (99.999 percent uptime) means a service can be down for only about five minutes per year, every Azure outage will no doubt be met with intense scrutiny from customers and howls of criticism from Microsoft bashers.
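To put the five-nines figure in perspective, the arithmetic is simple: a year has 525,600 minutes, and an availability target leaves the remaining fraction as the downtime budget. The short sketch below is purely illustrative; the function names are our own, and it assumes a 365-day year and measures availability over a full calendar year, which is not necessarily how any eventual Azure SLA would define it.

```python
# Illustrative downtime arithmetic; assumes a 365-day year (leap years ignored)
# and availability measured over one full year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Maximum yearly downtime permitted by an availability target (0 to 1)."""
    return MINUTES_PER_YEAR * (1 - availability)


def annual_availability(downtime_minutes: float) -> float:
    """Yearly availability implied by a given total amount of downtime."""
    return 1 - downtime_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"five nines allows {allowed_downtime_minutes(0.99999):.2f} minutes/year")
    print(f"a 22-hour outage alone caps availability at {annual_availability(22 * 60):.4%}")
```

Run as written, five nines works out to about 5.26 minutes of downtime a year, while a single 22-hour outage by itself would hold a year's availability to roughly 99.75 percent, well short of even "three nines."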

Microsoft hasn't yet delivered its SLA or pricing strategy for Azure, but outages are part of the reality of cloud computing today, and more can be expected, notes Tim Huckaby, CEO of InterKnowlogy, a Microsoft Gold partner in Carlsbad, Calif. "Microsoft should really be more proactive [about] these outages. Posting updates on the MSDN forums won't be enough for when Azure is in production," he said.


Microsoft is "doing the community a great service" by allowing access to Azure during the CTP phase, but the fact is that some people are going to judge Microsoft when Azure outages occur, Leroux Bustamante said.

"This is a new space for Microsoft -- hosting applications and services -- and in order to inspire trust, they must behave now as if they are in production. Otherwise, they run the risk of people forming negative opinions based on the CTP," Leroux Bustamante said, adding that this could impact the number of early adopters on the Azure platform.

Microsoft has said it will be ready to bring Azure to market by the time its Professional Developers Conference rolls around in November. Executives have revealed only that Azure pricing will be "very competitive" with existing cloud-based application development offerings and will follow a consumption-based business model.

When Azure is ready for production, Microsoft would be wise to implement an "opt-in" notification mechanism to inform users of maintenance downtime and other problems, according to Huckaby. "They should be as honest as possible in the notification -- if software pulls down the entire system and you are paying for it, you deserve to know the truth," he said.

Microsoft also needs to adopt a policy of publishing a detailed "post-mortem" on each outage, including as much information as possible on its cause and the steps Microsoft will take to prevent a recurrence, Huckaby said.

That's exactly what one Azure tester asked for in a post to the MSDN forum about the outage. Marx responded that the Azure team would conduct a root cause analysis and issue a summary, but probably not until after Microsoft's MIX 09 conference, which is being held from March 18 to 20 in Las Vegas.

But that apparently wasn't soon enough for Roger Jennings, an independent .Net developer and principal consultant of OakLeaf Systems, Oakland, Calif., who in a subsequent forum post urged Marx to explain the Azure outage sooner rather than later. His words accurately convey the expectations Microsoft will need to meet when Azure hits production environments.

"If you don't want to spend all your time at MIX 09 answering questions about the outage, I'd suggest posting the explanation before Wednesday," Jennings wrote.