Azure Nightmare: Customer Suffers 9-Day Intermittent Outage, Gets No Help From Microsoft

A Microsoft Azure customer who experienced a nine-day service disruption last month is furious over the software giant's lack of clear and transparent communication about the issue.

The customer, who uses the Azure Virtual Machines Infrastructure-as-a-Service to run core business apps, told CRN this week that he encountered major problems with two of his VMs running in Microsoft's Iowa data center starting April 7.

One of the VMs was down continuously for two straight days, and was then available only intermittently for seven days. The other VM was available intermittently for the nine-day period, according to the customer.



The customer said he informed Microsoft Azure support when the problems began April 7 and was told there was no issue with his VMs. Subsequent interactions with Microsoft support staff yielded the same response, he said.

"Microsoft would not say anything about the outage, in every case flat-out denying they had an issue," said the customer via email, speaking on condition of anonymity. The customer said Microsoft has not offered him a service credit or other compensation for the Azure outage.

The customer told CRN he relies on the two Azure VMs to run software for project management, accounting and payroll processing. One of the VMs runs a Radius authentication server for hundreds of firewalls used by his customers.

When Microsoft finally acknowledged the customer's issue nearly three weeks after it began, it did so in an emailed incident report, a copy of which was viewed by CRN.

"From April 7 to April 16, a small subset of customers using Virtual Machines in one Azure Compute stamp in the Central U.S. region may have experienced unexpected restarts of their virtual machines," Microsoft said in the incident report.

"This was caused by a combination of two software bugs exposed by a recently released update of the network management service when a rare set of circumstances occurred in that Azure Compute stamp," according to the incident report.

However, Microsoft's official statement on the matter is that the customer's issues stemmed from a planned reboot of the virtual machines in question.

"This customer may have experienced a limited number of temporary and intermittent reboots of virtual machines. While not a service incident, we did notify customers directly about the reboots," a Microsoft spokesman said.

"We continue to encourage customers to contact their service representative if their service doesn’t function normally immediately after a reboot," the Microsoft spokesman told CRN.

The Microsoft spokesman declined to comment on why the company did not notify the customer until nearly three weeks after the Azure Virtual Machines problems began.

The customer's experience differs significantly from Microsoft's official account, however.

First, the customer told CRN he did not receive advance notification from Microsoft that his VMs would be rebooted. And after the troubles began and he reported them to Microsoft, the customer was told that there were no issues with the service.

What's more, the customer said, his Azure VMs became inaccessible for hours after each reboot. When they came back online, the VMs behaved as though they'd been rebooted in an unscheduled fashion, he said.

Meanwhile, Microsoft still has not released a public notification for this issue on its Azure status Web page. A Microsoft spokesman told CRN the company does provide public notifications for Azure service incidents, but didn't respond to a follow-up question about whether it does so for all incidents.

"In addition, we also notify customers directly via their Azure portal and provide them with a Root Cause Analysis," the Microsoft spokesman said via email.

While this particular issue may have been limited in scope, reports of Azure outages in general appear to have risen noticeably of late. On Twitter, IT administrators and companies running businesses on Azure have been calling on the software giant to be more transparent when outages occur.

I would be happy for to learn ONE lesson: Transparency over downtime/outages is a good thing.


Damnation! This azure dns outage is unbelievable. 1995 called and wants its network problems back!


we suffered a massive day-long outage 3 most ago on . Our prod systems are down again today. MS is losing iTrend.


As for the Azure customer, the downtime and lack of communication from Microsoft have prompted him to look at moving his VMs to Amazon Web Services.

"If [Microsoft] were to have another failure in the next six months, we’d have to [switch to AWS]," the customer told CRN. "Our clients don’t have any tolerance for downtime, and they don’t care if our systems are in Microsoft's data centers or in my garage, as long as they're never affected."