Microsoft Explains What Went Wrong In Latest Global Azure Outage
The Microsoft Azure cloud outage that affected customers around the world yesterday was caused by a glitch in a performance update to its cloud storage service, the software giant said Wednesday.
While Microsoft had extensively tested the update for several weeks before applying it, an issue surfaced after the update was applied to the Azure Storage service, Jason Zander, corporate vice president of the Microsoft Azure team, said in a blog post Wednesday.
"During the rollout we discovered an issue that resulted in storage blob front ends going into an infinite loop, which had gone undetected during [testing the update]," Zander said in the blog post. "The net result was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues."
Microsoft started rolling back the update once it discovered the glitch, but that required restarting the Azure Storage front ends, which left some users unable to access the service, Zander said.
The outage, which began for some customers in the U.S., Europe and Asia at around 7 p.m. Eastern time Tuesday, is now mostly resolved, although "a limited subset of customers are still experiencing intermittent issues," Zander said in the blog post.
Zander said Microsoft will provide a "root cause analysis" of the outage once it has had time to figure out the details of what went wrong. One issue Zander didn't address is why Azure outages often happen across multiple global regions.
Lydia Leong, a vice president and distinguished analyst at Gartner, pointed this out in a tweet Tuesday evening, noting that "Microsoft's disastrous inability to keep Azure outages confined to a single region is a major red flag for enterprises considering Azure."
Leong also noted that Microsoft still hasn't offered an explanation of the issues that caused its last major Azure outage in August.
Cloud outages are a fact of life at this stage of the game, and Microsoft certainly isn't the only provider that has been hit with unforeseen glitches.
But with Amazon Web Services boasting a growing number of enterprises that are moving their whole data centers to AWS, Microsoft will need to reassure customers that it's able to react quickly -- and more importantly, communicate -- when things go wrong with Azure in the future, Microsoft partners said.
PUBLISHED NOV. 19, 2014