Microsoft Sorry For E-Mail-Killing BPOS Cloud Outages


Microsoft apologized for a string of cloud outages last week that hit its Business Productivity Online Services (BPOS) cloud computing suite and caused massive e-mail delays for BPOS users.

"I'd like to apologize to you, our customers and partners, for the obvious inconveniences these issues caused," David Thompson, corporate vice president of Microsoft Online Services, said in a blog post detailing the BPOS cloud outages. "We know that e-mail is a critical part of your business communication, and my team and I fully recognize our responsibility as your partner and service provider."

Trouble started around 12:30 p.m. Eastern on Tuesday, May 10, when one of the hub components in the BPOS-S Exchange service ran into problems caused by malformed e-mail traffic, Thompson said. Exchange has a built-in capability to handle malformed traffic, but Thompson said the service "encountered an obscure case" in which that capability also did not work correctly, creating a backlog of e-mail. By 3 p.m. Eastern, the malformed traffic was isolated and the mail queues were cleared, but not before customers suffered delays of between six and nine hours. Microsoft put a short-term fix in place and began working on a permanent remedy.

Then, at 12:10 p.m. on Thursday, May 12, malformed e-mail traffic was detected again. Microsoft fixed the problem by 1:03 p.m., but not before users suffered e-mail delays of up to 45 minutes. At 2:35 p.m. Eastern that Thursday, a second, related issue was detected that left e-mail stuck in some users' outboxes. During that incident, more than 1.5 million e-mail messages were queued awaiting delivery, Thompson wrote. Microsoft fixed the issue by 3:04 p.m. Eastern, and the backlog was 90 percent cleared by 7:12 p.m. Eastern, though some users experienced delays of as much as three hours.

To make matters worse, Microsoft BPOS suffered a Domain Name System (DNS) hosting failure around 3 a.m. Eastern on Thursday that prevented users from accessing Outlook Web Access hosted in the Americas and partially disrupted some Microsoft Outlook and Microsoft Exchange ActiveSync functionality. Microsoft diagnosed and fixed the problem in the DNS servers and restored service by 7:52 a.m. Eastern.

Microsoft's BPOS cloud collapse follows a recent string of high-profile cloud outages.

Last month, Amazon Web Services suffered a massive outage that took many of its customers offline for several days. Amazon later said the outage, which affected its Northern Virginia data center, was caused by a "re-mirroring storm" in its Elastic Block Store (EBS) service. Amazon was criticized for communicating poorly with customers while the outage was under way. More than a week after the outage began, Amazon issued an apology to cloud customers and offered many of them a 10-day credit for the trouble.


