Google: Memory Bug Caused Google Docs Cloud Outage

By Andrew R Hickey
September 12, 2011    5:18 PM ET

A "memory management bug" caused Google Docs to suffer an hour-long cloud outage last week that made several Google cloud services unavailable to users.

The Wednesday Google Docs cloud outage, which made Google Document Lists, Google Documents, Google Drawings and Google Apps Scripts inaccessible for the majority of Google Apps users, was caused by a change that had been designed to improve real-time collaboration within the document list. That change exposed a memory management bug, Google Engineering Director Alan Warren wrote in a blog post detailing the Google Docs outage.

"Every time a Google Doc is modified, a machine looks up the servers that need to be updated," Warren explained. "Due to the memory management bug, the lookup machines didn't recycle their memory properly after each lookup, causing them to eventually run out of memory and restart. While they restarted, their load was picked up by the remaining lookup machines -- making them run out of memory even faster. This meant that eventually the servers couldn't properly process a large fraction of the requests to access document lists, documents, drawings and scripts which led to the outage you saw on Wednesday."
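The feedback loop Warren describes can be illustrated with a toy simulation. The sketch below is not Google's code: the function name `simulate_cascade`, the machine count, memory capacity, leak size and restart time are all hypothetical values chosen only to show how leaked memory plus load redistribution can snowball.

```python
def simulate_cascade(machines=4, capacity_mb=100.0, leak_mb_per_lookup=1.0,
                     lookups_per_tick=40, restart_ticks=5, stagger_mb=10.0,
                     ticks=30):
    """Toy model of lookup machines that leak memory on every lookup.

    All parameters are illustrative assumptions, not figures from Google.
    Returns a list of (tick, machines_up) pairs.
    """
    # Stagger the starting memory so machines do not all fail on the same tick.
    memory = [i * stagger_mb for i in range(machines)]
    down_until = [0] * machines  # first tick at which each machine is back up
    history = []
    for tick in range(ticks):
        up = [i for i in range(machines) if down_until[i] <= tick]
        history.append((tick, len(up)))
        if not up:
            continue  # every machine is restarting; no lookups are served
        # The fixed total load is split across whichever machines are still up.
        share = lookups_per_tick / len(up)
        for i in up:
            memory[i] += share * leak_mb_per_lookup  # memory is never recycled
            if memory[i] >= capacity_mb:
                memory[i] = 0.0  # out of memory: restart with a clean slate
                down_until[i] = tick + 1 + restart_ticks
    return history


if __name__ == "__main__":
    for tick, up in simulate_cascade():
        print(f"tick {tick:2d}: {up} lookup machine(s) up")
```

In this model, each restart raises the per-machine share of lookups, so the surviving machines reach their memory limit sooner. Raising `machines` lowers that share, which is roughly the effect of adding capacity to the lookup service.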

Warren wrote that Google's automated monitoring noticed that attempts to access documents were failing at an increased rate, and that Google was alerted a minute later when the failure rate rose sharply. Once engineers determined that the problem was connected to the feature change, they started rolling it back, 23 minutes after the first alert, Warren wrote. At the same time, Google doubled the capacity of the lookup service to soften the impact of the memory management bug. The rollback completed 24 minutes later, and 5 minutes after that the outage was over.

Warren said that Google is scrutinizing the timeline of the Google Docs outage and is putting steps in place to avoid future cloud outages and to shorten the time needed to discover and fix any problems that arise. Google is also working to limit the scope of any single problem.

"We intend to take all these steps; some are not easy, but we're committed to keeping Google's services exceptionally reliable," Warren wrote. "In the meantime, rest assured that we take every outage very, very seriously, and as always we'll post a full incident report of what happened to the Apps Dashboard once our investigation is complete. Again, we apologize for the inconvenience and frustration which the outage has caused."

The Google Docs outage last week was the first of a pair of high-profile cloud outages. Later in the week, from late Thursday into Friday, Microsoft Office 365 and some Windows Live online services such as Hotmail and SkyDrive were also knocked out of commission. Microsoft blamed that several-hour cloud outage on a DNS issue.
