Valuable indeed. The need to move storage arrays, servers and other boxes out of the corporate data center and turn them into a service is growing quickly as businesses large and small face mountains of data that threaten to overwhelm their IT management capabilities.
Research firm IDC, in its annual "State of the Universe" study, estimated that the amount of information created and replicated in 2011 would surpass 1.8 zettabytes, or 1.8 trillion gigabytes, up 900 percent from five years earlier. That data exists in about 500 quadrillion files. By 2015, IDC estimated, the amount of data created and replicated will reach nearly 8 zettabytes. The San Diego Supercomputer Center surveyed CIOs and CTOs at 30 large enterprises this year and found that corporate data grew at a median compound annual growth rate of 40 percent, basically doubling every two years.
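The "doubling every two years" shorthand follows directly from the 40 percent compound annual growth rate. A quick check of the arithmetic:

```python
import math

# A 40 percent compound annual growth rate compounds to just under 2x
# over two years, which is why 40 percent CAGR is commonly described
# as "doubling every two years."
two_year_growth = 1.40 ** 2                    # growth factor after two years
doubling_time = math.log(2) / math.log(1.40)   # exact doubling time in years

print(round(two_year_growth, 2))   # 1.96
print(round(doubling_time, 2))     # 2.06
```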
If it were a simple question of throwing more people into the data centers, all this growth could be managed with relatively little pain. But that's not going to happen.
IDC estimated that, even as data centers will be managing 50 times as much data over the next decade, they will be doing it with only about 50 percent more people than they now employ.
Handling exponentially more data with limited personnel is only the start. Businesses over the foreseeable future will also be dealing with increasingly more complex data requirements.
The San Diego Supercomputer Center notes that there are at least five different data types based on their persistence, or how long the data is kept. These include temporal (transactional data, which may last as little as a fraction of a second), active (available for immediate use by an application), retained (such as backups, copies, and replications), historical (aged data on lower-cost storage devices), and archive (data that may never be accessed but which must be kept for regulatory or compliance purposes, perhaps forever).
These disparate data types must be handled using different technologies, ranging from being able to generate data for a transaction and then flush it away to understanding which data should be archived while making sure it can be accessed 30 or more years later if needed.
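The five persistence tiers described above map naturally to different handling policies. A minimal sketch of such a mapping, with hypothetical policy descriptions loosely based on the article's examples (the tier and policy names here are illustrative, not from any particular storage product):

```python
from enum import Enum

class PersistenceTier(Enum):
    TEMPORAL = "temporal"      # transactional data, may last a fraction of a second
    ACTIVE = "active"          # available for immediate use by an application
    RETAINED = "retained"      # backups, copies, replications
    HISTORICAL = "historical"  # aged data on lower-cost storage devices
    ARCHIVE = "archive"        # rarely accessed, kept for compliance purposes

# Hypothetical policy table: each tier gets a different handling technology.
RETENTION_POLICY = {
    PersistenceTier.TEMPORAL: "generate for the transaction, then flush",
    PersistenceTier.ACTIVE: "keep on primary storage for immediate access",
    PersistenceTier.RETAINED: "copy to backup media on the backup schedule",
    PersistenceTier.HISTORICAL: "migrate to lower-cost storage devices",
    PersistenceTier.ARCHIVE: "retain per regulatory schedule, possibly 30+ years",
}

def retention_action(tier: PersistenceTier) -> str:
    """Return the handling policy for a given persistence tier."""
    return RETENTION_POLICY[tier]

print(retention_action(PersistenceTier.TEMPORAL))
```

In practice, the hard part is the classification itself: deciding which tier a given file or record belongs to, which is exactly the judgment call the article says IT departments must make at enormous scale.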
Another issue complicating management of storage is that the fastest-growing part of the information explosion, unstructured data, also just happens to be the hardest to manage. Unstructured data, which accounts for about 90 percent of the digital information being collected and stored, includes text, audio and video files, photographs, and other data that is not easy to handle using traditional database management tools.
The bigger question is, after the data is collected and stored, now what? That's where the concept of big data, which is technology for real-time analysis of huge amounts of data, comes in, and where assumptions about a company's ability to manage storage with existing tools go out the door.
Further complicating the issues related to uncontrolled data growth is the fact that so much of that data is actually useless. Deidre Paknad, founder of the Compliance, Governance and Oversight Council and director of information life-cycle governance solutions at IBM, wrote in a Forbes magazine article that a survey of businesses found that, at a typical organization, 1 percent of data is on litigation hold, 5 percent is in some form of records, and 25 percent has current business value.
That leaves 69 percent of information in the typical business having no business, legal or regulatory value. At the same time, Paknad wrote, IT needs to make a billion choices to determine what part of that data can be safely tossed.
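The 69 percent figure is simply the residual after the three categories Paknad reported are accounted for:

```python
# Paknad's reported breakdown of data at a typical organization (percent):
litigation_hold = 1
records = 5
current_business_value = 25

# Everything else has no business, legal or regulatory value.
no_value = 100 - (litigation_hold + records + current_business_value)
print(no_value)  # 69
```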