How To Tame Unstructured Data

So says Paul O'Brien, CEO of Persist Technologies, a company recently acquired by Hewlett-Packard. That thought is one of the primary motivations behind a new storage movement called information life-cycle management (ILM), a trend that could inspire a new way to store data. It is no coincidence that the trend arrives alongside legislation requiring companies to retain documents and records for a set period. Customers are now being forced to re-evaluate where and how their data is stored, and analysts and vendors alike believe ILM is the answer. In the past few months, both EMC and Hewlett-Packard have announced acquisitions that give them key pieces of an ILM strategy. Here is an overview.

EMC Takes Over Documentum
In recent months, EMC has made a couple of back-to-back acquisitions in an attempt to obtain technology that will give it a complete ILM portfolio: First, on Oct. 21, EMC announced it had completed its acquisition of Legato Systems, a backup software company, in a deal valued at $106 million.

Next, EMC bought start-up Documentum in a deal that was completed in mid-December, the same month a third acquisition--of VMware--was announced. Documentum developed software that works like a repository to store "unstructured content," such as XML documents, digital images, PDFs, rich-media content, and fixed and collaborative information. Once in that repository, the information is tagged so that it can be tracked throughout its life span. Policies involving security, accessibility and archiving can be more easily enacted on that data with this software. Moreover, Documentum 5.0 has some smart features, like auditing and tracking data for compliance, as well as a more efficient way to update versions of information.
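To make the tag-and-policy idea concrete, here is a minimal Python sketch of a record that carries tags and is checked against a retention policy. It is an illustration only, not Documentum's actual data model or API: the `Record` class, the `RETENTION_DAYS` table, the category names and the retention periods are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Record:
    """A piece of unstructured content plus the tags used to track it (hypothetical model)."""
    name: str
    created: date
    tags: dict = field(default_factory=dict)


# Hypothetical retention periods, in days, keyed by a "category" tag.
RETENTION_DAYS = {"email": 3 * 365, "contract": 7 * 365}


def may_delete(record: Record, today: date) -> bool:
    """Return True only if the record's retention period has fully elapsed."""
    days = RETENTION_DAYS.get(record.tags.get("category"))
    if days is None:
        # No policy matches the record's tags: retain it by default.
        return False
    return today >= record.created + timedelta(days=days)


memo = Record("q4-memo.pdf", date(2004, 1, 1), {"category": "email"})
untagged = Record("scan-0001.tif", date(2004, 1, 1))
```

Because the policy is looked up from the tag rather than from a file name or location, changing a retention rule means editing one table entry instead of touching every stored item.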

Tanuja Randery, vice president of global strategic initiatives at EMC, believes that as much as 80 percent of data falls under the unstructured category and that the Documentum acquisition will help EMC "bring structure to the unstructured world." Thus far, some analysts consider EMC a front-runner in the beginnings of this ILM market race.


HP Buys Persist
Like Randery, Persist Technologies' O'Brien sees storage not as one big blob of information but as data that can be categorized. In talking to CIOs, Persist executives found that 15 years ago, 80 percent of enterprise storage was dedicated to data generated through database applications. "It's what we refer to as dynamic. You touch it many times and change it many times," says O'Brien, whose company is now being acquired by HP for an undisclosed amount. Today, these same CIOs say between 40 percent and 70 percent of their enterprise storage is for reference (or unstructured) data, which is information that "is touched once and looked at many times," he says.

Persist's software, which addresses long-term storage and data accessibility, runs on industry-standard hardware and indexes data--meaning the data is stored based on its attributes and content as opposed to a file name or subject matter. All of the data can be accessed through a Web browser or a standard search engine. To date, Persist's software is built on a couple of e-mail applications, such as Lotus Notes and Outlook, but will be extended to more. And, according to Gartner chief research analyst Carolyn DiCenzo, Persist's software can be used to build a device similar to Centera, a product developed by EMC and Network Engines that stores data through an electronic fingerprint as opposed to a file name.
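Storing data "through an electronic fingerprint" is commonly known as content addressing. The Python sketch below shows the general technique under stated assumptions: it is not how Centera or Persist's software is implemented, the `ContentStore` class is hypothetical, and SHA-256 is chosen here only as a convenient fingerprint function.

```python
import hashlib


class ContentStore:
    """A toy in-memory store that addresses objects by a hash of their content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store data and return its fingerprint, which serves as its address."""
        fingerprint = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same fingerprint,
        # so a duplicate is stored only once.
        self._objects.setdefault(fingerprint, data)
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        """Retrieve data by fingerprint; raises KeyError if it is unknown."""
        return self._objects[fingerprint]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
first = store.put(b"Quarterly report, final version")
second = store.put(b"Quarterly report, final version")  # same content, same address
other = store.put(b"Quarterly report, draft")
```

Because the address is derived from the content, any change to a stored item produces a different fingerprint, which is one reason the approach suits reference data that is written once and read many times.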

You can expect a lot more publicity around ILM--and not just from EMC and HP. "You know how when SANs came out, every vendor declared themselves a SAN vendor?" DiCenzo asks. "Now everyone is declaring themselves a compliance vendor."