Drilling Down Into Dedupe


Data deduplication has become one of the most ubiquitous features of the storage industry, with storage vendors continuing to find new places to add the technology.

At first glance, that seems like an odd move: with deduplication, customers can be counted on to purchase less storage capacity.

Deduplication, also called “dedupe,” removes duplicate information as data is stored, backed up or archived. It can be done at the file level, where duplicate files are replaced with a marker pointing to a single copy of the file, and/or at the subfile or byte level, where duplicate bytes of data are removed and replaced by pointers. Either way, the result is a significant decrease in storage capacity requirements.
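To make the pointer idea concrete, here is a minimal sketch of block-level dedupe in Python. It assumes fixed-size 4-KB chunks and SHA-256 fingerprints; real products typically use variable-length chunking and far more sophisticated indexes, so the names and structure here are purely illustrative.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size chunks; real products often chunk on variable boundaries

block_store = {}   # fingerprint -> block bytes (the single stored copy)

def dedupe_write(data: bytes) -> list:
    """Split data into blocks, store each unique block once,
    and return the list of fingerprints that serves as the 'pointers'."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in block_store:   # new data: keep one copy
            block_store[fp] = block
        recipe.append(fp)           # duplicate data: record only the pointer
    return recipe

def restore(recipe) -> bytes:
    """Rebuild the original data from its pointer list."""
    return b"".join(block_store[fp] for fp in recipe)
```

Writing the same file a second time adds nothing to block_store; the duplicate copy costs only the few bytes of its pointer list.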

Logically, dedupe is not the kind of technology storage vendors would be expected to embrace. By adding dedupe to their storage arrays and other appliances, the vendors are actually making it possible for their customers to purchase less capacity.

However, they have no choice. Thanks to the success of pioneers in the technology such as Data Domain (acquired by EMC) and Quantum, both of which have successful lines of stand-alone dedupe appliances, other vendors need to follow suit as customers look for ways to squeeze more efficiency from their storage infrastructures.

Dedupe products can be classified in several different ways.

The first is according to where the dedupe process takes place.

Source dedupe removes duplicates before the data is sent across a LAN or WAN. Fewer files and less data travel over the network as a result, but backup performance can suffer because of the processing overhead the dedupe work places on the source system. With today’s high-performance processors, however, this is less of an issue than it has been in the past.

Target dedupe starts the dedupe process after the data is copied onto a destination device such as a virtual tape library. This takes away any overhead related to deduping data at the source but requires more storage capacity at the target to temporarily store the entire data set.
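The contrast between the two can be sketched in a few lines of Python. The example below shows the source-side case, assuming a simple ask-then-send exchange between the backup client and the target device; the class and method names are hypothetical, not any vendor’s API. It reuses the fingerprint-and-pointer idea from the earlier sketch: hash locally, ask the target which blocks it lacks, and send only those.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size chunks, for illustration only

class BackupTarget:
    """Stand-in for the destination device: it reports which fingerprints it already holds."""
    def __init__(self):
        self.store = {}  # fingerprint -> block

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.store]

    def receive(self, blocks):
        self.store.update(blocks)

def source_dedupe_backup(data: bytes, target: BackupTarget) -> None:
    """Source dedupe: fingerprint at the source, then ship only blocks the target lacks."""
    blocks = {hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest(): data[i:i + BLOCK_SIZE]
              for i in range(0, len(data), BLOCK_SIZE)}
    needed = target.missing(list(blocks))               # only a small list of hashes crosses the network...
    target.receive({fp: blocks[fp] for fp in needed})   # ...followed by just the blocks the target lacks
```

In target dedupe, the same fingerprinting and lookup would instead run on the destination device after the full data set had arrived, which is why it needs more capacity there but spares the source the hashing work.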

The second is according to when the dedupe process occurs.

With in-line dedupe, files are deduped as they are written to a device. This adds processing overhead to the write process but does not require extra storage capacity.

With post-process technology, data is first written to the target device in full and deduped afterward. This requires extra capacity to temporarily store the incoming files before they are deduped but keeps the dedupe overhead off the originating storage device.
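The timing difference can also be shown as a rough, hypothetical sketch that reuses the dedupe_write function from the first example; the only real change is whether fingerprinting happens inside the write call or in a later batch pass.

```python
class InlineTarget:
    """In-line: dedupe runs on the write path, so only unique blocks ever land on disk."""
    def __init__(self):
        self.recipes = []

    def write(self, data: bytes) -> None:
        self.recipes.append(dedupe_write(data))  # hashing cost is paid on every write

class PostProcessTarget:
    """Post-process: land the raw data first, dedupe it later in a batch pass."""
    def __init__(self):
        self.staging = []    # must hold the full, undeduped data temporarily
        self.recipes = []

    def write(self, data: bytes) -> None:
        self.staging.append(data)                # fast ingest, no dedupe overhead yet

    def post_process(self) -> None:
        while self.staging:
            self.recipes.append(dedupe_write(self.staging.pop()))
```

Which trade-off matters more in practice, extra CPU during the backup window or extra landing capacity on the target, is what separates the two approaches.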

Dedupe is a big market. In a recent survey of IT users, research firm IDC found that more than 60 percent of respondents are either already deduping data or plan to do so in the coming year.

Vendors are implementing dedupe in several different ways.

Dedupe originally came to market in stand-alone appliances, mainly from Data Domain, which brought the technology to the mainstream storage market, and Quantum, which got its dedupe technology with the acquisition of ADIC.

Mainline and second-tier storage hardware vendors have added dedupe technologies to many of their new midrange and enterprise storage arrays.

Also, data protection software vendors including CA Technologies, CommVault and Symantec offer dedupe. In 2010, Symantec moved beyond a software-only focus to produce its first hardware appliance based on its dedupe technology.

 

NEXT: Acquiring Dedupe Technology