EMC Starts Virtualizing Avamar De-dupe Technology


EMC acquired Avamar last November in a $165 million deal.

De-duplication, also called "de-dupe," removes duplicate information as data is backed up or archived. It can be done at the file level, where duplicate files are replaced with a marker pointing to a single stored copy, and/or at the sub-file or byte level, where repeated segments of data within and across files are stored only once and replaced with references. Either way, the result is a significant decrease in storage capacity requirements.
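The difference between file-level and sub-file de-dupe can be illustrated with a minimal Python sketch. It assumes fixed-size chunks and SHA-256 fingerprints purely for illustration. It is not Avamar's actual algorithm, and the function names and the tiny chunk size are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny chunk size so the example is easy to follow


def file_level_dedupe(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one copy of each distinct file, keyed by its hash (the 'marker')."""
    store: dict[str, bytes] = {}
    for content in files.values():
        store.setdefault(hashlib.sha256(content).hexdigest(), content)
    return store


def chunk_level_dedupe(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> dict[str, bytes]:
    """Split every file into fixed-size chunks and keep one copy of each distinct chunk."""
    store: dict[str, bytes] = {}
    for content in files.values():
        for i in range(0, len(content), chunk_size):
            chunk = content[i:i + chunk_size]
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store


def stored_bytes(store: dict[str, bytes]) -> int:
    """Total bytes actually kept in a de-duped store."""
    return sum(len(c) for c in store.values())


files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBCCCC",  # exact duplicate of a.txt
    "c.txt": b"AAAABBBBDDDD",  # differs from a.txt only in its last chunk
}
raw = sum(len(c) for c in files.values())
print(raw, stored_bytes(file_level_dedupe(files)), stored_bytes(chunk_level_dedupe(files)))
# → 36 24 16
```

File-level de-dupe only catches the exact copy (b.txt), while chunk-level de-dupe also finds the segments that c.txt shares with the other two files.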

Avamar de-dupes on the source side: data that is sent to be backed up or archived first passes through an appliance running the Avamar software, where it is de-duped before it goes to the target device. This cuts the bandwidth required to move the data, although the de-dupe step itself can become a bottleneck. Other technologies de-dupe at the target, after the data has been transferred, which requires extra storage capacity to hold the data until it is de-duped.
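The bandwidth trade-off between source-side and target-side de-dupe can be sketched as follows. This is a simplified model under stated assumptions: fixed-size chunks, SHA-256 fingerprints, and a Python dict standing in for the target's chunk index (a real system would query the target over the network). The function names are hypothetical, and the byte counts ignore the cost of the fingerprint lookups themselves.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size, kept tiny for readability


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def source_side_backup(data: bytes, target_store: dict[str, bytes]) -> int:
    """De-dupe at the source: only chunks the target lacks cross the network.

    Returns the number of payload bytes sent. The dict lookup stands in for
    asking the target which fingerprints it already holds.
    """
    sent = 0
    for chunk in chunks(data):
        h = fingerprint(chunk)
        if h not in target_store:
            target_store[h] = chunk
            sent += len(chunk)
    return sent


def target_side_backup(data: bytes, target_store: dict[str, bytes]) -> int:
    """De-dupe at the target: all bytes are sent, then duplicates are dropped."""
    for chunk in chunks(data):
        target_store.setdefault(fingerprint(chunk), chunk)
    return len(data)


monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"  # only the last chunk changed

src_store: dict[str, bytes] = {}
tgt_store: dict[str, bytes] = {}
print(source_side_backup(monday, src_store), source_side_backup(tuesday, src_store))  # 12 4
print(target_side_backup(monday, tgt_store), target_side_backup(tuesday, tgt_store))  # 12 12
```

Both approaches end up storing the same four chunks, but with target-side de-dupe the full 12 bytes cross the wire each day and must be staged at the target before duplicates are removed.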

Steve Flynn, senior manager of product marketing for Avamar, said Avamar technology finds duplicate data across sites and across nodes in clustered storage. "As a result, all backups are essentially full virtual backups that can be recovered without the need to find incremental changes," he said.
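One common way to get "full virtual backups" from a de-duped store is to record each backup as a manifest, an ordered list of chunk fingerprints, so that any backup can be rebuilt directly without replaying a chain of incrementals. The sketch below assumes that manifest model, fixed-size chunks, and SHA-256 hashing; it illustrates the general idea and does not describe how Avamar implements it. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size


def backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Write any new chunks to the shared store and return a manifest.

    The manifest lists every chunk of the data set in order, so it describes
    a complete (full) backup even though only new chunks were written.
    """
    manifest = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)
        manifest.append(h)
    return manifest


def restore(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild a backup straight from its manifest; no incremental chain is replayed."""
    return b"".join(store[h] for h in manifest)


store: dict[str, bytes] = {}
day1 = backup(b"AAAABBBBCCCC", store)
day2 = backup(b"AAAABBBBDDDD", store)
print(restore(day2, store))  # b'AAAABBBBDDDD'
print(len(store))            # 4 distinct chunks back both days
```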


New on Thursday is EMC Avamar Virtual Edition for VMware Infrastructure, which is the first fully virtualized de-duplication solution for backup and recovery, said Mark Sorenson, senior vice president for information management software at EMC.

It is a virtual server version of the Avamar appliance and is expected to become a certified VMware Virtual Appliance in the near future, Sorenson said. "It enables de-dupe of data across virtual machines, and as data is moved to outside the data store," he said. "The source-based de-dupe can de-dupe at each individual virtual machine, and across all virtual machines, to reduce the total amount of information that is backed up."
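Why de-duping across all virtual machines saves more than de-duping each one separately can be shown with a small Python sketch. It assumes two hypothetical VM images built from the same OS template, fixed-size chunks, and SHA-256 fingerprints; these are illustrative choices, not details of Avamar Virtual Edition.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size


def unique_chunks(image: bytes) -> set[str]:
    """Fingerprints of the distinct chunks in one VM image."""
    return {
        hashlib.sha256(image[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(image), CHUNK_SIZE)
    }


def per_vm_chunks(vms: dict[str, bytes]) -> int:
    """De-dupe within each VM only: every VM keeps its own copy of shared chunks."""
    return sum(len(unique_chunks(image)) for image in vms.values())


def global_chunks(vms: dict[str, bytes]) -> int:
    """De-dupe across all VMs: one shared index, so common chunks are stored once."""
    shared: set[str] = set()
    for image in vms.values():
        shared |= unique_chunks(image)
    return len(shared)


# Hypothetical images: identical "OSOS" template chunks, different application chunk.
vms = {"vm1": b"OSOSOSOSAPP1", "vm2": b"OSOSOSOSAPP2"}
print(per_vm_chunks(vms), global_chunks(vms))  # 4 3
```

Each VM on its own shrinks from three chunks to two, and pooling the index across VMs removes the second copy of the shared template chunk as well.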

Avamar Virtual Edition allows any storage devices that a virtual machine can see, including direct-attached, NAS, and SAN devices, to be backed up, Sorenson said.

Flynn said there are a couple of situations in which virtualized de-dupe is especially useful. First, he said, Avamar Virtual Edition is easy to deploy because the software is encapsulated in the virtual appliance, so backup and disaster recovery processes can use the same virtual infrastructure. Second, for remote offices, customers can build a virtual package once and then "cookie-cutter" it to multiple remote offices.

Jamie Shephard, vice president of technology solutions at International Computerware Inc. (ICI), a Marlborough, Mass.-based EMC and VMware solution provider, said that the new Avamar technology integrates well with the VMware Consolidated Backup software.

"It lets us start to virtualize the Avamar part of the infrastructure," Shephard said. "You can put de-dupe into a virtual machine, and then back up all changes to virtual machines. EMC is at the forefront of this. No one else can do it."

The next step, which Shephard is eagerly awaiting, is for EMC to release the Avamar technology as a virtual appliance, rather than as software that works with virtual servers, as Avamar Virtual Edition does.

"Avamar will eventually come out with a virtual appliance," he said. "This will eliminate the need for a physical machine for disaster recovery. Nobody wants to fail over to disaster recovery with VMware. Instead, all we'll have to do for DR is fire up a new virtual machine."

EMC also unveiled the EMC Avamar Data Store, a packaged solution that combines de-dupe backup and recovery software with preconfigured Dell PowerEdge 2950 servers.

Data Store comes in two versions. The first is a single-node model with 1 Tbyte of storage capacity that can be used to back up between 20 Tbytes and 30 Tbytes of customer data. The second version starts with four nodes for high availability, plus a spare node and a management node, and scales to as many as 16 nodes for enterprise data centers with multiple branch offices.

Virtual Edition is expected to be available in November, priced in line with current Avamar software licenses, which cost about $17,000 per Tbyte of de-duplicated data. Data Store is available now, with the single-node version costing about $50,000 and the four-node version starting at about $150,000.
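The single-node Data Store figures imply a de-dupe ratio and an effective cost per Tbyte of protected data. The arithmetic below uses only the approximate numbers quoted above and assumes the node holds the full 20 to 30 Tbytes EMC cites. Real ratios depend heavily on how much duplicate data a customer has, so treat these as rough back-of-the-envelope values. The function name is hypothetical.

```python
# Approximate figures quoted in the article.
SINGLE_NODE_CAPACITY_TB = 1       # physical capacity of the single-node Data Store
SINGLE_NODE_PRICE_USD = 50_000    # approximate single-node price
PROTECTED_RANGE_TB = (20, 30)     # customer data it can back up, per EMC


def implied_figures(protected_tb: float,
                    capacity_tb: float = SINGLE_NODE_CAPACITY_TB,
                    price_usd: float = SINGLE_NODE_PRICE_USD) -> tuple[float, float]:
    """Return (implied de-dupe ratio, price per protected Tbyte)."""
    return protected_tb / capacity_tb, price_usd / protected_tb


for protected_tb in PROTECTED_RANGE_TB:
    ratio, cost = implied_figures(protected_tb)
    print(f"{protected_tb} Tbytes protected: ~{ratio:.0f}:1 reduction, "
          f"~${cost:,.0f} per protected Tbyte")
# → 20 Tbytes protected: ~20:1 reduction, ~$2,500 per protected Tbyte
# → 30 Tbytes protected: ~30:1 reduction, ~$1,667 per protected Tbyte
```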