Quantum Doubles Down On De-Dupe


Quantum's DXi7500 can be presented to host servers and the SAN either as a virtual tape library or as a NAS appliance, with Fibre Channel, iSCSI, CIFS, or NFS connectivity. It includes integrated data de-duplication, data compression, and data replication capabilities, said Mike Sparkes, product manager for disk systems at the San Jose, Calif.-based vendor.

Virtual tape libraries, or VTLs, are disk arrays configured to appear to the host server and the backup software as physical tape libraries. Data is streamed to and recovered from the VTL as if it were tape, so the backup process does not need to change. However, because VTLs use hard drives, backup and recovery speeds are much higher than with tape drives. Data backed up to a VTL can also be copied to physical tape for archiving or off-site storage.

De-duplication, also called "de-dupe," removes duplicate information as data is backed up or archived. It can be done at the file level, where duplicate files are replaced with a marker pointing to a single copy of the file, and/or at the sub-file or byte level, where duplicate sequences of bytes are removed. Either way, the result is a significant decrease in storage capacity requirements.
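To make the file-level versus sub-file distinction concrete, here is a minimal sketch, assuming SHA-256 digests serve as the "markers" and that sub-file de-dupe splits data into fixed-size chunks. Both are simplifying assumptions for illustration: this is not how Quantum implements de-dupe, many products use variable-size chunking instead, and the function names and the 4-byte toy chunk size are hypothetical.

```python
import hashlib

def file_level_dedupe(files):
    """Store one copy per distinct file; each file name maps to a digest (the marker)."""
    store = {}    # digest -> file content (single stored copy)
    catalog = {}  # file name -> digest
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # only the first copy is kept
        catalog[name] = digest
    return store, catalog

def chunk_level_dedupe(files, chunk_size):
    """Split files into fixed-size chunks and store each distinct chunk once."""
    store = {}    # digest -> chunk
    recipes = {}  # file name -> ordered list of chunk digests
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes

def restore(store, recipe):
    """Rebuild a file from its ordered chunk digests."""
    return b"".join(store[d] for d in recipe)

# Hypothetical toy data set.
files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBCCCC",  # exact duplicate of a.txt
    "c.txt": b"AAAABBBBDDDD",  # shares its first two chunks with a.txt
}

file_store, catalog = file_level_dedupe(files)
chunk_store, recipes = chunk_level_dedupe(files, chunk_size=4)
print(sum(len(v) for v in file_store.values()))   # 24 of 36 raw bytes
print(sum(len(v) for v in chunk_store.values()))  # 16 of 36 raw bytes
```

File-level de-dupe only catches the exact duplicate (b.txt), while chunk-level de-dupe also finds the blocks that c.txt shares with a.txt, which is why sub-file de-dupe usually saves more capacity.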

De-dupe products can be classified in a couple of different ways. The primary difference between them lies in where the de-dupe process takes place.


Some products de-dupe the data as it is being sent across a LAN or WAN. Known as in-line de-dupe, this approach results in fewer files and less data being sent over the network, but the processing overhead of de-duping can slow the backup. It is commonly used to replicate data from small remote offices to a central location.
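The trade-off of in-line de-dupe can be illustrated with a minimal sketch, assuming fixed-size chunks identified by SHA-256 digests and a central site that keeps an index of the chunks it already holds. The `Target` class, `inline_backup` function, and 4-byte chunk size are hypothetical; the sketch also ignores the network cost of the digest lookups themselves.

```python
import hashlib

CHUNK_SIZE = 4  # toy size for illustration; real systems use much larger chunks

def chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

class Target:
    """Stand-in for the central site that stores unique chunks."""
    def __init__(self):
        self.store = {}  # digest -> chunk

    def has(self, digest):
        return digest in self.store

    def put(self, digest, chunk):
        self.store[digest] = chunk

def inline_backup(data, target):
    """De-dupe before sending: only chunks the target lacks cross the link.
    Returns (payload bytes sent, ordered digests needed to restore)."""
    sent = 0
    recipe = []
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()  # per-chunk hashing is the CPU overhead
        if not target.has(digest):
            target.put(digest, chunk)
            sent += len(chunk)
        recipe.append(digest)
    return sent, recipe

target = Target()
first_sent, _ = inline_backup(b"AAAABBBBCCCC", target)   # remote office, night 1
second_sent, _ = inline_backup(b"AAAABBBBDDDD", target)  # night 2, mostly unchanged
print(first_sent, second_sent)  # 12 4
```

Only the changed chunk travels on the second night, which is why in-line de-dupe suits bandwidth-limited remote offices; the price is that every chunk must be hashed and looked up before it can be sent.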

Other products use post-processing de-dupe, in which the full data set to be backed up is first copied onto a destination drive and then de-duped. This avoids the in-line processing bottleneck by accepting the full data set first and eliminating duplicates once it is stored, but the customer must have enough storage capacity to temporarily hold the entire data set.
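The capacity cost of post-processing can be shown with a minimal sketch, assuming the staged copy is held until the whole de-dupe pass finishes and that chunks are fixed-size and identified by SHA-256 digests. The `post_process_backup` function and 4-byte chunk size are hypothetical and do not describe any particular product.

```python
import hashlib

def post_process_backup(data, store, chunk_size=4):
    """Land the full backup first, then de-dupe it into the chunk store.
    Returns (peak bytes on disk, bytes kept after the staging copy is freed)."""
    staging = bytes(data)  # step 1: accept the full data set without de-dupe work
    for i in range(0, len(staging), chunk_size):  # step 2: de-dupe after it is stored
        chunk = staging[i:i + chunk_size]
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    stored = sum(len(c) for c in store.values())
    peak = stored + len(staging)  # staged copy and chunk store coexist until the pass ends
    del staging                   # step 3: release the staging area
    return peak, stored

store = {}
print(post_process_backup(b"AAAABBBBCCCC", store))  # (24, 12)
print(post_process_backup(b"AAAABBBBDDDD", store))  # (28, 16)
```

Even though the second backup adds only 4 bytes of new data, the job briefly needs room for the full 12-byte raw copy on top of the de-duped store.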

The DXi7500 uses both processes, Sparkes said. "This gives customers a choice of which to use, depending on the time of the backup," he said.

For example, customers may have only a one-hour window to back up a large database, in which case they would likely use post-process de-dupe to save time, Sparkes said. For typical LAN data, where backup speed is usually not a bottleneck, customers can use in-line de-dupe, he said.
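The reasoning behind that choice is simple rate arithmetic: the required ingest rate is the data size divided by the backup window. The sketch below turns it into a hypothetical selection rule; `inline_rate_tb_per_hr` is an assumed parameter standing in for whatever throughput in-line de-dupe sustains on a given system, not a published DXi7500 figure.

```python
def required_rate(size_tb, window_hours):
    """Ingest rate (Tbytes per hour) needed to finish within the window."""
    return size_tb / window_hours

def choose_dedupe_mode(size_tb, window_hours, inline_rate_tb_per_hr):
    """Hypothetical rule: use in-line de-dupe only if it can keep up with the window;
    otherwise land the data first and de-dupe afterward."""
    if required_rate(size_tb, window_hours) <= inline_rate_tb_per_hr:
        return "inline"
    return "post-process"

# Assumed figures for illustration only.
print(choose_dedupe_mode(3, 1, inline_rate_tb_per_hr=1.0))  # large database, 1-hour window
print(choose_dedupe_mode(2, 8, inline_rate_tb_per_hr=1.0))  # LAN data, overnight window
```

With these assumed numbers the database needs 3 Tbytes per hour and falls back to post-process, while the LAN job needs only 0.25 Tbytes per hour and can be de-duped in-line.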

The DXi7500 also includes an integrated tape creation function, which allows data to be backed up to tape directly from the array without going through a server. Customers can use policies to specify which backups should automatically go to tape. Also, when the DXi7500 is used with Symantec's Veritas NetBackup 6.5 data protection software, a backup can be directed to disk with an automatic copy made to tape, while the application is aware that multiple copies of the data exist, Sparkes said.
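Policy-driven tape copies usually come down to matching backup jobs against rules. The sketch below assumes a rule consists of a shell-style job-name pattern plus a set of backup types; the `Backup` and `TapePolicy` classes and the job names are hypothetical and do not reflect Quantum's actual policy engine or syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class Backup:
    job: str   # backup job name, e.g. "finance-db-weekly"
    kind: str  # "full" or "incremental"

@dataclass
class TapePolicy:
    job_pattern: str          # shell-style pattern matched against job names
    kinds: tuple = ("full",)  # backup types this rule sends to tape

    def matches(self, backup):
        return fnmatch(backup.job, self.job_pattern) and backup.kind in self.kinds

def select_for_tape(backups, policies):
    """Return the backups that at least one policy marks for automatic tape copy."""
    return [b for b in backups if any(p.matches(b) for p in policies)]

backups = [
    Backup("finance-db-weekly", "full"),
    Backup("finance-db-daily", "incremental"),
    Backup("lan-files", "full"),
]
policies = [TapePolicy("finance-*")]
print([b.job for b in select_for_tape(backups, policies)])
```

Here only the weekly full backup of the finance database qualifies: the daily job is incremental, and the LAN job does not match the pattern.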

The DXi7500 scales up to 240 Tbytes, with backup performance of up to 8 Tbytes per hour. Sparkes said it was built with high availability in mind: two internal servers fail over to each other in case of a problem, components are hot-swappable, and there is no single point of failure.

Kasey Hohenbrink, senior pre-sales engineer at Integrated Archive Systems, a Palo Alto, Calif.-based solution provider, said the DXi7500 finally answers the question of how to get six months of online data backups in a VTL.

"This is aimed at VTL users with high capacity and bandwidth requirements," Hohenbrink said. "Data Domain [of Santa Clara, Calif.] sits well for users wanting to keep 30, 60, or 90 days of backups online. But once past 60 days, they really need a second unit."
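How many days of backups stay online is, to a first approximation, capacity arithmetic: each day's backup consumes its raw size divided by the de-dupe ratio. The sketch below assumes a constant daily backup size and a constant effective de-dupe ratio, both simplifications; the function name and the figures in the example are hypothetical, apart from the 240-Tbyte maximum capacity quoted for the DXi7500.

```python
import math

def retention_days(capacity_tb, daily_backup_tb, dedupe_ratio):
    """Days of backups that fit online if each day's backup shrinks by dedupe_ratio."""
    stored_per_day_tb = daily_backup_tb / dedupe_ratio
    return math.floor(capacity_tb / stored_per_day_tb)

# 240 Tbytes of capacity, an assumed 10 Tbytes of nightly backups.
print(retention_days(240, 10, dedupe_ratio=10))  # assumed 10:1 effective ratio
print(retention_days(240, 10, dedupe_ratio=2))   # data that barely de-dupes
```

Under these assumptions, six months of retention is comfortably reachable at a 10:1 ratio (240 days) but not at 2:1 (48 days), so the achievable de-dupe ratio matters as much as raw capacity.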

Hohenbrink also likes the ability to choose which type of de-dupe to use, and even to turn de-dupe off and use only compression during the backup.

"Some things are not de-dupable," he said. "For example, with a write-once data set, you may get 3X to 6X when compressed, but not more. So de-dupe eats up processing power in this case."

The DXi7500 is the latest in Quantum's DXi family of VTLs with built-in de-dupe, which were introduced last December.

While the previous two models were comparable to Data Domain's de-dupe appliances, the DXi7500 offers more scalability and flexibility, Hohenbrink said. However, he said the Data Domain products offer better data replication capabilities.

As for performance, Hohenbrink said it will be interesting to see whether the DXi7500 can really sustain its purported 4 Tbytes per hour. "I doubt it," he said.

The DXi7500 is expected to be available this fall, Sparkes said.