Storage Dedupe Reaches The Mainstream
5:00 PM EST Fri. Aug. 15, 2008
Recent moves by top storage vendors to add data deduplication to their product lines have not only brought those vendors in line with smaller competitors who have been offering the feature for years, but have also brought new opportunities to solution providers looking to cash in on one of the fastest-growing parts of the storage market.
Solution providers said that deduplication, which is a way to eliminate redundant files and/or blocks of data, is a must-offer technology because it's now widely available, and customers are finally aware of its importance.
"Customers are taking to dedupe like you wouldn't believe," said Keith Norbie, director of the storage division of Nexus Information Systems, a Plymouth, Minn.-based solution provider. "It's the only thing that rivals VMware in terms of take-up." The push to adopt dedupe is part of a wider move by solution providers to help customers optimize their data centers, Norbie said. "Optimization is prevalent among customers," he explained. "Server virtualization optimizes the number of servers; storage virtualization optimizes the number of storage arrays; and dedupe is important in optimizing the number of disks."
Customers are absolutely buying the idea of dedupe, said Hope Hayes, president of Alliance Technology Group, a Hanover, Md.-based storage solution provider.
"A year or longer ago, people looked at it, maybe for archiving," Hayes said. "Today, if you look at the manufacturers, a lot are coming out with dedupe for disk storage. Customers now expect it."
A few years ago, solution providers still had to do a lot of evangelizing for dedupe, said Mark Teter, CTO of Advanced Systems Group, a Denver-based solution provider. "Now it's definitely being requested by IT in their quest to control costs," Teter said. "IT is trying to leverage every trick to drive out costs. Now we don't have to say anything about dedupe."
Solution providers expect dedupe to become one of the most common parts of their customers' storage infrastructures now that the technology has been adopted by the major vendors. For instance, EMC Corp., Hopkinton, Mass., in May said it will work with Quantum Corp., San Jose, Calif., to OEM Quantum's dedupe technology for its storage line. This is in addition to the technology it got with the acquisition of Avamar nearly two years ago. This year also saw IBM Corp., Armonk, N.Y., offer its first dedupe product after its acquisition of Diligent Technologies in April.
Hewlett-Packard Co., Palo Alto, Calif., this year introduced both source and post-processing dedupe technologies developed internally and in conjunction with a technology deal with Sepaton Inc., a Marlborough, Mass.-based dedupe and virtual tape library developer. Other top storage vendors including Sun Microsystems Inc., Santa Clara, Calif., also work with smaller developers to offer dedupe technology.
Several software vendors have also added dedupe technology to their offerings, including CommVault Systems Inc., Oceanport, N.J.; FalconStor Software, Melville, N.Y.; and Symantec Corp., Cupertino, Calif.
While large and small storage vendors continue to offer dedupe as a feature of their storage technology, others continue to specialize in it. The primary vendor focused specifically on dedupe is Data Domain Inc., Santa Clara, Calif., widely acknowledged as the company that introduced the technology to the mass market.
Because dedupe is Data Domain's only product, the vendor has had to add to its technology in order to maintain its market. In June, for instance, it introduced the ability to lock data stored on its appliances for IT regulatory governance purposes.
Despite the prevalence of dedupe from the vendor side and the growing customer acceptance, there is still a lot of room for savvy solution providers to evangelize its benefits and options.
"There's not a ton of awareness of what the options are," Norbie said. "It's not as hard a sell as, say, professional soccer in the U.S. But customers haven't prioritized on dedupe yet. But I've seen analyst reports, and I expect a boom in the next five years."
About the only area of disagreement among solution providers regarding dedupe is whether it is a product or a feature. Teter calls it a feature, citing the introduction late last month of the ability to dedupe across multiple vendors' primary storage by NetApp Inc., Sunnyvale, Calif. "That's a classic example of how it has become a feature," he said.
Hayes said that dedupe is still a product as evidenced by Data Domain's market dominance, but that is quickly changing. "You want dedupe? Just click here," Hayes said. "Give it a couple more years, and it will be more so."
Norbie, however, said dedupe is a product.
"EMC, NetApp and others are attempting to cauterize Data Domain's lead by saying that dedupe is a feature, not a product," he said. "But Data Domain's numbers don't show that. ... Calling it a feature dramatically underestimates that feature.
"Congrats to anyone who says dedupe is a feature, not a company. But [Data Domain] is still cashing the checks."
Demystifying Dedupe
Deduplication, also called "dedupe," removes duplicate information as data is backed up or archived. It can be done at the file level, where duplicate files are replaced with a marker pointing to one copy of the file, and/or at the subfile or byte level, where duplicate bytes of data are removed and replaced by pointers, resulting in a significant decrease in storage capacity requirements.
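The subfile approach described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the fixed 4-byte block size, the use of SHA-256 as the fingerprint, and the function names `dedupe` and `restore` are all assumptions chosen for illustration. Commercial products typically use much larger, often variable-sized, blocks.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, unrealistically small block size for illustration


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (store, pointers): store maps a block fingerprint to the block,
    and pointers lists the fingerprints in their original order.
    """
    store = {}     # fingerprint -> block bytes (one copy per unique block)
    pointers = []  # ordered fingerprints that stand in for the original blocks
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # store only the first copy
        pointers.append(fingerprint)          # every later copy is just a pointer
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[fingerprint] for fingerprint in pointers)


data = b"ABCDABCDABCDWXYZ"  # 16 bytes, with the block "ABCD" repeated three times
store, pointers = dedupe(data)
stored_bytes = sum(len(block) for block in store.values())
```

In this toy case the 16-byte input is represented by two unique blocks plus four pointers, and `restore(store, pointers)` returns the original bytes. File-level dedupe follows the same pattern with the whole file treated as a single block.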
Dedupe products can be classified according to where the dedupe process takes place. Source dedupe removes duplicate data before it is sent across a LAN or WAN. Doing so results in fewer files and less data being sent over the network, but it can also slow the backup because of the processing overhead the dedupe process adds at the source.
Post-processing dedupe starts the dedupe process after the data is copied onto a destination device such as a virtual tape library. This avoids the backup performance bottleneck by accepting the full data set first and eliminating duplicates once it is stored, but it requires more storage capacity to temporarily hold the entire data set.
In-line dedupe uses a separate appliance that sits between the data source and the data target and dedupes the data as it is moved. This reduces both the processing load on the source and the capacity needed at the target, but adds a new appliance that needs to be managed.
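The trade-offs among the three placements can be summarized in a rough Python model. This is a simplified sketch, not a benchmark: the function names, the dictionary keys, and the assumption that post-processing needs peak capacity equal to the full data set are all illustrative, and real products add metadata and indexing overhead that is ignored here.

```python
import hashlib


def unique_blocks(blocks):
    """Return one copy of each distinct block, in first-seen order."""
    seen = {}
    for block in blocks:
        seen.setdefault(hashlib.sha256(block).digest(), block)
    return list(seen.values())


def total_size(blocks):
    """Total number of bytes across a list of blocks."""
    return sum(len(block) for block in blocks)


def compare_placements(blocks):
    """Model where the dedupe work happens and what it costs (hypothetical model)."""
    full = total_size(blocks)                    # size before dedupe
    deduped = total_size(unique_blocks(blocks))  # size after dedupe
    return {
        # Source: duplicates are removed before the data leaves the source.
        "source": {
            "bytes_leaving_source": deduped,
            "peak_target_bytes": deduped,
            "dedupe_runs_on": "source",
        },
        # Post-processing: the full set lands on the target, then is reduced.
        "post-processing": {
            "bytes_leaving_source": full,
            "peak_target_bytes": full,
            "dedupe_runs_on": "target",
        },
        # In-line: an appliance in the data path dedupes before the target.
        "in-line": {
            "bytes_leaving_source": full,
            "peak_target_bytes": deduped,
            "dedupe_runs_on": "appliance",
        },
    }


backup = [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]  # 16 bytes, 8 of them unique
result = compare_placements(backup)
```

For this sample backup, only source dedupe reduces what leaves the source, only post-processing needs room for the full 16 bytes at the target, and in-line dedupe moves the work to a third device.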