While dedupe is most often applied to data sent to tape or to virtual tape libraries as a way of shrinking backup data, NetApp and several smaller startup vendors are offering technology to dedupe primary storage. That use of dedupe technology remains relatively uncommon because customers worry about the performance hit the dedupe process can impose on storage.
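The core idea behind dedupe is simple enough to sketch: split the data into blocks, fingerprint each block, and store only one copy of each unique block. The sketch below is a minimal, illustrative model of this, not any vendor's implementation; the 4-KB fixed block size and SHA-256 fingerprint are assumptions chosen for clarity.

```python
import hashlib

def dedupe_fixed_blocks(data: bytes, block_size: int = 4096):
    """Split a stream into fixed-size blocks and keep one copy of each
    unique block, plus an ordered recipe of hashes to rebuild the stream."""
    store = {}    # hash -> block, unique blocks only
    recipe = []   # ordered hashes for reconstruction
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy seen
        recipe.append(digest)
    return store, recipe

# Backup data is highly repetitive, so it dedupes well: here 100 blocks
# of logical data reduce to just 2 unique blocks on "disk".
data = b"A" * 4096 * 50 + b"B" * 4096 * 50
store, recipe = dedupe_fixed_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "unique blocks stored")
```

Restoring a backup walks the recipe and looks each hash up in the store, which is why the article's sources stress CPU and memory: fingerprinting and index lookups, not disk writes, dominate the work.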
For many vendors, dedupe technology was the result of acquisitions.
For instance, Quantum acquired ADIC in 2006. That same year, EMC acquired dedupe software maker Avamar and followed that up three years later with its $2.1 billion acquisition of market leader Data Domain.
EMC had to wrest control of Data Domain from archrival NetApp, which gave up its own attempt to acquire the dedupe vendor after a prolonged bidding war. Since then, NetApp has focused on dedupe for its primary storage appliances.
2010 saw a flurry of activity as storage vendors went on a dedupe buying spree, including Dell’s acquisition of Ocarina Networks, EMC’s acquisition of Bus-Tech, and IBM’s acquisition of Storwize, which followed its 2008 purchase of Diligent.
Solution providers said that, despite the variety of ways customers can use dedupe technology, the stand-alone hardware appliance still has a strong appeal over software or array alternatives.
Using stand-alone appliances has the greatest potential for increasing dedupe performance, said Keith Norbie, vice president of sales at Nexus Information Systems, a Minnetonka, Minn.-based solution provider.
A dedupe appliance from a manufacturer such as Data Domain relies on processor and memory performance to dedupe data and push it through to disk, so those components become the bottleneck, Norbie said. “No matter how fast the disk is, it all goes through the CPU and memory,” he said. “The good news is, the faster the Intel processor is, the faster the performance.”
The dedupe performance of storage arrays, by contrast, depends on the number of spindles, or physical hard drives, Norbie said. “To increase performance, you need to add spindles, even if you already have excess capacity,” he said.
The big choice for best dedupe performance, then, comes down to technology that is CPU-bound or spindle-bound, Norbie said. “And for now, CPUs are sizzling,” he said.
Another important factor often overlooked in dedupe performance is the granularity of the process, Norbie said. Companies such as Data Domain and Quantum offer variable block size dedupe, which automatically adjusts the size of the blocks of data that can be examined for duplications.
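Variable block (content-defined) chunking can be sketched as follows. A boundary is declared wherever a rolling checksum over the last few bytes matches a bit pattern, so boundaries follow the content itself: inserting a byte early in a stream shifts only nearby chunks, where fixed-size blocks would all re-align and stop matching. The rolling-sum test, window size, and size limits below are illustrative choices, not Data Domain's or Quantum's actual algorithm.

```python
import hashlib
import random

def chunk_variable(data: bytes, mask: int = 0x3FF, window: int = 16,
                   min_size: int = 256, max_size: int = 8192):
    """Cut a chunk where the rolling sum of the last `window` bytes
    matches `mask`, subject to minimum and maximum chunk sizes."""
    chunks, start, rolling = [], 0, 0
    for i in range(len(data)):
        rolling += data[i]
        if i - start >= window:
            rolling -= data[i - window]  # slide the window forward
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Insert one byte near the front of a stream; most chunks survive
# unchanged and would still dedupe against the earlier backup.
random.seed(0)
data = bytes(random.randrange(256) for _ in range(50_000))
before = {hashlib.sha256(c).hexdigest() for c in chunk_variable(data)}
after = {hashlib.sha256(c).hexdigest()
         for c in chunk_variable(data[:100] + b"X" + data[100:])}
print(len(before & after), "of", len(before), "chunks unchanged")
```

A fixed-block chunker given the same one-byte insertion would see every block after the insertion point shift and fail to match, which is why variable block dedupe finds duplicates that fixed blocks miss.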
With processing power the key factor in determining dedupe performance, the stand-alone appliance is still the primary choice, said Michael Spindler, data protection practice manager at Datalink, a Chanhassen, Minn.-based solution provider.
Spindler said solution providers should not dwell too much on the difference in performance of appliances from different vendors.
“We’ve recently seen that Quantum, because of the increase in Intel performance and the number of cores in the processor, was able to double the dedupe performance over its previous models,” he said. “This gives it better performance than Data Domain now. But six to eight months later, Data Domain will get faster.”
Good dedupe appliances also make it easier to automate the dedupe process, Spindler said. “In the midrange backup space, say 5 TB to 20 TB, the appliance really offers more of a set-it-and-forget-it process,” he said.
Spindler said that in his experience, variable block dedupe gives a boost in capacity reduction, but not a significant one. For instance, he said, increasing the dedupe ratio from 10:1 to 12:1 may give back only about 4 percent of disk space.
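The capacity arithmetic behind estimates like that is straightforward; how large the improvement looks depends on which baseline the percentage is measured against. The 100 TB figure below is illustrative, not from the article.

```python
def stored_footprint(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold `logical_tb` of backup data
    at a given dedupe ratio."""
    return logical_tb / ratio

logical = 100.0                          # TB protected (illustrative)
at_10 = stored_footprint(logical, 10.0)  # 10.0 TB on disk at 10:1
at_12 = stored_footprint(logical, 12.0)  # ~8.33 TB on disk at 12:1
saved = at_10 - at_12
print(f"{saved:.2f} TB reclaimed")                  # 1.67 TB
print(f"{saved / at_10:.1%} of the deduped store")  # 16.7%
print(f"{saved / logical:.1%} of the logical data") # 1.7%
```

The same ratio change thus reads as a large or a small win depending on whether it is expressed against the deduplicated footprint or the original data, which helps explain why solution providers caution against over-weighting headline ratios.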
NEXT: Taking Advantage Of Dedupe