Permabit Technology on Monday unveiled new technology aimed at helping other storage vendors add the ability to deduplicate primary storage data with little or no performance impact.
Permabit’s new Albireo data optimization software is an embedded technology that allows other storage vendors to add data deduplication to any storage arrays in any IT environment, said Tom Cook, president and CEO of the Cambridge, Mass.-based company.
Deduplication, also called "dedupe," removes duplicate information as data is stored. It is usually done when data is backed up, as such data changes less often than data on primary storage arrays.
However, when data is deduped for primary storage, there is typically an impact on performance, Cook said. Also, storage array vendors’ features, such as the ability to take snapshots of data for replication purposes, might not work with deduplicated data. Furthermore, deduplication could become a single point of failure in a primary storage array, he said.
“Our solution is to take dedupe out of the data path in order to solve all three of these issues,” he said.
While other companies have developed dedupe technology for primary storage, those solutions typically sit between the data source and the storage arrays, a method Cook said is fundamentally flawed.
“It impacted storage,” he said. “Customers built their workloads on high-speed storage I/O. And vendors have spent millions of dollars on R&D for performance.”
Albireo, on the other hand, lets primary storage array vendors integrate dedupe without impacting performance or feature sets, said Jered Floyd, CTO and founder of Permabit.
Block or file data is still written directly to the storage device. At the same time, it passes through the Albireo software via its API, where it is hashed and compared against up to hundreds of petabytes of stored data to find duplicates in under 17 microseconds, Floyd said.
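The hash-and-compare step Floyd describes can be illustrated with a minimal sketch. This is not Albireo's API, which the article does not detail: the `DedupeIndex` class, its `lookup_or_add` method, the 4 KB block size, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class DedupeIndex:
    """Hypothetical in-memory index mapping block fingerprints to addresses."""

    def __init__(self):
        self._seen = {}  # fingerprint -> address of the first copy

    def lookup_or_add(self, block: bytes, address: int):
        """Return the address of an earlier identical block, or None if new."""
        digest = hashlib.sha256(block).digest()  # assumed hash function
        existing = self._seen.get(digest)
        if existing is None:
            # First time this content is seen: remember where it lives.
            self._seen[digest] = address
        return existing


index = DedupeIndex()
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # third block repeats the first
results = [index.lookup_or_add(b, addr) for addr, b in enumerate(blocks)]
```

Here the third lookup reports that the block already exists at address 0. Because the lookup only consults the index, the write to the storage device can proceed independently of it, which is the property the out-of-data-path design relies on.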
Albireo can be implemented in one of three ways, Floyd said.
When used for inline dedupe processing, Albireo compares new data to previously stored data before the new data is written. Floyd said the inline process adds slight latency. However, he said, it is the easiest way to implement the technology in storage arrays, and so it may be the first method OEM vendors adopt.
The second method is post-processing, in which the original data is copied directly to the target array and then goes through the dedupe process.
The most efficient method is parallel processing, in which the dedupe process is applied while data is still in memory before it is sent to the hard disk drives, Floyd said. “But there’s more work to integrate it this way,” he said. “Vendors might not want to start with this technology.”
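The difference between the first two methods can be sketched roughly as follows. This is a toy model under stated assumptions, not how any vendor integrates Albireo: the `ToyArray` class and its methods are hypothetical, SHA-256 stands in for whatever fingerprint the product uses, and the parallel method is left out because it depends on how an array stages data in memory.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    # Assumed fingerprint; the article does not name the hash Albireo uses.
    return hashlib.sha256(block).digest()


class ToyArray:
    """Hypothetical storage array with a dedupe index."""

    def __init__(self):
        self.blocks = {}  # physical address -> stored bytes
        self.refs = {}    # logical address -> physical address
        self.index = {}   # fingerprint -> physical address

    def write_inline(self, addr: int, block: bytes) -> None:
        """Inline: check for a duplicate before anything is stored."""
        fp = fingerprint(block)
        hit = self.index.get(fp)
        if hit is not None:
            self.refs[addr] = hit  # point at the existing copy, store nothing
            return
        self.blocks[addr] = block
        self.index[fp] = addr
        self.refs[addr] = addr

    def write_raw(self, addr: int, block: bytes) -> None:
        """Post-processing, step 1: store the data as-is."""
        self.blocks[addr] = block
        self.refs[addr] = addr

    def post_process(self) -> None:
        """Post-processing, step 2: scan stored blocks and drop duplicates."""
        for addr in sorted(self.blocks):
            fp = fingerprint(self.blocks[addr])
            hit = self.index.setdefault(fp, addr)
            if hit != addr:
                self.refs[addr] = hit
                del self.blocks[addr]


data = [b"X" * 4096, b"Y" * 4096, b"X" * 4096]

inline = ToyArray()
for addr, block in enumerate(data):
    inline.write_inline(addr, block)

post = ToyArray()
for addr, block in enumerate(data):
    post.write_raw(addr, block)
stored_before = len(post.blocks)  # all three blocks land on the array first
post.post_process()
```

Both arrays end up holding two physical blocks, but the inline path pays for the lookup on every write, while the post-processing path temporarily stores the duplicate and reclaims the space later.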
Permabit is currently working with a number of primary storage vendors to integrate Albireo into their arrays, Cook said. He expects some of those vendors to roll out new arrays with the Albireo technology by late 2010 or early 2011. He declined to name specific vendors.
Albireo’s primary competition is expected to be deduplication technology from NetApp, which dedupes primary storage on NetApp arrays, Cook said.
“They’ve done a good job with it,” he said.