EMC on Tuesday updated the operating system of its Isilon scale-out NAS appliance with technology from its Greenplum Hadoop appliance to provide native integration with the Hadoop Distributed File System protocol.
The result, said Sam Grocott, vice president of marketing for EMC Isilon, is the first scale-out NAS appliance to provide end-to-end data protection for Hadoop users and their big data requirements.
"Big data" refers to data sets that scale to multiple petabytes and that are created or collected, stored, and shared collaboratively in real time. Big data typically consists of unstructured data, including text, audio and video files, photographs, and other data that is hard to handle with traditional database management tools.
The Apache Hadoop project is a framework for running applications on large clusters built using commodity hardware. Hadoop works by breaking an application into multiple small fragments of work, each of which may be executed or re-executed on any node in the cluster. It includes the Hadoop Distributed File System (HDFS) for reliably storing very large files across machines in a large cluster.
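The idea of splitting a job into small fragments that can run, or be re-run, on any node can be illustrated with a toy, single-process simulation. This is a minimal sketch under stated assumptions, not Hadoop's actual Java MapReduce API: the function names map_task and run_job, the word-count workload, the node names, and the failed_nodes parameter are all hypothetical. Each fragment is "mapped" independently, a fragment whose node has failed is re-executed on the next healthy node, and the partial results are then merged.

```python
from collections import Counter


def map_task(fragment: str) -> Counter:
    """Hypothetical 'map' step: count the words in one fragment of input."""
    return Counter(fragment.split())


def run_job(fragments, nodes, failed_nodes=frozenset()):
    """Toy scheduler (not Hadoop's API): run each fragment on some node.

    If the chosen node is in failed_nodes, the fragment is re-executed on
    the next node in round-robin order. Returns the merged word counts and
    a map of fragment index -> node that actually ran it.
    """
    partials = []
    assignments = {}
    for i, fragment in enumerate(fragments):
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            if node in failed_nodes:
                continue  # simulated node failure: try another node
            partials.append(map_task(fragment))
            assignments[i] = node
            break
        else:
            raise RuntimeError(f"no healthy node available for fragment {i}")

    # Hypothetical 'reduce' step: merge the per-fragment counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, assignments


# Example: three fragments, three nodes, one of which has failed.
fragments = ["big data big", "data hadoop", "hadoop big"]
nodes = ["node1", "node2", "node3"]
counts, where = run_job(fragments, nodes, failed_nodes={"node2"})
print(counts.most_common())
print(where)
```

Because each fragment is independent, losing a node only means re-running the fragments that were assigned to it; the merged result is unchanged.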
"Big data is growing, and getting harder to manage," Grocott said. "Hadoop helps customers understand what's going on by running business analytics against that data."
Hadoop is still in the early adopter phase, Grocott said. "It's open source, usually a build-your-own environment," he said. "But we're seeing it move into the enterprise where open source is not good enough, and where customers want a complete solution."
While Hadoop is already in common use in big data environments, it still has several technical limitations that hold back customer adoption, said Nick Kirsch, director of product management for EMC Isilon.
Those limitations include a requirement for dedicated storage infrastructure, which prevents customers from enjoying the benefits of a unified architecture, Kirsch said. Hadoop data is often at risk because Hadoop has a single-point-of-failure architecture and no interface with standard backup, recovery, snapshot, and replication software, he said.
Hadoop implementations also typically have fixed scalability, with a rigid compute-to-capacity ratio, and waste storage capacity because they require three times the actual capacity of the data in order to mirror it, he said.
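The "three times" figure corresponds to HDFS's default replication factor of 3, under which every block is stored three times. The arithmetic behind the capacity overhead can be sketched as follows; the function names and the 100 TB data set size are hypothetical, chosen only for illustration.

```python
def raw_capacity_tb(data_tb: float, replication_factor: int = 3) -> float:
    """Raw disk (TB) needed when every block is stored replication_factor times.

    The default of 3 mirrors HDFS's default replication factor.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return data_tb * replication_factor


def usable_fraction(replication_factor: int = 3) -> float:
    """Share of raw capacity that holds unique (non-copy) data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return 1 / replication_factor


data_tb = 100.0  # hypothetical data set size
raw = raw_capacity_tb(data_tb)
print(f"{data_tb:.0f} TB of data needs {raw:.0f} TB of raw disk; "
      f"{raw - data_tb:.0f} TB holds copies ({usable_fraction():.0%} usable)")
```

In other words, with three-way replication only about a third of the purchased disk holds unique data; the other two thirds store copies.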
EMC is looking to overcome those limitations by implementing Hadoop natively in its Isilon scale-out NAS appliance, Kirsch said.
"We want to accelerate adoption of Hadoop by giving customers a trusted storage platform with scalability and end-to-end data protection," he said.
EMC Isilon's new OneFS 6.5 operating system, with native integration of the HDFS protocol, provides a scale-out platform for big data with no single point of failure, Kirsch said. It also provides end-to-end data protection using the Isilon appliance's full feature set, including backup, snapshots, and replication, he said.
The new system also works with all industry-standard protocols, Kirsch said. "This really opens Hadoop up to the enterprise," he said.
Unlike other vendors that have recently introduced Hadoop storage appliances in partnership with third-party Hadoop technology providers, EMC offers a single-vendor solution, Grocott said. "We offer a storage platform natively integrated with Hadoop," he said.
The update to the Isilon operating system to include Hadoop integration is available at no charge to customers with maintenance contracts, Grocott said.
EMC fully intends to support its channel partners with the new Hadoop offering, Grocott said.
"We're early to market," he said. "Our goal is to train our channel partners to offer it on behalf of EMC. Customers trust their channel partners to provide fast implementation and full support."