Cloudera Packages Hadoop For Enterprise Implementation

Cloudera is packaging its distribution of the open-source Hadoop software with developer tools, technical support, training programs, and sales and marketing resources to make it easier for enterprises to adopt the technology, said Kirk Dunn, COO of the Palo Alto, Calif.-based company.

"We want to help data-driven enterprises who use data as a critical part of their business to use Hadoop," Dunn said.

The Apache Hadoop project is a framework for running applications on large clusters built using commodity hardware. Hadoop works by breaking an application into multiple small fragments of work, each of which may be executed or re-executed on any node in the cluster.

It includes the Hadoop Distributed File System (HDFS) for reliably storing and analyzing very large files across machines in a large cluster.
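The idea of breaking a job into fragments that can be executed, or re-executed, on any node can be illustrated with a small single-process sketch. This is not Hadoop code and does not use any Hadoop API; the function names, the node names, and the simulated failure set are all hypothetical, chosen only to show the pattern with a simple word count.

```python
from collections import Counter


def split_into_fragments(lines, fragment_size):
    """Break the input into small, independent fragments of work."""
    return [lines[i:i + fragment_size] for i in range(0, len(lines), fragment_size)]


def count_words(fragment):
    """The work done for one fragment: count the words it contains."""
    counts = Counter()
    for line in fragment:
        counts.update(line.lower().split())
    return counts


def run_job(lines, nodes, fragment_size=2, failing_nodes=frozenset()):
    """Run every fragment on some node, re-executing it elsewhere on failure.

    `failing_nodes` is a hypothetical stand-in for real node failures.
    Returns the merged word counts and the number of execution attempts.
    """
    total = Counter()
    attempts = 0
    for index, fragment in enumerate(split_into_fragments(lines, fragment_size)):
        # Any node may run any fragment; start from a different node each time.
        for offset in range(len(nodes)):
            node = nodes[(index + offset) % len(nodes)]
            attempts += 1
            if node in failing_nodes:
                continue  # simulated failure: re-execute on the next node
            total.update(count_words(fragment))
            break
        else:
            raise RuntimeError(f"no healthy node available for fragment {index}")
    return total, attempts


lines = [
    "big data big clusters",
    "commodity hardware",
    "big clusters of commodity hardware",
]
nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
counts, attempts = run_job(lines, nodes, failing_nodes={"node-b"})
print(counts.most_common(3), attempts)
```

Because each fragment is independent, a failed node costs only one extra attempt for the fragment it was running, and the merged result is the same as if no node had failed.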

While Hadoop is often associated with cloud technologies for analyzing large amounts of data, it is just as likely to be used by enterprises looking for better ways to manage their own data stores, Dunn said. For instance, he said, cloud providers like Yahoo, Amazon, and Twitter all use Hadoop internally for their own data.

"Enterprises don't have the volume of data that Web companies have," he said. "But Web companies don't have the diversity of data sources that enterprises have."

Cloudera has figured out how to package Hadoop for the enterprise, which Dunn said is no trivial matter. While Cloudera's Distribution including Apache Hadoop (CDH) is, like all open-source software, available for download at no charge, the company is offering it as a package to hardware vendors, ISVs, and systems integrators, he said.

Cloudera's new hardware partners include Dell, Cisco, Fujitsu, SGI, and Mellanox, while ISV partners include Informatica, MicroStrategy, Teradata, and IBM. Cloudera will also offer its package to both regional or boutique integrators and larger national integrators, Dunn said.

"We're creating an ecosystem where hardware and software vendors and system integrators can come together to look at how to use Hadoop to get better insights into data," he said. "We believe that if we can do that, we can enable Hadoop, which is already growing fast, to get even faster adoption."

Hadoop has received much industry support in the last few months, especially as businesses grow more interested in new ways to handle "big data," or data that scales to multiple petabytes of capacity and is created or collected, stored, and shared collaboratively in real time.

EMC in May unveiled plans to provide full open-source support for Hadoop with the release of software, appliance, and eventually virtual appliance versions of the Hadoop technology, in connection with technology it gained through last year's Greenplum acquisition.

That same month, NetApp unveiled a new Hadoop storage appliance based on the E5400 storage subsystem it gained through its acquisition of Engenio.

Startup server technology developer Calxeda in June unveiled a group of integrator and ISV partners that are developing applications around its upcoming ARM processor-based, power-efficient server technology for use in such applications as Hadoop.

Data integration software vendor Informatica in June released a new version of its flagship product, which is designed to handle the "big data" generated by today's transaction processing and social media systems and also supports Hadoop.