Cloudera Adds Search Capabilities To Its Hadoop Big Data Platform


Cloudera is gearing up to add search technology to its Hadoop-based big data platform in a move the company and its channel partners say will broaden Hadoop's appeal to a wider audience.

The new Cloudera Search, based on the open-source Apache Solr search technology, is entering a short public beta period this week before its general release.

The addition of search capabilities to Hadoop is "a big functional leap forward for many of our customers," said Steven Noels, co-founder and products senior vice president at NGData, a Gent, Belgium-based ISV that develops applications, including its Lily customer intelligence applications, that run on Cloudera's platform.


Cloudera markets a distribution of Hadoop, the open-source big data platform developed by the Apache Software Foundation, and develops software products such as Cloudera Manager and Cloudera Enterprise for deploying, configuring and managing Hadoop systems.

Hadoop has a reputation for being complex to work with and generally requires experience with SQL programming -- something that's beyond most information workers. The addition of the Solr technology to Cloudera's Hadoop distribution, CDH, will change that by making Hadoop-stored data available to a broader range of users, according to the company.

"This kind of data is now available for people who don't know SQL, but who do know how to Google their data," said Charles Zedlewski, Cloudera's products vice president. "It opens up data in a Hadoop cluster to a much wider audience of users."

That, in turn, will make it easier for businesses to derive value from their big data stores, he said, as well as reduce data management costs.

Last month Cloudera debuted a production release of its Impala SQL engine that works with Hadoop. That advance, combined with today's search announcement, helps move Hadoop beyond its batch-oriented architecture into more near real-time applications.

Given the rapid adoption and growing use of Hadoop, Zedlewski predicted that in the near future more data would be stored in Hadoop than in any other system.

The new software can search structured and unstructured data stored in the Hadoop Distributed File System and in Apache HBase, the open-source database that runs on top of Hadoop. Like other Cloudera products, Cloudera Search is integrated with Cloudera Manager, the vendor's tool set for managing Cloudera applications.

Noels at NGData called Cloudera's Hadoop system "the most stable and most mature" on the market. NGData worked with Cloudera to ensure that HBase data can be indexed into Solr and Cloudera Search. That means clients don't have to rely as much on customized systems, Noels said.

While Cloudera Search could be used with other vendors' Hadoop distributions, such as those offered by Hortonworks and MapR Technologies, Zedlewski said the search technology is most tightly integrated with Cloudera's platform.

Cloudera Search has been in use by a handful of companies for several months in a "private beta" program.

PUBLISHED JUNE 4, 2013