21 Big Data Products To Check Out At Hadoop Summit

Hadoop Hullabaloo

Drowning in the volume, variety and velocity of Big Data, an increasing number of businesses and their IT managers are turning toward Hadoop and the rapidly expanding list of Hadoop-related technologies for help.

Today marks the start of the 2012 Hadoop Summit in San Jose, Calif., where close to 50 of the most important players in the Hadoop market are showing off technologies that extend Hadoop's capabilities and make it easier to implement and manage. Whether you're attending in person or just keeping a finger on the pulse of the Hadoop world, here are the show announcements we think you should know about.


Cloudera will be showing off the recently launched Cloudera Enterprise 4.0 Big Data management platform, as well as the fourth generation of the company's distribution of the Apache Hadoop software, known as CDH4.

The new release of Cloudera Enterprise includes an update to Cloudera Manager with new tools for deploying and managing Hadoop systems, improved management automation of large-scale clusters, and easier integration with a broader range of management tools and data sources.

The CDH4 release offers new high-availability features that eliminate the single point of failure of the Hadoop Distributed File System, increased security that allows more sensitive data to be stored in CDH, and the ability to run multiple data processing frameworks on the same Hadoop cluster.


Dataguise will be launching what it says is the industry's first enterprise-grade data privacy protection and risk assessment application for Hadoop. The new DgHadoop software provides centralized compliance assessment and enforcement to help businesses meet data privacy regulations and reduce compliance costs.

Complying with data privacy regulations can be a major challenge because Hadoop collects data from a wide range of sources, not just corporate databases, and concentrating so much disparate data in one system increases the risk of data theft or accidental disclosure.


Datameer will launch Datameer 2.0, a new release of its Big Data analytics software that combines data integration, analytics and visualization into a single package with a spreadsheet interface. While Datameer 1.0 was offered only in an enterprise edition, 2.0 adds workgroup and desktop editions to the Datameer lineup.

The release includes the new Business Infographics Designer for graphics and data visualization design control. The software is built on HTML5 and sports an enhanced user interface, offers support for additional data sources including Facebook and Twitter, and provides improved integration with the Hive data warehouse system for Hadoop.


DataStax is unveiling DataStax Enterprise (DSE) 2.1, a new release of the management system with new capabilities for running a Hadoop cluster across multiple data centers. Built on the Apache Cassandra database, DataStax manages OLTP, analytic and search data in a single database.

The DSE 2.1 release also supports Mahout, the Apache software for building scalable machine learning algorithms, and Oracle Unbreakable Linux. The company is also announcing multiple enhancements to its DataStax OpsCenter visual management and monitoring system for Big Data platforms.


Hortonworks will announce the general availability of the Hortonworks Data Platform 1.0, the company's commercial Big Data platform based on Apache Hadoop that's been in private beta for six months. HDP, according to Hortonworks, offers features such as system monitoring and management, metadata management and data integration services that the company says will make it easier for businesses to adopt Hadoop.

Hortonworks will also announce the addition of high-availability capabilities to HDP based on the use of VMware's vSphere technology. The high-availability functions include automated NameNode failover and failback for the Hadoop Distributed File System, and automated MapReduce detection of and response to HDFS failover events.


Karmasphere will preview Karmasphere 2.0, a new release of the company's collaborative analytics workspace for Hadoop. The new version simplifies the process of collecting data, statistical models, algorithms and other analytical assets through open APIs. It offers automatic visualization of multi-structured data sets, turning any data type into Hive tables for analysis, according to the company.

Release 2.0 includes a rapid, iterative process for submitting Hive SQL queries natively against Big Data, and automates publishing of analytical insights to users and data analysis teams through traditional business intelligence tools and spreadsheets. The new release, which is available this month through an early access program and will be generally available in July, also provides improved security features that restrict access to data.


Kognitio will be demonstrating its in-memory analytical platform, which can quickly analyze terabytes of data, combined with data visualization software from Advanced Visual Systems (AVS) in a move the company described as a convergence of Big Data and cloud computing technologies.

Under the Kognitio-AVS partnership the companies will be targeting their combined technologies toward vertical applications in advertising, consumer behavior and social media.


Solutions integrator Lilien LLC will unveil its end-to-end Big Data and Advanced Analytics product based on the company's Hadoop Starter Cluster reference architecture, a turnkey platform that includes pre-configured computer, networking, software and storage systems.

Lucid Imagination

Lucid Imagination will be demonstrating its recently unveiled LucidWorks Big Data, a cloud-based development system of open-source software for prototyping Big Data applications. Such applications can help businesses analyze unstructured information such as text messages, audio files, email repositories, log files and other content – what Lucid Imagination calls "dark data."

LucidWorks Big Data incorporates Hadoop and other open-source technologies such as Apache Lucene and Solr search, discovery and analytics software; the R programming language for developing analytical applications; and Apache Mahout for building scalable machine learning algorithms.

MapR Technologies

MapR will unveil version 2.0 of its MapR distribution of the Apache Hadoop platform with new advanced monitoring, management and security capabilities. MapR also announced that its software is available as an option for the Amazon Elastic MapReduce service.

Version 2.0 offers new job monitoring and management features, job and data placement controls, multi-tenancy support, central logging and custom central configuration capabilities, enhanced security, new data compression algorithms, support for SUSE Linux, and the latest versions of Hadoop components such as HBase, Hive and Pig.


NetApp is demonstrating the NetApp Open Solution for Hadoop Rack, a pre-configured architecture of computer, networking and data storage technologies that the company said promises faster, more reliable Hadoop deployments. Customers assemble the system using components from NetApp and other vendors and use their choice of data analysis tools. NetApp also is offering professional services for Hadoop deployments.

NetApp also has struck a strategic partnership with Hortonworks under which the two companies are developing and pretesting Hadoop-based solutions based on the Hortonworks Data Platform.


Data analytics application developer ParAccel is unveiling the Hadoop On Demand Integration Module connector software that allows the ParAccel Analytic Platform to access and use Hadoop data.

The connector is now generally available, and customers already using the ParAccel Big Data analytic system include Alliance Health Networks and the Web services company Evernote.

Pentaho and Dell

Dell will resell Pentaho's Big Data analytics software as part of the Dell Apache Hadoop Solution and Pentaho is joining Dell's Emerging Solutions Ecosystem program, under an alliance Pentaho will announce at the Hadoop Summit.

The Dell Apache Hadoop Solution includes the computer maker's hardware reference architecture, Crowbar software and Cloudera's distribution of Apache Hadoop. Dell will add Pentaho's analytics and data ETL (extract, transform and load) tools to its Big Data system under the deal.

Pervasive Software

Pervasive will announce the availability of Pervasive Data Integrator v10 – Hadoop Edition, enabling customers to flow their business data to and from Hadoop-based data stores.

The vendor will be demonstrating the software's capabilities, which allow users with a single click to move data from traditional data stores such as DB2, MySQL, Netezza, PostgreSQL, SQL Server, Oracle, Teradata and Vertica directly into HBase, the NoSQL database that comes with all Hadoop distributions.


Qubole will be showing off its cloud-based Big Data platform. The startup company, which just came out of stealth mode, says its auto-scaling cloud offering eliminates the need for businesses to architect, deploy and manage their own Hadoop clusters.

The company's founders, Ashish Thusoo and Joydeep Sen Sarma, created the Apache Hive data warehousing software that runs with Hadoop and were leaders of Facebook's data infrastructure organization.


Savvis will announce an alliance with Hortonworks under which the cloud infrastructure and IT services provider will integrate the Hortonworks Data Platform with the Savvis Symphony suite of cloud services. That, according to Savvis, will simplify the movement of data between the Hortonworks platform – which Savvis will host – and enterprise data systems.


Syncsort will be demonstrating how its DMExpress data integration software is now certified to work with the Hortonworks Data Platform. The links between the two products simplify and speed up the movement of data between HDP and other enterprise systems, according to Syncsort.


New advanced enterprise management capabilities for Apache Hadoop in the Talend Open Studio for Big Data integration software will make it easier for IT organizations to deploy, manage and streamline their Big Data infrastructure, according to the vendor.

Talend Open Studio for Big Data already has more than 450 connectors for linking enterprise data with Hadoop. Talend is adding connectors for HCatalog, a metadata and table management system for data sharing between Hadoop and other systems, and Oozie, a workflow processing system for defining and linking a series of processing jobs.


Data warehousing technology vendor Teradata will show off its new Aster SQL-H technology that connects standard business intelligence applications and Big Data sets stored in Hadoop systems.

The product marks the first time that standard SQL can seamlessly access multi-structured data stored in the Hadoop Distributed File System (HDFS), according to Teradata. Aster SQL-H runs on the company's Teradata Aster MapReduce Appliance.


Vertica, acquired by Hewlett-Packard last year, will showcase the recently unveiled Vertica 6 release of the company's analytics platform software. The new edition expands Vertica's FlexStore architecture for linking the platform to any structured, semi-structured or unstructured data source.

Vertica 6 also supports the R open-source programming language for developing statistical computing and data analysis applications.


Businesses can now run the Hadoop system on VMware's vSphere virtualization platform, giving the Big Data software a boost in high availability, elasticity, multi-tenancy and resource sharing, VMware is announcing at the Hadoop Summit.

Under an open-source project it calls "Serengeti," VMware is offering a free download of a toolkit for deploying Apache Hadoop clusters on vSphere 5.0. The toolkit is available under an Apache 2.0 license. The toolkit will support all major distributions of Apache Hadoop, including those from Cloudera and Hortonworks.