EMC Greenplum's Pivotal HD Aims To Bring SQL Ease To Hadoop Big Data



EMC Greenplum on Monday unveiled Pivotal HD, a new Apache Hadoop big data distribution that the company said moves Hadoop queries from a batch to an interactive process and makes them as easy as SQL database queries.

Pivotal HD is based on a scalable big data management technology, HAWQ, which Greenplum has been developing for about a decade, said Scott Yara, senior vice president of products and a co-founder of Greenplum, during a Monday webcast introducing the technology.

"Project HAWQ is really the crown jewels of Greenplum," Yara said.


The Pivotal HD Hadoop distribution appears to be named after the Pivotal Initiative, a joint venture that storage vendor EMC and cloud technology developer VMware set up in December, to which the two parent companies contributed big data and cloud infrastructure technology, including Greenplum. The Pivotal Initiative focuses on developing infrastructure for transforming applications that are built on, and make use of, cloud, mobility and big data technology, while VMware continues to focus on developing the software-defined data center.

Pivotal HD is now EMC Greenplum's third distribution of Apache Hadoop, the open source big data framework for generating business value out of the mountains of data a business can accumulate. EMC unveiled the first two distributions about nine months after acquiring Greenplum, Yara said.

The Pivotal HD introduction shows that EMC Greenplum is really "all-in" on Hadoop, Yara said. "Our commitment to Hadoop is strategically the most important thing we are doing as a company," he said.

EMC acquired Greenplum in 2010.

Project HAWQ brings to Hadoop the kind of high-performance queries, high-performance data loading rates, ease of management, and integration with other business intelligence tools that users have come to expect from traditional database tools, Yara said.

Project HAWQ lets EMC Greenplum's Pivotal HD Hadoop distribution scale to thousands of nodes, and do so in a very elegant fashion, said Josh Klahr, vice president of product management for the company.

Pivotal HD has the Hadoop Distributed File System (HDFS) at its core, on top of which EMC Greenplum has added three primary technologies: the Pivotal Command Center, which helps deploy, manage and monitor clusters; a data loading mechanism that can load more than 100 TB of data per hour; and the Hadoop Virtual Extension, or HVE, from VMware, which makes Hadoop aware of both physical and virtualized environments, Klahr said.
