EMC Greenplum's Pivotal HD Aims To Bring SQL Ease To Hadoop Big Data

Yara Scott

EMC Greenplum on Monday unveiled Pivotal HD, a new Apache Hadoop big data distribution that the company said moves Hadoop queries from a batch to an interactive process and makes them as easy as SQL database queries.

Pivotal HD is based on a scalable big data management technology, HAWQ, which Greenplum has been developing for about a decade, said Scott Yara, senior vice president of products and a co-founder of Greenplum, during a Monday webcast introducing the technology.

"Project HAWQ is really the crown jewels of Greenplum," Yara said.

The Pivotal HD Hadoop distribution seems to be named after the Pivotal Initiative, a joint venture that storage vendor EMC and cloud technology developer VMware set up in December and to which the two parent companies contributed big data and cloud infrastructure technology, including Greenplum. The Pivotal Initiative is focused on developing the infrastructure for transforming applications built on and using cloud, mobility and big data technology, while allowing VMware to focus on continuing to develop the software-defined data center.

Pivotal HD is EMC Greenplum's third distribution of Apache Hadoop, the open source big data application for generating business value out of the mountains of data a business can accumulate. It follows two distributions EMC unveiled about nine months after acquiring Greenplum, Yara said.

The Pivotal HD introduction shows that EMC Greenplum is really "all-in" on Hadoop, Yara said. "Our commitment to Hadoop is strategically the most important thing we are doing as a company," he said.

EMC acquired Greenplum in 2010.

Project HAWQ brings to Hadoop the kind of high-performance queries, high-performance data loading rates, ease of management, and integration with other business intelligence tools that users have come to expect from traditional database tools, Yara said.

Project HAWQ lets EMC Greenplum's Pivotal HD Hadoop distribution scale to thousands of nodes, with the ability to do so in a very elegant fashion, said Josh Klahr, vice president of product management for the company.

Pivotal HD has the HDFS Hadoop file system at its core, on top of which EMC Greenplum has added three primary technologies: the Pivotal Command Center, which helps deploy, manage and monitor clusters; a data load mechanism that can load over 100 TB of data per hour; and the Hadoop Virtual Extension, or HVE, from VMware, which makes Hadoop aware of both physical and virtualized environments, Klahr said.

Project HAWQ makes it possible to run Hadoop with the scalability of traditional databases, Greenplum's Klahr said.

It is SQL-compliant, so users can write any SQL query to let Hadoop mine data across hundreds or thousands of nodes, he said. It also brings interactive queries, horizontal scalability, robust data management and deep analytical capabilities to Hadoop.

"There's really nothing like it on the market today," he said.

Greenplum's Yara said EMC Greenplum has been able to add such capabilities to Hadoop because of a long-term commitment the company made to developing Hadoop technology. For instance, he said, Greenplum has strong support from the EMC family of companies including VMware and RSA, and has 300 engineers focused on Hadoop. "We think that's the largest integrated Hadoop engineering team on the planet today," he said.

Along with the introduction of Pivotal HD, EMC Greenplum has been building a partner ecosystem, which includes small startups as well as giants such as Cisco, Intel and SAP, Yara said. However, he said, the technology was developed with such secrecy that those partners only found out about it a couple of weeks ago.