EMC's Greenplum: These 10 People Get Big Data

Big Data's Best

It's not often you see an executive spend a keynote slot talking about another company's innovations. But that's exactly what Scott Yara, senior vice president of products and co-founder of Greenplum, the big data-focused division of EMC, did during his keynote address at Strata this week.

During his time on stage, Yara spoke not of Greenplum's recent successes, but of the folks outside his company who helped make them happen. He listed off 10 non-Greenplum employees who have helped inspire Greenplum and shape its big data strategy, either by working directly with the company or doing amazing things with big data on their own.

Yara said the list, which was compiled by the data science and technical folks inside of Greenplum, was originally massive but, for time's sake, was shaved down to just 10 people. Here they are.

Jake Porway, DataKind

Yara first tipped his hat to Jake Porway, founder and executive director of DataKind, a New York-based team of data scientists that helps non-profits and social organizations analyze and store their data more effectively. In January, DataKind announced a new project with Refugees United, an international organization that provides mobile and Web technologies to help refugees locate missing loved ones. DataKind is helping Refugees United leverage its data to learn more about how people are using its website, as well as how users can find one another more effectively.

"This notion of bringing data science and combining that with non-profit organizations and social and civic causes ... and using data science to solve meaningful problems is really, really inspiring work," Yara said.

Jon Kleinberg, Cornell University

Jon Kleinberg, a computer science professor at Cornell University, was the second influencer to make Greenplum's list. "One of our data scientists called him the 'algorithm god,'" Yara said of Kleinberg, whose research focuses on network analysis, social media and algorithm design.

Kleinberg has received research grants from tech giants including Yahoo! and Google, and he has authored several books, including his most recent, "Networks, Crowds, and Markets: Reasoning About a Highly Connected World."

The White House

President Obama secured a spot on the Greenplum list, thanks to the big data initiative he announced in March of last year. The program brings together six Federal agencies that will collectively plunk down more than $200 million to fuel big data R&D.

"The Obama administration has taken a really, really proactive stance to make a real difference in the big data community," Yara said.

Andrew Ng and Daphne Koller, Coursera

Next up were Andrew Ng and Daphne Koller, the co-founders and co-CEOs of Coursera, a social entrepreneurship company that partners with some of the world's top universities to bring online classes to the masses. The courses are free and span a range of topics, including Biology, Mathematics, Computer Science and Humanities.

The idea for Coursera loosely stemmed from Stanford University's online machine learning and databases classes, which Ng helped develop.

"[They're] really bringing data science and technical education to the world," Yara said.

Jonathan Harris, Cowbird

Greenplum also paid tribute to Jonathan Harris, the creator of Cowbird, a virtual story-telling and social media platform on which authors around the world can share their work.

With a Pinterest-like user interface, Cowbird provides a visual feast of author-submitted photos that accompany stories, audio clips and more. There's a "story editor" app that lets users add these components, and then share their completed stories on social media sites like Facebook, Twitter or StumbleUpon. But Cowbird itself is really a social media site at heart, letting users follow their favorite authors and "Love" their favorite stories, all with the click of a button.

Yara dubbed Harris "the first data scientist that we saw do really inspiring work."

Joe Hellerstein, Trifacta

Big data would be meaningless without humans there to analyze it. That's a concept that's been taken to heart by Joe Hellerstein, another influencer to make Greenplum's list.

Described as a "close advisor" to the Greenplum team, Hellerstein is a computer science professor at the University of California, Berkeley and CEO of Trifacta, a startup focusing on visualization software and productivity platforms that make it easier for businesses to understand what their data is telling them. Rethinking the traditional interfaces and algorithms people use to manipulate data -- and ultimately making them better -- is Trifacta's overall aim.

The company hasn't released any solutions publicly just yet, but Yara said when it does, we're in for a treat.

"[Hellerstein's] about to do some really exciting things, and I'm sure you guys will hear about him over the coming year," he told the crowd.

Ben Werther, Platfora

Ben Werther led the product management team at Greenplum for years. But, even after he left to create his own company -- Platfora -- Yara said his legend still lives on.

Werther and his team at Platfora are striving to turn the BI world on its head with a Hadoop analytics engine that the company says puts data directly in the hands of business analysts, without their having to deal with "IT friction." Platfora's analytics platform also touts Fractal Cache technology, which bypasses the batch-oriented nature of Hadoop to make for an overall faster data analysis experience.

Todd Papaioannou, Continuuity

Todd Papaioannou was praised by Greenplum as being a trailblazer in the big data application and analytics space. Papaioannou is currently co-founder and CEO of Continuuity, a company that provides a number of solutions to help organizations build apps for big data analytics.

Continuuity offers a tool for every stage of the app-building process, including its Developer Suite to aid the actual building of the app, a sandbox environment for testing that app, and then both a cloud-based and on-premise platform-as-a-service tool for deploying and hosting that app. All of these solutions are based on Continuuity's AppFabric, an elastically scalable cloud platform that is built on top of Apache Hadoop.

Mike Driscoll, Metamarkets

Mike Driscoll, CEO of Metamarkets, was touted by Yara as being "a dear friend, and the first data scientist we probably worked with outside of Greenplum."

Metamarkets is a startup specializing in what it calls "Data Science-as-a-Service," delivering a cloud-based data analytics platform that is purpose-built for big data workloads. The platform is made up of several data management and analytics solutions, including Metamarkets's "Data Pipes," or customized Hadoop pipelines for parallel data processing, along with "Druid," an in-memory data engine that Metamarkets says can slice, dice and drill through data 1,000 times faster than traditional, disk-based databases.

Metamarkets also offers a variety of cloud-based data visualization tools, along with unique, social media-like features that let users easily share dashboards and reports with their teammates.

Steven Hillion, Alpine Data Labs

Last but not least on Greenplum's list was Steven Hillion, who actually founded the analytics group at Greenplum before branching off to start his own analytics company, Alpine Data Labs, in early 2010.

Alpine Data Labs focuses largely on predictive analytics, a phrase that's becoming as buzz-worthy these days as "big data" itself. Alpine Data Labs' platform, which works on both Hadoop-based data sources and traditional relational databases, is meant to bring predictive analytics to the masses with a system its maker says is super easy to use, deploy and monitor. It also touts a heavy collaboration angle, meaning team members across the globe can work together on a single predictive analytics model or workflow, thanks to Alpine's built-in Web access.

Alpine Data Labs was featured on CRN's Emerging Vendors list in both 2011 and 2012.