2014 Big Data 100: Big Data Infrastructure, Tools And Services

Big Data Infrastructure, Tools And Services

Businesses continue to struggle with exploding volumes of data: not just managing it to keep IT systems from being overwhelmed, but finding ways to derive value from all that data and somehow use it for competitive advantage.

With that in mind, the CRN editorial team has created the second annual Big Data 100 list, identifying vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data. Here are 25 big data infrastructure, tool and service companies offering everything from hardware servers to software platforms and applications to cloud-based services.

Altiscale

Top Executive: Founder, CEO Raymie Stata

Altiscale says its Altiscale Data Cloud is the first cloud service that's purpose-built to run Hadoop, offering an on-demand, pay-as-you-go service based on the big data platform.

Stata, formerly the CTO at Yahoo, founded Altiscale in 2012. The company, based in Palo Alto, Calif., recently received $12 million in Series A funding.

Amazon Web Services

Top Executive: CEO Jeff Bezos

Amazon Web Services has been steadily expanding the range of big data-related services it offers. Today they include the Amazon DynamoDB NoSQL database service, the Amazon Kinesis managed service for real-time processing and analysis of streaming big data, the Amazon Redshift petabyte-scale data warehouse, Amazon Glacier for archival big data storage, and Amazon Elastic MapReduce, which provides the Hadoop framework on Amazon's Elastic Compute Cloud (EC2) service.

Amazon also hosts a number of other vendors' products, including Revolution Analytics' Revolution R Enterprise and SAS' Visual Analytics platform.

CA Technologies

Top Executive: CEO Michael Gregoire

CA Technologies, Islandia, N.Y., provides a number of IT system capacity management tools to help businesses and service providers better predict and plan the IT resources needed to handle big data demands. The vendor's application and network performance management tools also come into play, helping manage big data in IT operations.

CA, however, is no longer in the data modeling business after it sold its ERwin data modeling software to Embarcadero earlier this year.

Cleversafe

Top Executive: President, CEO John Morris

Cleversafe developed what it calls a limitless data storage service based on object storage technology and "information dispersal algorithms coupled with encryption" that expand, virtualize, transform, slice and disperse data across a network of storage nodes.

Founded in 2004 and based in Chicago, Cleversafe has also combined its object-storage dispersal technology with the capabilities of Hadoop MapReduce, bringing its storage technology to Hadoop-based analytics.

Cloudera

Top Executive: CEO Tom Reilly

Cloudera Enterprise is the vendor's distribution of the Hadoop platform, coupled with system management (Cloudera Manager) and data management (Cloudera Navigator) tools. Launched in 2008, Cloudera is based in Palo Alto, Calif.

On March 31, Cloudera closed a whopping $900 million financing round, followed a few days later by the general release of Cloudera Enterprise 5. Later in April, the company added resources, training and certification to its Cloudera Connect Partner Program.

Concurrent

Top Executive: CEO Gary Nakamura

Concurrent, founded in 2008, offers application middleware technology that businesses use to develop, deploy, run and manage big data applications. The company's products include the Cascading application development framework and Driven application performance management software.

Last month, the San Francisco-based company debuted Cascading 3.0 with support for local in-memory execution, Apache Hadoop MapReduce and Apache Tez. Concurrent closed a $4 million Series A funding round last year.

Continuuity

Top Executive: Co-Founder, CEO Jonathan Gray

One problem with Hadoop is the shortage of skilled developers capable of building applications that leverage its capabilities. Continuuity offers the Continuuity Reactor development engine that Java programmers use to build and deploy cloud-based big data applications.

Continuuity, founded in late 2011 and based in Palo Alto, Calif., shipped Continuuity Reactor 2.0 in October with MapReduce scheduling, resource isolation, high availability and full support for REST APIs.

Dataguise

Top Executive: Co-Founder, CEO Manmeet Singh

One of the challenges of big data is securing all that information. Dataguise, founded in 2007, develops big data security intelligence and protection software targeting organizations in health care, financial services, government and other industries. The vendor’s automated discovery, data masking, encryption and risk assessment software runs within Hadoop and other big data environments.

In April the Fremont, Calif.-based company established the Big Data Protection Partner Program to recruit solution providers working on big data projects.

Dell

Top Executive: Founder, CEO Michael Dell

Dell has been steadily adding to its big data offerings in recent years with its Dell Boomi AtomSphere data integration software, the Kitenga Analytics Suite big data search and business analytics platform, and all the database management tools that came with its 2012 acquisition of Quest Software. And, of course, the company's core business is providing all the underlying server hardware and data storage systems to pull it all together.

In March, Round Rock, Texas-based Dell expanded into the realm of predictive analytics by acquiring StatSoft and its flagship Statistica Enterprise software.

EMC

Top Executive: CEO Joseph Tucci

EMC is, of course, a leading maker of large-scale storage systems such as the Isilon and Symmetrix lines, which alone makes it a major player in the big data arena. But the Hopkinton, Mass., company is also deeply involved in the analytics and applications side of the big data equation through Pivotal, its joint venture with VMware.

Pivotal, for example, offers the Pivotal Chorus analytic productivity platform for searching, exploring and visualizing data. Also in the product lineup are the Pivotal Greenplum MPP database, the Pivotal HD distribution of Hadoop with SQL query services, and the Pivotal GemFire distributed data management platform.

Gazzang

Top Executive: President, CEO Larry Warnock

Gazzang, founded in 2010, is another startup that’s addressing the problem of securing big data. The company’s technology provides data security in big data and cloud computing environments, securing personally identifiable information, preventing unauthorized access to sensitive data and systems, and helping organizations comply with data security regulations.

Earlier this month Austin, Texas-based Gazzang unveiled a data encryption and key management system for OpenStack Swift, the open-source cloud technology stack’s object storage platform.

Hewlett-Packard

Top Executive: President, CEO Meg Whitman

Like Dell and other system vendors, Hewlett-Packard markets a range of server, storage and other hardware and system software products that form the foundation of big data systems. At the same time it has been expanding its lineup of higher-level software for specific data management and analysis applications.

For Palo Alto, Calif.-based HP, that higher-level software centers on the company's HAVEn big data platform, which includes the Vertica massively scalable database and Autonomy IDOL search and content management software.

Hortonworks

Top Executive: CEO Rob Bearden

Hortonworks, launched in 2011, offers the Hortonworks Data Platform, which combines Apache Hadoop with tools for data management, integration, security, provisioning and other enterprise data processing tasks.

Earlier this month Palo Alto, Calif.-based Hortonworks acquired XA Secure, a startup developer of security and governance technology for the Hadoop big data platform. By adding XA Secure's technology to the Hortonworks Data Platform, the vendor strengthens its competitive hand against other Hadoop distributors.

IBM

Top Executive: President, CEO Ginni Rometty

IBM has products that span all facets of big data, including business analytics tools such as Cognos and SPSS, data management software such as its DB2 database and InfoSphere data integration system, and hardware platforms such as IBM PureData, powered by Netezza technology, and the Watson supercomputer.

Just this week IBM, Armonk, N.Y., expanded its business intelligence lineup with IBM Concert, a cloud-based decision support and collaboration application, and IBM Project Catalyst, a new analytic discovery and visualization tool.

MapR Technologies

Top Executive: Co-Founder, CEO John Schroeder

MapR Technologies competes with Cloudera, Hortonworks and other vendors in the Hadoop arena, building on its distribution of Hadoop and other open-source Apache software to create a complete big data platform for both operational and analytical purposes.

San Jose, Calif.-based MapR, founded in 2009, reported this month that first-quarter bookings tripled over last year's first quarter. The company now supports Hewlett-Packard's Vertica Analytics software running on the MapR platform.

Microsoft

Top Executive: CEO Satya Nadella

Microsoft has been growing its big data software lineup in recent years. At the platform level the company offers its venerable SQL Server database with built-in business intelligence capabilities, as well as its Azure HDInsight Hadoop-based service.

Microsoft, Redmond, Wash., offers Power BI for Office 365, a set of business intelligence tools for its popular Office 365 cloud applications. For more everyday use the company provides Power Query, a self-service BI tool that works with the ubiquitous Excel spreadsheet.

NetApp

Top Executive: CEO Tom Georgens

NetApp is best known for its data storage hardware and software systems, a foundation for any big data system.

But the Sunnyvale, Calif.-based manufacturer also offers a number of higher-level products for tackling big data tasks. The NetApp Data Warehouse includes Snapshot for copying data, SnapRestore for ETL chores and SnapManager for data management. And the NetApp Open Solution for Hadoop is a ready-to-deploy Hadoop cluster system for running big data analytics apps.

Oracle

Top Executive: CEO Larry Ellison

Oracle is best known for its namesake relational database software, which remains a foundational component of many IT organizations' data management and analysis systems. But Oracle today offers the entire big data stack, from the Exadata Database Machine and Big Data Appliance hardware to data integration tools, data warehousing and business intelligence software.

Showing that it's not stuck in the relational world, Oracle, Redwood Shores, Calif., recently unveiled Oracle NoSQL Database 3.0, a new release of the company's distributed key-value database with improved security, data center performance enhancements, and developer support for tabular data models.

Pivotal

Top Executive: CEO Paul Maritz

Pivotal is the big data joint venture between storage giant EMC and VMware. The goal, according to the company, is to create software applications that leverage "big and fast data" on a single, cloud-independent platform.

Pivotal develops the Pivotal Chorus analytic productivity platform for searching, exploring and visualizing data. Also on the product line card are the Pivotal Greenplum MPP database, the Pivotal HD distribution of Hadoop with SQL query services, and the Pivotal GemFire distributed data management platform.

Qubole

Top Executive: Co-Founder, CEO Ashish Thusoo

Qubole develops a Hadoop-based big data platform, the Qubole Data Service, which runs in the cloud. Last month the company added Presto-as-a-Service to its lineup, a hosted version of Facebook's open-source Presto query engine that provides real-time SQL query capabilities on Hadoop.

Expectations are high for this Mountain View, Calif.-based company: Founders Ashish Thusoo and Joydeep Sen Sarma built and ran Facebook's data service and scaled it to more than 25 petabytes. They also created the Apache Hive open-source data warehouse technology.

StackIQ

Top Executive: Co-Founder, CEO Tim McIntire

StackIQ markets software used to automate the deployment, provisioning and management of big data IT infrastructure, including Hadoop and NoSQL database clusters. The software supports the Cloudera, Hortonworks and MapR Hadoop distributions.

The La Jolla, Calif.-based company was founded in 2006.

Teradata

Top Executive: President, CEO Mike Koehler

Teradata is another company that could lay claim to the title "original big data company," having debuted its hardware/software data warehouse systems back in the 1980s (the Dayton, Ohio-based company was founded in 1979). Today the company supplies a broad range of products, from the Teradata Data Warehouse Appliance to database software to marketing and analytic applications.

Last month the company launched Teradata QueryGrid, which it called "the industry's most complete big data analytic solution." The software executes a single SQL query across multiple analysis engines and databases.

Treasure Data

Top Executive: Founder, CEO Hiro Yoshikawa

Treasure Data offers a cloud-based data warehouse (data analytics Platform-as-a-Service) that operates on a subscription model. The idea is to provide sophisticated data warehouse capabilities to businesses without the huge costs and development times associated with on-premise systems.

The Mountain View, Calif.-based company was founded in 2011 and launched its service in 2012. Last year the company raised $5 million in Series A financing.

Xplenty

Top Executive: Co-Founder, CEO Yaniv Mor

Xplenty offers a cloud-based data integration service running on Hadoop, providing an alternative to on-premise ETL (extract, transform, load) tools for integrating structured and unstructured data. The startup also competes against Amazon's Elastic MapReduce service.

The company, based in Tel Aviv, Israel, with a U.S. office in San Francisco, was founded in 2011.

Zettaset

Top Executive: President, CEO Jim Vogt

Zettaset is the creator of Orchestrator, software that businesses use to manage and secure their Hadoop big data clusters. The Mountain View, Calif.-based company was launched in 2009.

In February, Zettaset said it had been granted a patent for its "split brain resistant fail-over" high availability technology for Hadoop, a core component of Orchestrator.