2015 Big Data 100: Infrastructure, Tools And Services

Big Data Infrastructure, Tools And Services

Tools for business analytics and data management are critical in our big data world. But those tools are only as good as the hardware and software platforms they run on.

The CRN editorial team has created the third annual Big Data 100 list, identifying vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data.

Here are 25 big data infrastructure, tool and service companies offering everything from hardware servers, to software platforms and applications, to cloud-based services. Some, such as IBM, Hewlett-Packard and Oracle, have broad product lines that also include analytics, data management and infrastructure technologies for tackling big data.

Altiscale

Top Executive: Founder and CEO Raymie Stata

Altiscale is one of several competing startups that provide Hadoop-as-a-Service. The company's Altiscale Data Cloud is an on-demand, pay-as-you-go service based on the Hadoop big data platform.

In February, Altiscale announced that the Apache Spark big data processing engine was available on the Altiscale Data Cloud, and that Kerberos authentication security had been built into the platform.

Stata, formerly the CTO at Yahoo, founded Palo Alto, Calif.-based Altiscale in 2012. In December, the company raised $30 million in Series B funding.

Amazon Web Services

Top Executive: CEO Jeff Bezos

While Amazon Web Services is best known as a cloud service for storing data, the company has been steadily expanding the range of big data-related services it provides. They include the Amazon DynamoDB NoSQL database; Amazon RDS relational database; Amazon Kinesis service for realtime processing and analysis of streaming data; the Amazon Redshift petabyte-scale data warehouse; Amazon Glacier for archival data storage; and Amazon Elastic MapReduce, which provides the Hadoop framework through Amazon's Elastic Compute Cloud (EC2) service.

For the first time, Amazon in April broke out revenue figures for AWS, revealing that the cloud service provider generated $1.57 billion in sales in the first quarter, and $265 million in profits.

BlueData Software

Top Executive: Co-Founder and CEO Kumar Sreekanti

BlueData Software emerged from stealth mode in September, debuting its BlueData EPIC software platform that uses virtualization technology to make it easier, faster and more cost-effective for businesses to leverage big data by deploying Hadoop-as-a-Service in an on-premises model.

The company, founded in 2012 and based in Mountain View, Calif., said its technology makes it possible for businesses of all sizes to quickly build big data systems with cost savings of 50 percent to 75 percent compared to traditional approaches.

Cask

Top Executive: Founder and CEO Jonathan Gray

Cask is an open-source software company that provides development tools for Hadoop applications and data. The Cask Data Application Platform is used to build, deploy and manage big data applications.

In February, Cask inked a strategic business and technology collaboration agreement with Hadoop distributor Cloudera -- a deal that included a Cloudera equity investment in Cask.

Founded in 2011 as Continuuity, Cask is based in Palo Alto, Calif.

Cloudera

Top Executive: CEO Tom Reilly

Cloudera is one of the leading providers of Hadoop and related software and services. The company's flagship Cloudera Enterprise suite includes tools for Hadoop administration, data governance and security. The company also developed Cloudera Impala, a massively parallel processing SQL engine for data stored in Hadoop clusters.

Cloudera, founded in 2008 and based in Palo Alto, Calif., said in February that its partner ecosystem had grown by more than 75 percent in the previous year to 1,450 companies, including systems integrators and solution providers (more than 850 at the time), major IT vendors, ISVs and tool developers.

Concurrent

Top Executive: CEO Gary Nakamura

Concurrent positions itself as a leading supplier of infrastructure software that businesses use to develop, deploy, run and manage big data applications. Products include the Cascading application development framework and Driven application performance management software.

The San Francisco-based company was founded in 2008.

Confluent

Top Executive: Co-Founder and CEO Jay Kreps

Startup Confluent is developing a commercial streaming data platform based on Apache Kafka, the Apache Software Foundation's open-source message broker software.

Kreps and Confluent co-founders Neha Narkhede and Jun Rao were instrumental in Kafka's development while building the data infrastructure at LinkedIn. The trio left in November to start Mountain View, Calif.-based Confluent and raised $6.9 million in seed funding.

DataGravity

Top Executive: Co-Founder and CEO Paula Long

After two years of development, DataGravity last year debuted its DataGravity Discovery Series of "data-aware" storage appliances that not only help businesses manage their data, but also provide search-and-discovery capabilities to help them understand how the data is being used. The company was co-founded by Paula Long and John Joseph, key executives behind storage technology developer EqualLogic.

Founded in 2012 and based in Nashua, N.H., DataGravity raised $50 million in Series C financing in December, bringing its total financing to $92 million.

Dataguise

Top Executive: Co-Founder and CEO Manmeet Singh

One of the challenges of big data is securing such huge volumes of information. Dataguise, founded in 2007, targets its DgSecure data security intelligence and protection software toward organizations in health care, financial services, government and other industries. The vendor’s automated discovery, data masking, encryption and risk-assessment software runs within Hadoop, NoSQL databases and other big data environments.

In February, the Fremont, Calif.-based company debuted its DgSecure for NoSQL technology at the Strata + Hadoop conference.

Dell

Top Executive: Founder and CEO Michael Dell

Dell has been steadily building its lineup of big data software, including the Boomi AtomSphere data integration software and the Statistica business intelligence and predictive analytics software, the latter of which it acquired when it bought StatSoft in March 2014. Dell also offers a range of big data management tools that came with its 2012 acquisition of Quest Software.

EMC

Top Executive: CEO Joseph Tucci

EMC's data storage systems, including the Isilon and VMAX lines, by themselves position the company as a major player in the big data space. But the company is also deeply into the analytics and applications side of the big data equation. Its Pivotal joint venture with VMware markets the Greenplum massively parallel processing database, the HAWQ SQL engine for Hadoop and the GemFire in-memory distributed database.

In March, the EMC Federation (made up of the storage vendor along with RSA, VMware and Pivotal) unveiled a big data hardware and software system called the Federation Business Data Lake, which makes it easier for businesses to collect, house and analyze huge volumes of data. The system is made up of products from EMC's Information Infrastructure line, as well as VMware vCloud Suite, Pivotal Big Data Suite and Pivotal Cloud Foundry.

Hewlett-Packard

Top Executive: President and CEO Meg Whitman

Like IBM, Dell and other system vendors, Hewlett-Packard markets a range of server, storage, and other hardware and system software products that form the foundation of big data systems. At the same time, it has been expanding its lineup of higher-level software for specific data management and analysis applications.

HP's big data and analytics offerings are collectively marketed as the Haven big data platform, which includes Hadoop, the Vertica columnar database, and Autonomy unstructured data search and analysis software.

Hortonworks

Top Executive: CEO Rob Bearden

Hortonworks, launched in 2011, offers the Hortonworks Data Platform, a distribution of Apache Hadoop combined with tools for data management, integration, security, provisioning and other software for enterprise data processing.

Palo Alto, Calif.-based Hortonworks went public on Dec. 11, 2014, and in February reported that revenue for its fourth quarter ended Dec. 31 was $12.7 million.

In April of this year, Hortonworks announced a deal to acquire SequenceIQ and its rapid deployment tools for Hadoop. The same month, the company named Scott Gnau, most recently president of data warehouse technology developer Teradata, as its new chief technology officer.

IBM

Top Executive: President and CEO Ginni Rometty

IBM has products that span all facets of big data, including business analytics tools such as Cognos and SPSS; data management software such as its DB2 database and InfoSphere data integration system; and hardware platforms such as IBM PureData powered by Netezza technology and the Watson supercomputer.

In February, IBM unveiled a version of its BigInsights analytics platform for Apache Hadoop. It includes a data science toolset to query, visualize and explore large volumes of Hadoop data.

MapR Technologies

Top Executive: Co-Founder and CEO John Schroeder

MapR Technologies competes with Cloudera, Hortonworks and other vendors in the Hadoop arena, building on its distribution of Hadoop and other open-source Apache software to create a complete big data platform for both operational and analytical purposes.

In February, San Jose, Calif.-based MapR Technologies launched version 4.1 of the MapR Distribution including Apache Hadoop, with new asynchronous replication and other capabilities that support realtime applications for globally distributed data.

The company is reportedly considering a late-2015 IPO.

Microsoft

Top Executive: CEO Satya Nadella

Microsoft has been growing its big data software lineup in recent years. At the platform level, the company offers its widely deployed SQL Server database with built-in business intelligence capabilities, as well as its Azure HDInsight Hadoop-based service. Microsoft's Power BI for Office 365, a set of business intelligence tools for its Office 365 cloud application set, has been gaining in popularity.

At its Build conference in late April, Microsoft debuted new Azure big data services including Azure SQL Data Warehouse, which the company touted as an easier way to set up a cloud-based data warehouse, and Azure Data Lake, for storing and managing an "infinite amount of data."

Oracle

Top Executives: CEOs Mark Hurd and Safra Catz

CEOs Hurd and Catz officially became the top executives in September when Co-Founder Larry Ellison relinquished the CEO title to become chairman and CTO. It's a good bet, though, that he's still setting the technology direction for the database giant.

Oracle's relational database remains the company's flagship product. But the vendor offers a deep stack of big data technology from hardware such as the Exadata Database Machine and Big Data Appliance, to NoSQL and in-memory databases, business intelligence and advanced analytics software, and analytical applications.

In April, Oracle unveiled Oracle Data Integrator for Big Data, the latest product stemming from the company's strategy of developing technologies that enable Hadoop, NoSQL and relational database technologies to work together in on-premises or cloud environments.

Pepperdata

Top Executive: Co-Founder and CEO Sean Suchter

Startup Pepperdata has developed a realtime cluster optimizer for Hadoop that monitors and controls all hardware usage (CPU, disk I/O, memory and networks). That helps IT departments better manage jobs running on Hadoop and get the most out of their Hadoop deployments.

Founded in 2012 and based in Sunnyvale, Calif., Pepperdata raised $15 million in Series B financing in April.

Pivotal

Top Executive: CEO Paul Maritz

Pivotal is the big data joint venture between storage giant EMC and VMware. Pivotal's mission is to create software applications that leverage "big and fast data" on a single, cloud-independent platform.

Pivotal's product lineup includes the Greenplum massively parallel processing database, the HAWQ SQL engine for Hadoop and the GemFire in-memory distributed database.

In an unusual step, Pivotal in February announced that it was open-sourcing some of its products, including the Greenplum and GemFire databases and its Pivotal HD Hadoop distribution. That announcement came as part of Pivotal's role in forming the Open Data Platform consortium.

Qubole

Top Executive: Co-Founder and CEO Ashish Thusoo

Qubole is one of several startups that offer a big data Hadoop-as-a-Service platform. The Qubole Data Service (QDS) runs on Amazon Web Services, Google Compute Engine and Microsoft Azure.

In February, Qubole, founded in 2012 and based in Mountain View, Calif., added the Apache Spark processing engine to its QDS platform, broadening the types of workloads that analysts and data scientists can run on QDS.

Before starting Qubole, founders Ashish Thusoo and Joydeep Sen Sarma built and ran Facebook's data service and scaled it to more than 25 petabytes. They also created the Apache Hive open-source data warehouse technology.

Snowflake Computing

Top Executive: CEO Bob Muglia

Snowflake Computing officially launched in October, debuting cloud-based data warehousing services that the startup is positioning as a more flexible, easier-to-manage alternative to traditional on-premises data warehouse systems. It's also competing with other cloud data warehouse offerings such as Amazon Web Services' Redshift and Google's BigQuery.

The San Mateo, Calif.-based company, founded in 2012, has gained a lot of visibility because its CEO is former Microsoft and Juniper Networks executive Bob Muglia. The service is currently being used by a number of beta customers and is expected to be generally available by midyear.

Sqrrl

Top Executive: CEO Mark Terenzoni

Sqrrl's founders came from the supersecret National Security Agency and helped develop that organization's massive database. The Sqrrl Enterprise database offers column, graph and document store capabilities to power big data applications. The product's real forte is its ability to scale up and provide data security at the cell level.

Sqrrl, founded in 2012 and based in Cambridge, Mass., originally targeted its technology at more general big data analytical applications. But in the last year, the company has focused its technology on detecting and investigating cybersecurity threats.

Sqrrl raised $7 million in Series B funding in February.

Syncsort

Top Executive: CEO Lonne Jaffe

Syncsort began in 1968 developing software for mainframe computers. Under former IBM and CA Technologies executive Jaffe, the company has been reinventing itself as a provider of big data integration and transformation tools for Hadoop and other platforms.

In February, the Woodcliff Lake, N.J.-based Syncsort debuted a new release of its DMX data integration product suite with design capabilities that support multiple compute frameworks. The company said those capabilities make it easier for businesses to adopt and deploy Apache Hadoop.

Teradata

Top Executive: CEO Mike Koehler

Teradata is another company whose roots go way back before the term "big data" was coined, having developed its hardware/software data warehouse systems back in the 1980s. Today the company supplies a broad range of products, including the Teradata Data Warehouse Appliance and Teradata Aster Discovery Platform, as well as a broad portfolio of analytical applications.

Teradata is based in Dayton, Ohio, and was once owned by NCR Corp. In April, Teradata unveiled the Teradata Data Warehouse Appliance 2800, a system optimized for fast in-memory analytical processing and increased query throughput. The company also launched a software-defined data warehouse, an enhancement to the Teradata Database that lets businesses consolidate multiple data warehouses into one system.

Treasure Data

Top Executive: Co-Founder and CEO Hiro Yoshikawa

Treasure Data offers a cloud-based data warehouse (data analytics Platform-as-a-Service) that operates on a subscription model. The idea is to provide sophisticated data warehouse capabilities to businesses without the huge costs and development times associated with on-premises systems.

The Mountain View, Calif.-based company was founded in 2011 and launched its service in 2012. In January, the company raised $15 million in Series B financing, which it will use to continue developing its technology for SQL access to, and analysis of, huge volumes of data from mobile, web and Internet-of-Things sources.