2017 Big Data 100: 35 Coolest Data Management And Integration Vendors
Talk about "big data!" The total amount of digital data worldwide is forecast to explode from 16 zettabytes this year to 180 zettabytes by 2025. That includes data generated by IT systems in banking, retail and other industries; social media and other consumer activities; and the billions of networked devices being connected through the Internet of things.
That poses a major challenge for businesses trying to manage, integrate and analyze all that data in order to operate more efficiently, understand their customers' needs, and identify possible competitive advantages.
As part of the 2017 Big Data 100, we've put together a list of 35 data management and integration technology companies that solution providers should be aware of, offering everything from next-generation database software, to tools for integrating disparate data, to systems for managing live streams of data.
Top Executive: CEO Satyen Sangani
The Alation data catalog system combines elements of machine learning with human insight to create an inventory of an organization's data assets, helping data-driven businesses more easily find, understand, use and govern their data for making faster and better decisions.
Alation has vigorously pursued alliances with other big data vendors in the last year, integrating its software with products from Trifacta, Tableau, MicroStrategy, Teradata, Hortonworks and others.
Based in Redwood City, Calif., Alation was founded in 2012.
Top Executive: CEO Dave Mariani
AtScale's software allows popular business intelligence tools like Tableau and Qlik to access data stored in Hadoop clusters. The technology creates a semantic layer between Hadoop and third-party tools, essentially turning Hadoop into an online analytical processing server that can be tapped for multidimensional analysis.
AtScale, founded in 2013 and based in San Mateo, Calif., was recently awarded a patent for the ability of its calculation engine to run against any BI visualization tool.
In March the company launched AtScale 5.0 with a machine learning performance optimizer, a universal abstraction layer and enterprise-grade security, governance and metadata management capabilities.
Top Executive: CEO Shimon Alon
Attunity develops data integration and replication software used to combine data from disparate sources, making it possible to manage, access and share data across heterogeneous enterprise platforms and cloud systems.
In February, Burlington, Mass.-based Attunity launched Compose 3.0, the latest release of the company's automated data warehouse system with a 10-fold increase in data extract, transform and load processing speeds; and new DevOps capabilities for data warehouse design.
Top Executive: CEO Thor Johnson
Bedrock Data offers a data integration Platform-as-a-Service that constantly reviews and automatically synchronizes data in IT systems, including cloud-based sales, marketing and support applications. The company says its pre-built connectors eliminate the need for coding to achieve such integrations.
In January Boston-based Bedrock Data, founded in 2012, said it more than doubled its annual recurring revenue in 2016 from both new and existing customers.
Top Executive: CEO Jay Kreps
Confluent offers a data platform, based on the Apache Kafka open-source messaging system, for collecting, managing and analyzing streaming data in real time – a growing challenge in the worlds of big data and the Internet of Things.
Confluent launched in September 2014 to provide technology and services that help businesses adopt and use Kafka. The company was co-founded by Jay Kreps, Neha Narkhede and Jun Rao, who created Kafka while working at LinkedIn.
In March Palo Alto, Calif.-based Confluent raised $50 million in Series C funding, bringing its total financing to $81 million.
Top Executive: CEO Matt Cain
Couchbase and other vendors in the crowded NoSQL database arena position their products as alternatives to the relational databases that dominate most data centers today. Their next-generation technologies can better handle huge volumes of data and different data types.
Couchbase, founded in 2011 and based in Mountain View, Calif., named former Veritas president Matt Cain to be its new CEO in April, succeeding Bob Wiederhold who became executive chairman.
Couchbase's products include the Couchbase Server and Couchbase Mobile. In March the company reported that it had seen rapid growth in enterprise deployments for Internet of things applications.
Top Executive: CEO Ali Ghodsi
Databricks was founded in 2013 by the creators of Apache Spark, the popular open-source big data processing engine. The San Francisco-based company develops commercial software and services around Spark, including the Databricks Cloud end-to-end hosted data platform that launched in June 2015.
In April Databricks debuted Databricks for Data Engineering, an edition of the Databricks cloud software that data engineers use to combine SQL, ETL, structured data streaming and machine learning workloads running on Spark and move them into production.
Top Executive: CEO Billy Bosworth
DataStax markets a commercial version of Apache Cassandra, the open-source NoSQL database designed to manage huge volumes of data across multiple data centers and the cloud, as well as providing a line of supporting administration, management, development and analysis tools.
In April DataStax began shipping DataStax Enterprise 5.1, along with DataStax OpsCenter 6.1 and DataStax Studio 2.0. The company said DSE 5.1 provides operational analytics performance that's three times faster than an open-source Apache Cassandra and Spark combination. It also offers simplified management for multi-tenant SaaS applications.
Based in Santa Clara, Calif., DataStax was founded in 2010.
Top Executive: President and CEO Guy Churchward
DataTorrent markets a big data system for unified stream and batch processing that enables users to process, monitor, analyze and act on big data in real time.
In March the company said it had experienced six-fold growth of customers using its software in production, year over year, and 105 percent growth in subscription booking revenue.
DataTorrent, based in San Jose, was founded in 2012 by the creators of Apache Apex, the open-source batch and stream processing engine.
Top Executive: President and CEO Ed Boyajian
EnterpriseDB markets an Oracle-compatible relational database system based on the open-source PostgreSQL database, along with security and performance enhancements, management tools, and other support services.
The database can manage structured and unstructured data in a single database.
In February Bedford, Mass.-based EnterpriseDB announced the general availability of EDB Postgres Platform 2017, offering technical enhancements that support more complex workloads and analytical tasks, and more easily manage larger, multi-terabyte data sets.
Top Executive: CEO Anil Chakravarthy
Informatica is a long-established developer of data management and ETL technologies including tools for master data management, data and cloud integration, and data quality. The vendor's ETL (extract, transform and load) software has long been a key component of many companies' data integration practices.
In March Informatica, based in Redwood City, Calif., debuted Informatica Cloud Data Lake Management, a comprehensive system for managing data lakes in either cloud or on-premise environments. Built on the Informatica Intelligent Data Lake system, the new cloud system supports AWS Redshift and Aurora, and Microsoft's Azure SQL Database and SQL Data Warehouse.
Top Executive: CEO Eli Singer
JethroData has developed a SQL-on-Hadoop engine that acts as a business intelligence-on-Hadoop acceleration layer that speeds up big data queries from BI tools like Tableau, Qlik and MicroStrategy to any big data source like Hadoop or Amazon S3.
In March the New York-based company debuted Jethro 3.0, a release the vendor says reduces costly and labor-intensive data engineering tasks such as pre-aggregating tables, manually building cubes, and managing new and changing applications. Data can be loaded directly into Jethro from Hadoop tables with the 3.0 release, which also sports an enhanced graphical user interface.
Top Executive: President and CEO Gary Bloom
MarkLogic develops an enterprise NoSQL database built with a flexible data model to store, manage, query and search structured and unstructured data and facilitate heterogeneous data integration.
MarkLogic 9, the latest release of the company's software, is currently in beta and boasts significant improvements in data integration and new security capabilities.
MarkLogic, founded in 2001, is based in San Carlos, Calif. The company stunned the industry in 2015 when it received $102 million in Series F financing.
Top Executive: Managing Director Matthew Scullion
Matillion, founded in 2010, develops software that businesses and organizations use to exploit their data that resides in the cloud.
The U.K.-based vendor's two products are ETL for Redshift, a data extract, transform and load tool that works with Amazon Web Services' Redshift hosted data warehouse; and Cloud Business Intelligence, a BI and self-service reporting tool that works with Matillion ETL.
Top Executive: CEO Eric Frenkiel
San Francisco-based MemSQL develops a distributed in-memory database that can process transactions and run analytics in real time using SQL.
In April MemSQL, founded in 2011, unveiled an updated MemSQL release with extended enterprise security features and an advanced security option. The update also included new high-performance data ingest capabilities for the Amazon S3 cloud storage service.
Top Executive: President and CEO Dev Ittycheria
MongoDB develops a NoSQL database that, like competing NoSQL databases, is positioned as an alternative to traditional relational database systems and as better able to meet the demands of today's big data environments.
In November MongoDB launched MongoDB 3.4 with new data storage engines and data governance features, capabilities the company expects will extend the software's potential market for enterprise-class applications.
In March MongoDB expanded its OEM Partner Program with new design review and development support programs that help ISV partners embed and deploy the MongoDB database with their applications.
Also in March MongoDB, based in New York and Palo Alto, Calif., began offering a new, free tier for MongoDB Atlas, the company's database-as-a-service offering.
Top Executive: CEO Emil Eifrem
Neo develops the Neo4j graph database, a type of NoSQL database that uses graph theory to map, store and query data relationships. Graph databases are generally considered to work faster with associative data sets and scale more easily to handle large data sets.
Neo was founded in 2007 and is based in San Mateo, Calif. In November the company secured $36 million in Series D financing.
Neo4j gained recognition in 2016 when a consortium of investigative journalists used the database to analyze the "Panama Papers," which some have called the largest data-leak exposé in history.
Top Executive: CEO Prakash Nanduri
Paxata's Adaptive Information Platform provides self-service data integration, data quality, semantic enrichment, collaboration and governance capabilities.
Paxata's Spring '17 release provided a number of innovations and enhancements for working with Microsoft Azure cloud systems, and a new InterCloud Connect multi-cloud information system.
Paxata, based in Redwood City, Calif., was founded in 2012.
Top Executive: CEO Ash Munshi
Pepperdata develops software tools for managing Hadoop clusters with hundreds and even thousands of nodes. The technology allows IT to monitor and control system usage to meet service-level agreements, increase data throughput and improve system visibility.
In March Pepperdata expanded its product portfolio with Pepperdata Application Profiler, a DevOps tool that Hadoop and Spark developers use to improve application performance.
Based in Cupertino, Calif., Pepperdata was founded in 2012.
Top Executive: Paul Barth
Podium Data develops the Podium Data Marketplace, a turnkey software system for managing Hadoop-based data lakes – centralized data repositories that combine information from multiple data repositories.
In September Podium Data, founded in 2014 and based in Lowell, Mass., raised $9.5 million in Series A funding.
Top Executive: CEO Ashish Thusoo
Qubole develops the Qubole Data Service, a unified interface that helps users analyze data stored in cloud systems like Amazon Web Services, Google Cloud and Microsoft Azure.
In February the company announced that the Qubole Data Service also works with the Oracle Cloud system.
Qubole, founded in 2011 and based in Santa Clara, Calif., raised $30 million in Series C funding in January.
Top Executive: CEO Ofer Bengal
Redis Labs markets Redis Enterprise, a high-performance, in-memory NoSQL database for fast transaction processing and real-time analytics. The software is the commercial version of the open-source Redis database.
In 2016 more than 1,300 enterprises adopted the Redis Enterprise platform, bringing the global user base to 61,000, including 7,000 enterprise-class customers.
Redis Labs, founded in 2011, is based in Mountain View, Calif.
Top Executive: CEO Manish Sood
Reltio Cloud combines aspects of metadata management and NoSQL graph databases to create a platform for running enterprise data-driven applications and large-scale analytical workloads.
Reltio Cloud 2017.1, released earlier this year, offers new integration, collaboration and globalization capabilities.
Based in Redwood Shores, Calif., Reltio raised $40 million in Series C funding in April.
Top Executive: CEO Bill McDermott
SAP is a major player in the big data space with such products as the Business Objects line of business intelligence software, HANA in-memory database and application platform, Vora query engine, BW/4HANA data warehouse and other software.
In September SAP acquired Altiscale, a big data startup that developed a cloud version of the Hadoop system for storing, processing and analyzing data. The vendor has renamed Altiscale the SAP Cloud Platform Big Data Services.
Top Executive: CEO Gaurav Dhillon
SnapLogic develops a portfolio of data and application integration products, including tools for connecting enterprise applications with on-premise and cloud-based data, putting it squarely in the middle of the heavily competitive Integration Platform-as-a-Service (iPaaS) arena.
In December SnapLogic landed $40 million in Series F financing, bringing the company's total funding to $136.3 million.
SnapLogic, based in San Mateo, Calif., was co-founded in 2006 by Gaurav Dhillon, former CEO and co-founder of Informatica.
Top Executive: CEO Monte Zweben
Splice Machine develops an open-source relational database that's powered by Hadoop and Spark technologies, but provides a familiar SQL interface for application developers. The company has emphasized the software's ability to support both transaction processing and analytical processing workloads.
Splice Machine is developing a database-as-a-service offering that will run on Amazon Web Services.
Founded in 2012, Splice Machine is based in San Francisco.
Top Executive: President and CEO Ali Kutay
Striim is one of several companies on this year's Big Data 100 list that are addressing the challenge of working with streaming data. The company develops software that combines streaming data integration and streaming operational intelligence in one system, making continuous query/processing and streaming analytics possible.
In April Striim launched version 3.7 of its software with a focus on facilitating real-time, hybrid cloud integration and simplifying the management of applications running on streaming data.
Striim (pronounced "stream" with the "I"s standing for integration and intelligence) is based in Palo Alto, Calif., and was founded in 2012 by former executives from Oracle, Informatica, WebLogic and other big name data management companies.
Top Executive: CEO Josh Rogers
Syncsort offers a broad range of data transformation and integration products for Hadoop, Microsoft Windows, Linux, mainframes and cloud systems.
In December Syncsort acquired Trillium Software, a developer of data quality tools, in a move that Syncsort said would help customers make better use of their data assets.
Syncsort is based in Pearl River, N.Y.
Top Executive: CEO Nitin Donde
Startup Talena develops data availability management software, combining storage optimization techniques with machine learning to better administer big data management workloads and more accurately predict data availability.
Last month Talena said that over the last 12 months it has seen an eight-fold increase in Cassandra and DataStax Enterprise customers adopting Talena's software for improved backup, recovery and test management, with one petabyte of Apache Cassandra data under its management.
Founded in 2013, Talena is based in San Jose.
Top Executive: CEO Mike Tuchen
Talend develops a range of commercial and open-source software for data integration, master data management, data quality management and other big data tasks.
The Winter '17 release of the Talend Data Fabric, the company's core data integration system, added new self-service data preparation and governance features to help users access, clean and utilize data in massive data sets and data lakes.
Originally founded in Paris, France, Talend today is based in Redwood City, Calif.
Top Executive: CEO Andy Palmer
Cambridge, Mass.-based Tamr developed a data unification system that transforms "dark, dirty and disparate data" from hundreds and even thousands of data sources both inside and outside an organization into clean, connected data.
In March Tamr announced a global reseller agreement with Hewlett Packard Enterprise under which HPE will resell Tamr's data unification product.
Database industry veterans Andy Palmer and Michael Stonebraker started Tamr in 2013.
Top Executive: CEO Adam Wilson
Trifacta develops "data wrangling" software for transforming raw, complex data into clean, structured formats for analysis – one of the biggest challenges in big data analysis processes.
Trifacta, founded in 2012 and based in San Francisco, says the company recorded a four-fold increase in sales bookings in 2016 and more than tripled the number of enterprise customers it serves.
Top Executive: President and CEO David Flower
VoltDB develops an in-memory SQL database that combines streaming analytics with transaction processing capabilities. Businesses use the software to develop business-critical applications that process streaming data the instant it arrives to make immediate decisions.
In March VoltDB appointed Flower, previously the company's chief revenue officer, as president and CEO. His focus will be on increasing the company's presence in specific markets. Flower replaced Bruce Reading, who is now CEO of Pica9, a developer of marketing automation software.
VoltDB was founded in 2009 and is based in Bedford, Mass.
Top Executive: CEO Alex Gorelik
Waterline Data provides a data catalog system that automatically discovers, organizes and surfaces high-quality information scattered across an organization.
In February the company announced the general availability of Smart Data Catalog 4.0, which provides an automated process for metadata tagging that rapidly classifies and organizes a company's data assets and lineage. That makes data more readily available for self-service analytics and data governance tasks.
Founded in 2013, Waterline Data is based in Mountain View, Calif.
Top Executive: Ben Sharma
Calling itself "The Data Lake Company," Zaloni develops software for building and managing data lake systems in much less time than using other technologies and techniques.
The company's product portfolio includes the Bedrock Data Lake Management Platform for data management and governance, and the Data Lake 360 software that provides control of – and visibility into – data lakes.
Zaloni, founded in 2007 and based in Durham, N.C., said it tripled its customer base and product revenue in 2016.