2017 Big Data 100: Coolest Emerging Big Data Vendors

The Ones To Watch

While many established vendors made this year's Big Data 100 list, it also includes quite a few startups – companies that are pushing the envelope in developing leading-edge big data technology.

As part of this year's fifth annual Big Data 100, we've included a list of companies founded in 2011 or later that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data.


Alation

Top Executive: CEO Satyen Sangani

The Alation data catalog system combines elements of machine learning with human insight to create an inventory of an organization's data assets, helping data-driven businesses more easily find, understand, use and govern their data for making faster and better decisions.

Alation has vigorously pursued alliances with other big data vendors in the last year, integrating its software with products from Trifacta, Tableau, MicroStrategy, Teradata, Hortonworks and others.

Based in Redwood City, Calif., Alation was founded in 2012.

Alpine Data Labs

Top Executive: President and CEO Dan Udoutch

Alpine Data Labs markets Chorus, an advanced, Hadoop-based data analytics platform that's used for analytical tasks in such areas as financial services and healthcare.

San Francisco-based Alpine Data Labs, launched in 2011, is also positioning itself as a player in the Internet-of-things space, providing its technology to manage the expected massive flow of data generated by connected devices.


Anodot

Top Executive: CEO David Drai

Anodot's software provides real-time analytics and automated anomaly detection, discovering outliers within large volumes of data and turning them into valuable business insights. The software is used in such data-centric businesses as advertising technology, ecommerce and the Internet of things.

Founded in 2014, Anodot is based in Ra'anana, Israel with offices in Europe and Silicon Valley. In less than six months in 2016 the company more than doubled the amount of data analyzed by its proprietary machine learning algorithms to 5.2 billion data points per day.

Arcadia Data

Top Executive: CEO Sushil Thomas

Arcadia Data's Arcadia Converged Analytics Platform combines visual data exploration capabilities with back-end data analytics in a single system that runs natively on Hadoop clusters, helping to overcome the problem of getting value out of Hadoop-stored data.

In March the company debuted Arcadia Enterprise 4.0 with a new user interface and new platform capabilities for building data-centric applications.

The San Mateo, Calif.-based Arcadia Data was founded in 2012. CEO Sushil Thomas, along with other members of the company's startup team, came from such companies as Teradata, Aster Data, IBM and 3PAR. In January the management team was expanded with the hiring of Amir Assar as vice president of sales and Steve Wooledge as vice president of marketing.


AtScale

Top Executive: CEO Dave Mariani

AtScale's software allows popular business intelligence tools like Tableau and Qlik to access data stored in Hadoop clusters. The technology creates a semantic layer between Hadoop and third-party tools, essentially turning Hadoop into an online analytical processing server that can be tapped for multidimensional analysis.

AtScale, founded in 2013 and based in San Mateo, Calif., was recently awarded a patent for the ability of its calculation engine to run against any BI visualization tool.

In March the company launched AtScale 5.0 with a machine learning performance optimizer, a universal abstraction layer and enterprise-grade security, governance and metadata management capabilities.

Bedrock Data

Top Executive: CEO Thor Johnson

Bedrock Data offers a data integration Platform-as-a-Service that constantly reviews and automatically synchronizes data in IT systems, including cloud-based sales, marketing and support applications. The company says its pre-built connectors eliminate the need for coding to achieve such integrations.

In January Boston-based Bedrock Data, founded in 2012, said it more than doubled its annual recurring revenue in 2016 from both new and existing customers.

BlueData Software

Top Executive: CEO Kumar Sreekanti

BlueData's software, which incorporates Docker's container technology, is used to deploy big data infrastructure and applications in an on-premise model or on AWS. BlueData EPIC (Elastic Private Instant Cluster) is a platform that provides Hadoop-as-a-Service and Spark-as-a-Service.

The company's spring release of BlueData EPIC provides the ability to run big data workloads in hybrid on-premise and public cloud systems.

In January BlueData, founded in 2012 and based in Santa Clara, Calif., said sales grew by 426 percent in 2016 with the addition of such customers as State Farm Insurance, Barclays and Panera Bread.


BlueTalon

Top Executive: CEO Eric Tilenius

BlueTalon develops software for data-centric security, user access control and data masking, providing control and visibility at the data layer across Hadoop, Spark, Cassandra, relational databases and other big data systems.

Late last year Dell EMC selected BlueTalon's Policy Engine and Audit Engine software to be the data security and governance component of its Dell EMC Analytic Insights Module big data product.

Founded in 2013, BlueTalon is based in Redwood City, Calif.


Cazena

Top Executive: CEO Prat Moghe

Cazena's Big Data-as-a-Service moves data processing tasks to the cloud with just a few clicks, automating what is generally a long, complex process. Cazena bundles cloud databases, analytics engines, data movers, security and other tools into a big data Platform-as-a-Service offering that runs on Microsoft Azure and AWS.

The vendor also provides data lake and datamart cloud services, and in February debuted its Data Science Sandbox cloud service for building, testing and running data science analytical applications.

Founded in 2014 and based in Waltham, Mass., Cazena has attracted attention – and financing – because CEO Prat Moghe and board members Jit Saxena and Jim Baum were among the founders of Netezza, a pioneering developer of data warehouse appliances that IBM bought in 2010 for $1.7 billion.

ClearStory Data

Top Executive: CEO Sharmila Mulligan

ClearStory Data's software simplifies access to disparate internal and external data, including corporate and Web information sources such as Hadoop, relational databases and social media. The company's in-memory database "harmonizes" the data and enables interactive data analysis at scale.

In March ClearStory Data, founded in 2011 and based in Menlo Park, Calif., added "automated smart data discovery" capabilities to its system that make it easier to find and isolate patterns in large, complex data sources.

Gartner tagged ClearStory Data as a leading "visionary" in this year's BI and analytics platforms magic quadrant report.


Confluent

Top Executive: CEO Jay Kreps

Confluent offers a data platform, based on the Apache Kafka open-source messaging system, for collecting, managing and analyzing streaming data in real time – a growing challenge in the worlds of big data and the Internet of Things.

Confluent launched in September 2014 to provide technology and services that help businesses adopt and use Kafka. The company was co-founded by Jay Kreps, Neha Narkhede and Jun Rao, who created Kafka while working at LinkedIn.

In March Palo Alto, Calif.-based Confluent raised $50 million in Series C funding, bringing its total financing to $81 million.

Continuum Analytics

Top Executive: CEO Scott Collison

Continuum Analytics develops Anaconda, an open-source analytics platform based on the Python programming language that's become popular for building analytical applications. The platform and related consulting services help organizations manage, analyze and visualize massive datasets.

In January Continuum Analytics, founded in 2011 and based in Austin, appointed Collison its new CEO. Co-founder Travis Oliphant, who previously held the CEO post, became president and chief data scientist, focusing his efforts on managing solutions for Continuum's customers and spurring development within the open data science community.


Couchbase

Top Executive: CEO Matt Cain

Couchbase and other vendors in the crowded NoSQL database arena position their products as alternatives to the relational databases that dominate most data centers today. Their next-generation technologies can better handle huge volumes of data and different data types.

Couchbase, founded in 2011 and based in Mountain View, Calif., named former Veritas president Matt Cain to be its new CEO in April, succeeding Bob Wiederhold, who became executive chairman.

Couchbase's products include the Couchbase Server and Couchbase Mobile. In March the company reported that it had seen rapid growth in enterprise deployments for Internet of things applications.


Databricks

Top Executive: CEO Ali Ghodsi

Databricks was founded in 2013 by the creators of Apache Spark, the popular open-source big data processing engine. The San Francisco-based company develops commercial software and services around Spark, including the Databricks Cloud end-to-end hosted data platform that launched in June 2015.

In April Databricks debuted Databricks for Data Engineering, an edition of the Databricks cloud software that data engineers use to combine SQL, ETL, structured data streaming and machine learning workloads running on Spark and move them into production.


DataGravity

Top Executive: CEO Paula Long

DataGravity develops technology that provides a way for organizations to identify sensitive data across their virtual environments and protect it from theft and misuse.

DataGravity, based in Nashua, N.H., started out developing "data-aware" storage systems that could search and govern stored data, but has evolved since its 2012 founding to focus on data protection.

The company says its DataGravity for Virtualization technology offers behavior-based data protection and visibility into unstructured data across virtual environments, allowing users to combat security threats in real time.


DataScience

Top Executive: Ian Swanson

In September DataScience, founded in 2014, debuted the DataScience Cloud, a collection of tools that data scientists use to find data across a broad range of internal and external sources, develop predictive models that tap into that data, and deploy the applications throughout a company.

DataScience is based in the Los Angeles area in Culver City, Calif. The company raised $26.5 million in Series A financing in 2015.


DataTorrent

Top Executive: President and CEO Guy Churchward

DataTorrent markets a big data system for unified stream and batch processing that enables users to process, monitor, analyze and act on big data in real time.

In March the company said it had experienced six-fold growth of customers using its software in production, year over year, and 105 percent growth in subscription booking revenue.

DataTorrent, based in San Jose, was founded in 2012 by the creators of Apache Apex, the open-source batch and stream processing engine.


Exabeam

Top Executive: CEO Nir Polak

Exabeam develops a behavior-based security intelligence system that uses advanced analytics and data science for threat detection, data loss protection, security breach investigation and thwarting insider threats.

Exabeam launched its Security Intelligence Platform in January with Log Manager, Advanced Analytics, Incident Responder, Threat Hunter and other tools.

In February San Mateo, Calif.-based Exabeam, founded in 2013, raised $30 million in Series C financing.


Gainsight

Top Executive: CEO Nick Mehta

Gainsight develops a line of business analysis applications used for customer retention tasks including managing the customer lifecycle, identifying cross-sell and up-sell opportunities, and managing customer loyalty and churn risks.

Gainsight was founded in 2011 and is based in Redwood City, Calif.


Hortonworks

Top Executive: CEO Rob Bearden

Hortonworks offers a range of big data management products built around its Hortonworks Data Platform (HDP), which is itself based on the Apache Hadoop system. It also develops the Hortonworks DataFlow software that collects and analyzes streaming data in real time.

In April Hortonworks launched HDP version 2.6 with the ability to provide real-time operational analytics using information stored in data lakes.


Incorta

Top Executive: Osama Elkady

Incorta offers a real-time data analytics platform that aggregates huge volumes of data in a way the company says makes traditional data warehouse systems obsolete and shortens the time needed to develop analytical applications from months to days. Key to the platform's function is its Direct Data Mapping Engine architecture.

Founded in 2013 by a group of former Oracle executives, San Mateo, Calif.-based Incorta officially launched earlier this year with $10 million in Series A funding led by GV (formerly Google Ventures).


Infoworks

Top Executive: CEO Amar Arsikere

Infoworks provides a Hadoop-based data warehouse system that can run either on premise or, more recently, in the cloud.

Infoworks, founded in 2014 and based in San Jose, closed on $15 million in Series B funding in March.


Interana

Top Executive: CEO Ann Johnson

Interana markets behavioral analytics software that works with event data, such as how customers behave and how they use a company's product. The software analyzes data generated by Web sites and mobile devices, Internet of Things endpoints and sensors, and call detail records – all focused on improving customer engagement and retention in the digital economy.

Interana was founded in 2013 and is based in Redwood City, Calif. In December the company said it had surpassed user and revenue goals for the eighth consecutive quarter, reporting 135 percent growth in annual recurring revenue.

The company raised $18 million in venture funding in November 2016, bringing its total to $46.2 million.


JethroData

Top Executive: CEO Eli Singer

JethroData has developed a SQL-on-Hadoop engine that acts as a business intelligence-on-Hadoop acceleration layer that speeds up big data queries from BI tools like Tableau, Qlik and MicroStrategy to any big data source like Hadoop or Amazon S3.

In March the New York-based company, founded in 2012, debuted Jethro 3.0, a release the vendor says reduces costly and labor-intensive data engineering tasks such as pre-aggregating tables, manually building cubes, and managing new and changing applications. Data can be loaded directly into Jethro from Hadoop tables with the 3.0 release, which also sports an enhanced graphical user interface.

Kyvos Insights

Top Executive: CEO Praveen Kankariya

Kyvos Insights develops "OLAP on Hadoop" technology that provides a way to analyze the massive volumes of data businesses and organizations are storing within Hadoop clusters, either through the cloud or on-premise.

In April Kyvos said its flagship product natively supports Google Cloud, joining the product's existing support for Amazon Web Services and Microsoft Azure.

Kyvos Insights was started in 2012 and is based in Los Gatos, Calif.


Looker

Top Executive: CEO Frank Bien

Looker's Web-based business intelligence platform provides data exploration and analysis capabilities for data that resides in any relational data source, including on-premises databases or cloud systems like Amazon Redshift and Google BigQuery.

Last October the company launched Looker 4 with a full RESTful API for customizing the platform's capabilities and pushing data to wherever it's needed.

On March 30 Looker closed an $81.5 million Series D round of funding led by CapitalG, Alphabet's growth equity investment fund, with Goldman Sachs, Geodesic Capital and existing investors also taking part. The company, founded in 2011 and based in Santa Cruz, Calif., has raised $177.5 million since 2013.


Maana

Top Executive: CEO Babur Ozden

Maana's Knowledge Platform, incorporating its patented Knowledge Graph technology, uses proprietary algorithms that combine human expertise with data to create digital knowledge. The company's forte is applying its technology to assets or business processes to improve profitability.

The Palo Alto, Calif.-based company was founded in 2012.

The company's products have had the greatest success in industrial and energy development sectors with Chevron, Shell and General Electric counted as both customers and investors.

MapD Technologies

Top Executive: Todd Mostak

MapD Technologies' flagship analytics software combines a SQL-compliant, in-memory GPU database with a visual analytics engine to create a system the company says can query and visualize huge volumes of data in milliseconds.

The MapD software is finding its way into a broad range of applications including business intelligence, Internet of things, geographic information systems, social media and server log analytics in a number of vertical industries.

In March MapD, founded in 2013 and based in San Francisco, raised $25 million in Series B funding.

In April the company launched version 3.0 of its software with native distributed scale-out capability for deployment across a cluster of GPU machines. The new release also supports high-availability configurations and offers native ODBC client connectivity.


MemSQL

Top Executive: CEO Eric Frenkiel

San Francisco-based MemSQL develops a distributed in-memory database that can process transactions and run analytics in real time using SQL.

In April MemSQL, founded in 2011, unveiled an updated MemSQL release with extended enterprise security features and an advanced security option. The update also included new high-performance data ingest capabilities for the Amazon S3 cloud storage service.


Paxata

Top Executive: CEO Prakash Nanduri

Paxata's Adaptive Information Platform provides self-service data integration, data quality, semantic enrichment, collaboration and governance capabilities.

Paxata's Spring '17 release provided a number of innovations and enhancements for working with Microsoft Azure cloud systems, and a new InterCloud Connect multi-cloud information system.

Paxata, based in Redwood City, Calif., was founded in 2012.


Pepperdata

Top Executive: CEO Ash Munshi

Pepperdata develops software tools for managing Hadoop clusters with hundreds and even thousands of nodes. The technology allows IT to monitor and control system usage to meet service-level agreements, increase data throughput and improve system visibility.

In March Pepperdata expanded its product portfolio with Pepperdata Application Profiler, a DevOps tool that Hadoop and Spark developers use to improve application performance.

Based in Cupertino, Calif., Pepperdata was founded in 2012.

Podium Data

Top Executive: Paul Barth

Podium Data develops the Podium Data Marketplace, a turnkey software system for managing Hadoop-based data lakes – centralized data repositories that combine information from multiple data repositories.

In September Podium Data, founded in 2014 and based in Lowell, Mass., raised $9.5 million in Series A funding.


Qubole

Top Executive: CEO Ashish Thusoo

Qubole develops the Qubole Data Service, a unified interface that helps users analyze data stored in cloud systems like Amazon Web Services, Google Cloud and Microsoft Azure.

In February the company announced that the Qubole Data Service also works with the Oracle Cloud system.

Qubole, founded in 2011 and based in Santa Clara, Calif., raised $30 million in Series C funding in January.

Redis Labs

Top Executive: CEO Ofer Bengal

Redis Labs markets Redis Enterprise, a high-performance, in-memory NoSQL database for fast transaction processing and real-time analytics. The software is the commercial version of the open-source Redis database.

In 2016 more than 1,300 enterprises adopted the Redis Enterprise platform, bringing the global user base to 61,000, including 7,000 enterprise-class customers.

Redis Labs, founded in 2011, is based in Mountain View, Calif.


Reltio

Top Executive: CEO Manish Sood

Reltio Cloud combines aspects of metadata management and NoSQL graph databases to create a platform for running enterprise data-driven applications and large-scale analytical workloads.

Reltio Cloud 2017.1, released earlier this year, offers new integration, collaboration and globalization capabilities.

Based in Redwood Shores, Calif., Reltio raised $40 million in Series C funding in April.

Snowflake Computing

Top Executive: CEO Bob Muglia

Startup Snowflake Computing began offering its cloud-based Snowflake Elastic Data Warehouse service nearly two years ago, providing an alternative to traditional on-premise data warehouse systems that tend to be complex, expensive and time-consuming to build.

On April 5 Snowflake closed on $100 million in Series D funding, bringing its total financing to $205 million.

The San Mateo, Calif.-based Snowflake, founded in 2012, said that in its fiscal year ended Jan. 31, the company nearly doubled its customer base and increased total customer data storage by 300 percent.

Splice Machine

Top Executive: CEO Monte Zweben

Splice Machine develops an open-source relational database that's powered by Hadoop and Spark technologies, but provides a familiar SQL interface for application developers. The company has emphasized the software's ability to support both transaction processing and analytical processing workloads.

Splice Machine is developing a database-as-a-service offering that will run on Amazon Web Services.

Founded in 2012, Splice Machine is based in San Francisco.


Sqrrl

Top Executive: CEO Mark Terenzoni

Sqrrl has taken advanced data analysis technology developed by the National Security Agency and developed software for big data analysis and cyber security.

Sqrrl, which calls itself "the threat hunting company," says its software helps organizations target, hunt and disrupt advanced cyber threats. The technology combines user and entity behavior analytics, machine learning and advanced risk scoring with multi-petabyte scalability to detect adversarial behavior.

Based in Cambridge, Mass., Sqrrl was founded in 2012.


Striim

Top Executive: President and CEO Ali Kutay

Striim is one of several companies on this year's Big Data 100 list that's addressing the challenge of working with streaming data. The company develops software that combines streaming data integration and streaming operational intelligence in one system, making continuous query/processing and streaming analytics possible.

In April Striim launched version 3.7 of its software with a focus on facilitating real-time, hybrid cloud integration and simplifying the management of applications running on streaming data.

Striim (pronounced "stream," with the two "I"s standing for integration and intelligence) is based in Palo Alto, Calif., and was founded in 2012 by former executives from Oracle, Informatica, WebLogic and other big-name data management companies.


Talena

Top Executive: CEO Nitin Donde

Startup Talena develops data availability management software, combining storage optimization techniques with machine learning to better administer big data management workloads and more accurately predict data availability.

Last month Talena said that over the last 12 months it has seen an eight-fold increase in Cassandra and DataStax Enterprise customers adopting Talena's software for improved backup, recovery and test management, with one petabyte of Apache Cassandra data under its management.

Founded in 2013, Talena is based in San Jose.


Tamr

Top Executive: CEO Andy Palmer

Cambridge, Mass.-based Tamr developed a data unification system that transforms "dark, dirty and disparate data" from hundreds and even thousands of data sources both inside and outside an organization into clean, connected data.

In March Tamr announced a global reseller agreement with Hewlett Packard Enterprise under which HPE will resell Tamr's data unification product.

Database industry veterans Andy Palmer and Michael Stonebraker started Tamr in 2013.


ThoughtSpot

Top Executive: CEO Ajeet Singh

ThoughtSpot develops search-driven analysis software, utilizing relational search engine technology, which the company says can eliminate the need for complex BI tools.

ThoughtSpot was founded in 2012 and is based in Palo Alto, Calif. In March the company said it experienced 270 percent growth in customers in fiscal 2017 (ended Jan. 31).

In March the company announced the general availability of new embedded analytics capabilities with the new Extended Enterprise Edition of its software.


Trifacta

Top Executive: CEO Adam Wilson

Trifacta develops "data wrangling" software for transforming raw, complex data into clean, structured formats for analysis – one of the biggest challenges in big data analysis processes.

Trifacta, founded in 2012 and based in San Francisco, says the company recorded a four-fold increase in sales bookings in 2016 and more than tripled the number of enterprise customers it serves.

Waterline Data

Top Executive: CEO Alex Gorelik

Waterline Data provides a data catalog system that automatically discovers, organizes and surfaces high-quality information scattered across an organization.

In February the company announced the general availability of Smart Data Catalog 4.0, which provides an automated process for metadata tagging that rapidly classifies and organizes a company's data assets and lineage. That makes data more readily available for self-service analytics and data governance tasks.

Founded in 2013, Waterline Data is based in Mountain View, Calif.


Wavefront

Top Executive: President and CEO Pete Cittadini

Wavefront develops a real-time analytics platform that businesses use to monitor and manage the performance of their IT systems, from cloud services, to applications, to networks. Using technology developed internally at Google and Twitter, Wavefront predicts and prevents system downtime and diagnoses root causes of problems in real time.

In April Wavefront, founded in 2013 and based in Palo Alto, Calif., struck a deal to be acquired by VMware for an undisclosed sum. VMware plans to incorporate Wavefront's technology with its cloud-based vRealize product portfolio for monitoring and managing applications and associated IT infrastructure.


Zoomdata

Top Executive: CEO Justin Langseth

Zoomdata develops big data analytics and visualization software, based on the company's patented Data Sharpening technique for the fast visualization of large volumes of data.

In April the company said its platform was capable of processing data sets with tens of billions – and even hundreds of billions – of records.

Zoomdata, founded in 2012 and based in Reston, Va., said it experienced 250 percent revenue growth in fiscal 2017.