Emerging Big Data Vendors To Know In 2020

As part of CRN’s Emerging Vendors for 2020, here are 21 hot big data startups, founded in 2014 or later, that solution providers should be aware of.

The New Generation Of Big Data Companies

Digital transformation initiatives need data to succeed. That need is one of the leading drivers behind efforts at many businesses and organizations to collect, analyze and derive value from the growing volumes of data coming from their operational systems, their sales and marketing applications, and outside sources.

But collecting, preparing, integrating, managing and analyzing that data is an increasingly complex task. Data is not only growing in volume; it also comes in a variety of formats and is increasingly scattered across on-premises, hybrid and multi-cloud systems. Just identifying what data a company owns can be challenging.

While established vendors like Oracle, Microsoft and Amazon Web Services offer big data products and services, a new generation of companies is developing leading-edge hardware and software products to help businesses and organizations meet their big data challenges.

Alluxio

Founded: 2015

Top Executive: Haoyuan Li, Founder, CEO

Alluxio’s virtual distributed file system has its roots in the Tachyon research project at the University of California, Berkeley’s AMPLab. The technology provides a data orchestration layer that brings data close to compute resources for big data and AI/machine-learning workloads in the cloud.

Anodot

Founded: 2014

Top Executive: David Drai, Founder, CEO

Anodot’s autonomous business monitoring platform uses machine learning to continuously monitor business metrics, detect significant anomalies and help forecast business performance. Anodot’s algorithms have a contextual understanding of business metrics and provide real-time alerts that, according to the company, help users cut incident costs by as much as 80 percent.

Cazena

Founded: 2014

Top Executive: Prat Moghe, Founder, CEO

Cazena offers a managed Data Lake as a Service for building data lakes on public cloud platforms. The service includes cloud storage and infrastructure, data and analytics workload engines, workload SLA management and optimization, data ingestion and integration, security and encryption, and governance and compliance.

Cockroach Labs

Founded: 2015

Top Executive: Spencer Kimball, Co-Founder, CEO

Cockroach Labs develops the CockroachDB relational database that’s designed to support next-generation, cloud-native transactional applications. In October 2019 the startup launched CockroachCloud, a fully managed cloud edition of its database. The company has raised $195.1 million in funding.

Confluent

Founded: 2014

Top Executive: Jay Kreps, Co-Founder, CEO

Confluent’s flagship product, the Confluent Platform, organizes and manages massive volumes of streaming data and makes the data available to business applications and information workers. The event stream processing software is based on Apache Kafka, an open-source stream processing system originally developed by Confluent’s founders at LinkedIn.

CrowdSmart

Founded: 2015

Top Executive: Thomas Kehler, CEO

CrowdSmart’s prediction platform aims to help organizations reach breakthrough discoveries for critical decisions, initiatives and investments. CrowdSmart uses human-empowered AI to automate knowledge acquisition, creating a collective knowledge model from iterative group exchanges, an approach the company says outperforms stand-alone AI models and individual human experts.

DotData

Founded: 2018

Top Executive: Ryohei Fujimaki, CEO

DotData’s AutoML 2.0 full-cycle data science automation platform helps enterprises accelerate machine learning and AI projects and deliver more business value by automating the hardest parts of the data science and AI process: feature engineering and operationalization.

Dremio

Founded: 2015

Top Executive: Billy Bosworth, CEO

Dremio’s Data Lake Engine delivers fast queries and a self-service semantic layer that operate directly against data lake storage. Dremio eliminates the need to copy and move data to proprietary data warehouses or create cubes, aggregation tables and business intelligence extracts.

Erwin

Founded: 2016

Top Executive: Adam Famularo, CEO

Erwin’s data governance software platform delivers integrated capabilities for enterprise modeling, data cataloging and data literacy. Erwin facilitates collaboration between IT and business to discover, understand, govern and socialize data both at rest and in motion.

Fluree

Founded: 2016

Top Executive: Brian Platz, Co-Founder, Co-CEO

The Fluree platform organizes blockchain-secured data in a highly scalable semantic graph database, providing data with native interoperability, provable historic integrity and inherent security. Fluree has gained traction in industries where data needs to be secured at its source and shared across multiple stakeholders.

Hammerspace

Founded: 2018

Top Executive: David Flynn, CEO

Designed to overcome the siloed nature of hybrid cloud systems, the Hammerspace Data-as-a-Service platform unifies data management across the hybrid cloud and provides a global file system that serves, manages and protects data wherever it resides.

IdentityLayer

Founded: 2017

Top Executive: Richard Agee, CFO

IdentityLayer tackles the problem of sensitive data exposure that results from limited or no coordination between data access and user workflow behavior when companies use cloud file-sharing services such as those from Google and Microsoft. The company focuses on the SMB-to-midmarket segment and sells its product as SaaS exclusively through the channel.

Iguazio

Founded: 2014

Top Executive: Asaf Somekh, Co-Founder, CEO

The Iguazio Data Science Platform enables enterprises to develop, deploy and manage AI applications, transforming AI projects into real-world business outcomes. Organizations can build and run AI models in real time, deploy them anywhere (multi-cloud, virtual private cloud or on-premises) and execute their most ambitious AI-driven strategies.

Infoworks

Founded: 2014

Top Executive: Buno Pati, CEO

The Infoworks enterprise data operations and orchestration system enables digital transformation by automating and accelerating development and orchestration of data and analytics projects at scale. Infoworks uses deep automation and a code-free environment to create analytics pipelines and deploy projects to production.

MariaDB

Founded: 2014

Top Executive: Michael Howard, CEO

MariaDB is the company behind the MariaDB database, a community-developed, commercially supported offshoot of the MySQL relational database. Earlier in 2020, MariaDB launched SkySQL, the company’s cloud Database as a Service for both transactional and business analytics tasks.

Octopai

Founded: 2015

Top Executive: Amnon Drori, Co-Founder, CEO

Octopai’s automated business intelligence platform provides automated data lineage, data discovery and business glossary capabilities that enable business intelligence and analytics teams to quickly, easily and accurately find and understand their data.

Promethium

Founded: 2018

Top Executive: Kaycee Lai, CEO

Enterprises struggle to be data-driven when data is fractured across multiple systems and locations and data management processes lack the agility to deliver answers quickly. Promethium automates the entire DataOps process using augmented data management and AI, accelerating analytics projects.

Siren

Founded: 2014

Top Executive: John Randles, CEO

Rooted in academic research and development in information retrieval, distributed computing and knowledge representation, Siren’s investigative intelligence platform combines the capabilities of search, business intelligence, link analysis, and big data operational logging and alerting.

Starburst Data

Founded: 2017

Top Executive: Justin Borgman, Co-Founder, CEO

Starburst Data provides a commercial edition (and related services and support) of the Presto high-performance, distributed SQL query engine for finding and analyzing data that resides in a variety of data sources. The company has raised $64 million in two rounds of funding.

Tellius

Founded: 2015

Top Executive: Ajay Khanna, Founder, CEO

Tellius is an AI-driven business analytics platform that enables anyone to ask questions of their data and discover meaningful insight in seconds. Tellius combines an intelligence layer that automates insight discovery with machine learning, a natural language search interface, and self-service data preparation.

Yellowbrick Data

Founded: 2014

Top Executive: Neil Carson, CEO

Yellowbrick Data was founded by experts in database and flash memory technologies with the goal of simplifying data warehousing. Yellowbrick systems provide high availability and massive scalability, run complex mixed workloads, support ad-hoc SQL, compute correct answers on any schema, and support large numbers of concurrent users.