The Coolest Emerging Big Data Companies Of The CRN 2021 Big Data 100

Part 6 of CRN’s Big Data 100 includes a look at the big data technology startups that solution providers should know about.

Big Data, Big Ambitions

Businesses and organizations often rely on established IT companies to provide the hardware, software and services they need for their big data initiatives. But some of the most innovative big data technology comes from startup companies, the emerging vendors who have identified a specific need in big data management, processing or analytics and developed leading-edge products to help customers meet a specific big data challenge.

This week CRN is running the Big Data 100 list in slide shows, organized by technology category, with vendors of business analytics software, database systems, data management and integration software, data science and machine learning tools, and big data systems and platforms.

As part of the 2021 Big Data 100, CRN has compiled a list of emerging big data technology companies founded between 2015 and 2020. This list cuts across all technology categories of this year’s Big Data 100 and includes startups that are developing business analytics, database, data management and integration, and data science and machine learning products.

(Some vendors market big data products that span multiple technology categories. They appear in the slide show for the technology segment in which they are most prominent.)


Ahana

Top Executive: Co-Founder, CEO Steven Mih

Ahana, which just launched in 2020, is one of several companies building big data systems around Presto, the high-performance distributed SQL query engine for distributed data residing in a variety of sources.

The startup, based in San Mateo, Calif., develops PrestoDB-based ad-hoc analytics software. In late 2020 the company launched the Ahana Cloud for Presto managed service and earlier this year expanded the service with new data lake caching and security capabilities.

Ahana, founded in 2020, raised $4.8 million in seed round funding last year from investors that included GV (formerly Google Ventures).


Alluxio

Top Executive: Founder, CEO Haoyuan Li

Alluxio’s Data Orchestration Platform links data-driven applications, including machine learning and business analytics software, with data sources such as Hadoop-based data lakes, Amazon S3 and Google Cloud Storage that are increasingly dispersed across on-premises, hybrid cloud and multi-cloud IT environments.

On April 16 Alluxio, founded in 2015 and based in San Mateo, Calif., released new Community and Enterprise 2.5 editions of its software with extended API support that boosts the system’s performance and expands its data connectivity range.


Aparavi

Top Executive: Founder, CEO Adrian Knapp

Startup Aparavi’s cloud-based Digital Intelligence and Automation Platform is used to find, classify, automate and govern distributed, unstructured data across on-premises and cloud systems for a range of tasks including data discovery and access, data retention and protection, and data governance, risk and compliance requirements.

Founded in 2017 and based in Santa Monica, Calif., Aparavi initially focused on data management for data backup tasks but realized the potentially broader applications for its technology.

Cockroach Labs

Top Executive: CEO Spencer Kimball

Cockroach Labs develops CockroachDB, a cloud-native, distributed SQL database that’s designed to handle workloads with huge volumes of transactional data. The company’s motto of “scale fast, survive anything, thrive anywhere” (hence the Cockroach name) stems from the database’s elasticity, failure-resistant architecture and multi-cloud flexibility.

Cockroach Labs, founded in 2015, said it more than doubled its revenue and its customer roster in 2020. More than half of its customers are running their critical applications on CockroachCloud, a fully managed cloud instance of CockroachDB that became generally available on AWS and the Google Cloud Platform last year.

In January the New York-based company raised $160 million in Series E funding, bringing its total funding to $355 million and putting its valuation at $2 billion.


dotData

Top Executive: Founder and CEO Ryohei Fujimaki

dotData develops what it calls AutoML 2.0 solutions for automating data science workflows. The dotData Enterprise machine learning and data science automation platform handles data ingestion and wrangling, automated feature engineering, AutoML and model operationalization tasks – all with zero coding.

In February dotData, founded in 2017 and based in San Mateo, Calif., launched dotData Cloud, an AI/ML automation platform and set of services that enables business intelligence teams, especially those within smaller organizations that lack their own data science teams, to quickly automate AI/ML development tasks.


Dremio

Top Executive: CEO Billy Bosworth

Dremio develops a next-generation data lake query engine that establishes views into stored data, allowing data scientists and analysts to manage, curate and share data and enabling easy analytics for data consumers.

The company’s software is based on the Apache Arrow open-source technology for developing analytical applications that can process in-memory columnar data.

Dremio, founded in 2015 and based in Santa Clara, Calif., raised $135 million in Series D funding in January.


Fluree

Top Executive: Co-Founders, Co-CEOs Flip Filipowski and Brian Platz

Fluree develops its Web3 Data Platform, based on semantic graph database and blockchain technology, providing data-centric security and data integrity and facilitating secure data sharing. (Graph databases store data and information about data relationships.)

In September Fluree, founded in 2016 and based in Winston-Salem, N.C., wrapped up the company’s $6.5 million seed funding round. Earlier in the year the company launched the Fluree Partner Network, the startup’s first formal channel partner program.


Kinetica

Top Executive: CEO Tom Addis

Kinetica develops the Kinetica Streaming Data Warehouse, which combines historical and streaming data for rapid data analysis. The system can ingest, analyze and visualize massive datasets with trillions of rows of data.

At the core of the system is a memory-first, GPU-accelerated database system that allows the company’s data warehouse platform to unify multiple analytical techniques including relational, geospatial, graph, time series and text search.

In February Kinetica, founded in 2016 and based in San Francisco, said the U.S. Air Force had awarded the company a five-year contract with a $100 million ceiling to deliver a streaming data warehouse for the NORAD and USNORTHCOM Pathfinder program.


Kyligence

Top Executive: CEO Luke Han

Kyligence offers an AI-enhanced analytics platform that’s capable of delivering sub-second query response time against petabytes of data. The Kyligence system is based on Apache Kylin, a distributed analytics engine for performing multi-dimensional analysis on huge datasets.

The original Kylin technology was developed by the founders of Kyligence, who launched the San Jose, Calif.-based company to provide a commercial version of the open-source technology with added capabilities and services. At the system’s core is OLAP (online analytical processing) functionality that pre-aggregates data in multidimensional indexes, greatly accelerating queries and data analysis.
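The pre-aggregation idea can be illustrated with a minimal Python sketch. It is a toy illustration of the general OLAP technique, not Kylin's or Kyligence's actual implementation; the fact rows, dimension names and the build_cube and query helpers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, sales)
FACTS = [
    ("east", "widgets", 100),
    ("east", "gadgets", 50),
    ("west", "widgets", 70),
    ("west", "gadgets", 30),
]
DIMENSIONS = ("region", "product")


def build_cube(facts, dimensions=DIMENSIONS):
    """Pre-compute SUM(sales) for every combination of dimension values."""
    cube = defaultdict(int)
    for row in facts:
        *dims, measure = row
        # Visit every subset of dimensions, from the grand total to the full key.
        for size in range(len(dimensions) + 1):
            for idx in combinations(range(len(dimensions)), size):
                key = tuple((dimensions[i], dims[i]) for i in idx)
                cube[key] += measure
    return dict(cube)


def query(cube, **filters):
    """Answer an aggregate query with one dictionary lookup, no row scan."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0)


cube = build_cube(FACTS)
print(query(cube))                                    # grand total: 250
print(query(cube, region="east"))                     # 150
print(query(cube, region="west", product="widgets"))  # 70
```

The cost of scanning the raw rows is paid once, when the cube is built, so each later query becomes a lookup. The trade-off is the extra storage for the pre-computed combinations.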

In January the company released Kyligence Cloud 4, the first cloud-native release of the Kylin platform.

Last week Kyligence, founded in 2016, announced that it had raised $70 million in Series D funding.


Ocient

Top Executive: Co-Founder, CEO Chris Gladwin

Startup Ocient develops database and data analytics software that is capable of ingesting, managing and analyzing massive volumes of structured data—multiple petabytes and even exabytes of data, according to the Chicago-based company.

The company’s proprietary Ocient DAS (data analytics solutions) technology includes an ultra-large-scale relational database with trillions and even quadrillions of rows and columns. The system also includes analytics software written for specific use cases.

Ocient, founded in 2016, raised $40 million in Series B funding in January, money the company will use to expand its operations and double its employee head count by the end of this year.


Octopai

Top Executive: Co-Founder, CEO Amnon Drori

Octopai’s automated data lineage and discovery software helps data managers and data analysts quickly find and understand the data they need for business analytics and other tasks.

The Rosh Haayin, Israel-based company’s metadata management technology helps identify and locate data, wherever it resides throughout an organization. It’s also used to determine data lineage for maintaining data consistency and meeting regulatory and compliance requirements like the European GDPR or California’s CCPA.

Octopai was founded in 2015.


Okera

Top Executive: CEO Nick Halsey

Okera, a rising star in the DataOps arena, markets a universal data authorization system that audits and authorizes access to data, allowing businesses and organizations to take control of their data security, privacy and regulatory compliance efforts.

The Okera Dynamic Access Platform includes software for building and enforcing data access policies, metadata management, and centralized auditing and reporting tasks.

In March Okera, founded in 2016 and based in San Francisco, expanded its platform with the ability to delegate data access policy management—an important function for enabling distributed data stewardship.


Panoply

Top Executive: Co-Founder, CEO Yaniv Leven

Panoply’s cloud data system offers a fast, no-coding-required path to business analytics. The company’s platform performs data connection and integration, data storage management and data access tasks to deliver analytics-ready data where it’s needed.

In October 2020 Panoply, founded in 2015 and based in Tel Aviv, Israel, and San Francisco, raised $10 million, bringing its total financing to $24 million.


PlanetScale

Top Executive: Co-Founder, CEO Jiten Vaidya

PlanetScale has developed a next-generation database system based on the Vitess open-source database technology for deploying and managing large clusters of database instances. The database uses “sharding,” a technique for scaling a database by spreading data tables across multiple database instances.
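As a rough illustration of sharding, the Python sketch below routes rows to database instances by hashing a key. It shows the general technique only and does not describe how Vitess or PlanetScale routes data; the shard names, the customer_id key and the helper functions are hypothetical.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database instances.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the row key (stable across runs, unlike built-in hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def distribute(rows, key_field="customer_id"):
    """Group rows by the shard that their key maps to."""
    placement = {name: [] for name in SHARDS}
    for row in rows:
        placement[shard_for(str(row[key_field]))].append(row)
    return placement


rows = [{"customer_id": i, "total": i * 10} for i in range(1, 9)]
placement = distribute(rows)
for name, assigned in placement.items():
    print(name, [row["customer_id"] for row in assigned])
```

Because the same key always hashes to the same shard, reads and writes for one customer go to a single instance while the overall table is spread across all of them.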

In March the Mountain View, Calif.-based company debuted PlanetScaleDB Cloud, a fully managed, multi-cloud, multi-region Database as a Service running on AWS, Google Cloud Platform and Microsoft Azure. The startup also provides PlanetScaleDB Enterprise for customers that want to run their own Database as a Service. In June 2020 the company launched PlanetScaleDB for Kubernetes, which deploys databases directly into a Kubernetes cluster.

Founded in 2018, the company has raised $25 million in funding, including $22 million in May 2019.


Promethium

Top Executive: Founder, CEO Kaycee Lai

Promethium addresses the challenges of self-service data discovery and business analytics with its Data Navigation System. The platform uses machine learning algorithms and natural language processing to quickly provide everyday information workers with the data and insight they seek.

The Promethium system connects on-premises and cloud data without moving or copying it and automates data preparation, assembly and visualization tasks.

Promethium was founded in 2018 and is based in Menlo Park, Calif.


Rockset

Top Executive: Co-Founder, CEO Venkat Venkataramani

Rockset offers a real-time indexing database in the cloud for developing real-time search and data analytics applications. The database works with structured, unstructured, geographical and time-series data—pulled from OLTP databases, streaming data and data lakes—to process sub-second queries at massive scale.

Founded in 2016 and based in San Mateo, Calif., Rockset scored a $40 million Series B round of funding in October 2020, bringing its total funding to $61.5 million.


Starburst Data

Top Executive: CEO Justin Borgman

Starburst Data develops Starburst Enterprise for Presto, the company’s commercial offering of the Presto open-source, distributed SQL query engine for finding and analyzing data that resides in a variety of distributed data sources.

Presto is capable of querying data where it resides without having to move it—a major advantage in cloud and hybrid IT environments with increasingly scattered data sources. That makes it a more cost-efficient alternative to traditional data warehouse systems.

Founded in 2017, Starburst raised $100 million in Series C funding in January, bringing the Boston-based company’s total funding to $164 million and its valuation to $1.2 billion. In November 2020 the company launched Starburst Orbit, its first partner program.


Tellius

Top Executive: Founder, CEO Ajay Khanna

Tellius, founded in 2015 and based in Reston, Va., develops business analytics and AI technology to provide what it calls a “guided insights” platform that helps business users and analytics teams ask questions in natural language and quickly discover business insights—what market researcher Gartner has designated as “augmented analytics.”

At the heart of the Tellius system is the Tellius Genius AI Engine that provides the natural language processing, machine learning and predictive analysis capabilities.


Timescale

Top Executive: Co-Founder and CEO Ajay Kulkarni

Timescale offers TimescaleDB, a time-series relational database based on the open-source PostgreSQL database. TimescaleDB is specifically developed for ingesting, managing and analyzing time-series or time-stamped data—examples include data from financial trading systems and Internet of Things sensors.

In February the company made TimescaleDB 2.0, a distributed, multi-node, petabyte-scale edition of its database system, generally available. The new release also offers significant improvements to the database’s continuous aggregates functionality and provides a new user-defined actions feature.

Timescale, founded in 2015 and based in New York, began offering a fully managed, multi-cloud edition of its database in June 2019.