The Coolest Emerging Big Data Companies Of The CRN 2022 Big Data 100

Part 7 of CRN’s Big Data 100 includes a look at the big data technology startups that solution providers should know about.

Big Data, Big Ambitions

The most innovative technologies often come from startups. When they identify a specific need in big data management, processing or analytics, they develop new, leading-edge products that best help customers meet their big data challenges.

This week CRN is running the Big Data 100 list in slideshows, organized by technology category, with vendors of business analytics software, database systems, data warehouse systems, data management and integration software, data science and machine learning tools, and big data systems and platforms.

As part of the 2022 Big Data 100, CRN has compiled a list of 22 emerging big data technology companies founded between 2016 and the present. This list cuts across all technology categories of this year’s Big Data 100 and includes startups developing business analytics, database, data warehouse, data management and integration, and data science and machine learning products.

Some vendors market big data products that span multiple technology categories. They appear in the slideshow for the technology segment in which they are most prominent.

Ahana

Top Executive: Co-Founder, CEO Steven Mih

Ahana offers the Ahana Cloud for Presto, a SQL data analytics managed service based on Presto, the high-performance, distributed SQL query engine for distributed data residing in a variety of sources.

The service, available on the Amazon Web Services platform, is targeted toward data platform engineers who build data lake solutions on AWS S3 storage. In March Ahana added new capabilities to its service that boost data lake governance and security.

Ahana, which launched in 2020 and is based in San Mateo, Calif., raised $20 million in Series A funding in August 2021, adding to the $4.8 million in seed funding raised in 2020.

Airbyte

Top Executive: Co-Founder, CEO Michel Tricot

Airbyte develops an open-source data integration and ETL platform for replicating and synchronizing data from APIs, files and databases to data warehouses, data lakes and other destinations.

Founded in 2020, startup Airbyte is challenging established data management tech vendors and says its goal is to “make data integrations a commodity” with its open-source extensibility and transparent and predictable compute-based pricing.

Based in San Francisco, Airbyte raised $150 million in Series B funding in December 2021, bringing its total funding to $181.2 million.

Bigeye

Top Executive: Co-Founder, CEO Kyle Kirwan

Delayed, missing, duplicated and damaged data can hinder big data projects and digital transformation initiatives. Bigeye offers a data observability platform that helps data management teams identify and fix data quality problems.

The platform automates data quality management tasks by instrumenting data sets and data pipelines, applying metrics to monitor and measure data quality, detecting data anomalies and alerting data managers when issues occur.

Bigeye, founded in 2019 and based in San Francisco, raised $17 million in Series A funding in April 2021 and then another $45 million in Series B funding in September, financial resources the company is using to accelerate its product development and expand its go-to-market efforts.

Cribl

Top Executive: Co-Founder, CEO Clint Sharp

Cribl’s observability data engineering software, including its flagship Cribl Stream system, is used to build pipelines for routing high volumes of telemetry data, including machine log, instrumentation, application and metric data, among operational, storage, analytical and security systems.

In October Cribl launched Cribl Stream Cloud Enterprise Edition, a cloud service for securely managing globally distributed observability data pipelines. The service makes it possible for businesses and organizations to centrally configure, manage, monitor and orchestrate data observability pipeline infrastructure anywhere in the world, according to the company.

Cribl, founded in 2017 and based in San Francisco, raised $200 million in a Series C round of funding in August 2021, resources the company is using to expand its go-to-market efforts—including channel initiatives.

dbt Labs

Top Executive: Founder, CEO Tristan Handy

dbt Labs markets a development framework and tools that data engineers and data analysts use to transform, test and document data in cloud data warehouse systems.

dbt Labs, which launched in 2016, raised $222 million in Series D funding in February—investors included Snowflake and Databricks—bringing the Philadelphia-based company’s valuation to $4.2 billion. The company said the financing was needed to support its rapid growth, including a six-fold increase in revenue over the previous year.

dotData

Top Executive: Founder, CEO Ryohei Fujimaki

The dotData Enterprise data science automation platform allows enterprises to automate data science workflows and build and deploy AI models in days instead of months, according to the company. The system handles data ingestion and wrangling, automated feature engineering, AutoML and model operationalization tasks—all with zero coding.

San Mateo, Calif.-based dotData, a 2018 spinoff from NEC, this week raised $31.6 million in Series B funding, bringing its total funding to $74.6 million.

Firebolt

Top Executive: Co-Founder, CEO Eldad Farkash

Firebolt provides a cloud data warehouse system and is boldly competing head-to-head with industry giants including Amazon Web Services, Snowflake and Google.

The company emphasizes the high performance of its system as a competitive advantage. And while AWS and Snowflake market their data cloud platforms for a wide range of tasks, Firebolt is specifically targeting developers and data engineers who are building data-intensive applications and interactive analytical systems in the cloud—for both internal and external users—that tap into huge volumes of data.

In June 2021 Firebolt, which was founded in 2018 and exited stealth in 2020, raised an impressive $127 million in Series B funding to fuel its development efforts.

Fluree

Top Executive: Co-Founder, CEO Brian Platz

Fluree develops its Web3 Data Platform, based on semantic graph database and blockchain technology, which provides data integrity and traceability and facilitates secure data sharing.

Fluree, launched in 2016 and based in Winston-Salem, N.C., has operated the Fluree Partner Network program for ISVs, systems integrators, VARs and cloud infrastructure partners since April 2020.

Kinetica

Top Executive: CEO Nima Negahban

Kinetica develops what it calls an “analytic database for time and space.” The database is designed to work with GPUs and other vector processors to ingest, analyze and visualize massive datasets with trillions of rows of data and process workloads involving data aggregations, graphs and time series.

In September Kinetica, which was founded in 2016 and is based in Arlington, Va., unveiled native integration for its database with Apache Kafka, the open-source event streaming system, and API integration with Confluent’s streaming data platform.

Kyligence

Top Executive: CEO Luke Han

Kyligence offers an AI-enhanced analytics platform that’s capable of delivering sub-second query response time against petabytes of data. The Kyligence system is based on Apache Kylin, a distributed analytics engine for performing multi-dimensional analysis on huge datasets.

The original Kylin technology was developed by the founders of Kyligence, who launched the San Jose, Calif.-based company in 2016 to provide a commercial version of the open-source technology with added capabilities and services. At the system’s core is OLAP functionality that pre-aggregates data in multidimensional indexes, greatly accelerating queries and data analysis.
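The pre-aggregation idea can be illustrated with a minimal Python sketch. This is not Kylin's or Kyligence's implementation; the function names (build_cube, lookup), the sample sales rows and the choice of a sum as the measure are all assumptions made for illustration. The sketch computes totals for every combination of dimensions once, up front, so that a later query becomes a dictionary lookup rather than a scan of the raw rows.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the sum of `measure` for every subset of `dimensions`.

    Returns a dict keyed by a tuple of dimension names; each value maps a
    tuple of dimension values to the pre-computed total.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def lookup(cube, dimensions, **filters):
    """Answer an aggregate query from the cube without touching raw rows."""
    # Keep dimension order identical to the order used in build_cube.
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0.0)


# Hypothetical sample data, used only to exercise the sketch.
DIMENSIONS = ["region", "product"]
SALES = [
    {"region": "east", "product": "a", "amount": 10.0},
    {"region": "east", "product": "b", "amount": 5.0},
    {"region": "west", "product": "a", "amount": 7.0},
]
CUBE = build_cube(SALES, DIMENSIONS, "amount")
```

The trade-off in this kind of design is storage and build time for query speed: the number of pre-computed groupings grows quickly with the number of dimensions, which is why real systems are selective about which combinations they materialize.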

In March Kyligence debuted Kyligence Managed Services, a fully managed edition of its software with automated operation, consulting services and 24/7/365 support.

Monte Carlo

Top Executive: Co-Founder, CEO Barr Moses

Monte Carlo’s data observability software is used to monitor data across IT systems, including in databases, data warehouses and data lakes, to gauge and maintain data quality, reliability and lineage—what the company calls “data health.”

The startup’s platform evaluates data on five dimensions: freshness (how up to date the data is), volume (the completeness of data tables), schema (the organization of the data), lineage (sources and usage) and distribution (whether the data’s values are within an accepted range).
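To make these checks concrete, here is a small Python sketch of four of them (freshness, volume, schema and distribution; lineage is left out because it needs metadata about upstream systems). It is only an illustration under stated assumptions and does not reflect Monte Carlo's software: the check_table function, the "updated_at" and "value" column names and all thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def check_table(rows, now, expected_columns, max_age, min_rows, value_range):
    """Return the names of the data health checks that a table fails.

    rows is a list of dicts; "updated_at" and "value" are hypothetical
    column names chosen for this sketch.
    """
    issues = []

    # Volume: the table should contain at least min_rows rows.
    if len(rows) < min_rows:
        issues.append("volume")

    # Schema: every row should have exactly the expected columns.
    if any(set(row) != set(expected_columns) for row in rows):
        issues.append("schema")

    # Freshness: the newest row should be no older than max_age.
    newest = max(
        (row["updated_at"] for row in rows if "updated_at" in row),
        default=None,
    )
    if newest is None or now - newest > max_age:
        issues.append("freshness")

    # Distribution: every value should fall within the accepted range.
    low, high = value_range
    if any(not (low <= row["value"] <= high) for row in rows if "value" in row):
        issues.append("distribution")

    return issues


# Fixed reference time so the example is reproducible.
NOW = datetime(2022, 5, 1, 12, 0)
HEALTHY = [
    {"updated_at": NOW - timedelta(minutes=5), "value": 3.0},
    {"updated_at": NOW - timedelta(minutes=10), "value": 4.0},
]
STALE = [{"updated_at": NOW - timedelta(days=2), "value": 50.0}]
```

A production system would typically learn thresholds from historical behavior rather than take them as fixed arguments; fixed limits are used here only to keep the sketch short.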

Monte Carlo, founded in 2017 and based in San Francisco, raised $60 million in Series C funding in August 2021, financing the company is using to accelerate product development, fuel its go-to-market efforts and promote the data observability concept.

Nexla

Top Executive: Co-Founder, CEO Saket Saurabh

Nexla has developed a unified data operations platform for creating scalable, repeatable and predictable data flows throughout an organization. The software is used to integrate, automate and monitor incoming and outgoing data for data use cases including data science and business analytics.

Nexla’s product portfolio includes Nexsets, which automates manual, time-consuming data engineering tasks, making it easier to access, integrate and transform data that may be scattered across disparate systems and creating what the company calls a “converged data fabric.” Nexsets works by creating logical views of data without the need to copy or duplicate data.

Nexla, founded in 2016, is based in San Mateo, Calif.

Ocient

Top Executive: Co-Founder, CEO Chris Gladwin

The Ocient Hyperscale Data Warehouse transforms and loads massive volumes of data in just seconds, according to the company, and can execute queries on hyperscale datasets up to 50 times faster than other data warehouse systems.

The data warehouse can scale to handle petabytes of data, Ocient says, and delivers low-latency data transformation, streaming and file loading with optimized indexing. The system handles complex analytical functions using industry-standard query and analytics interfaces including SQL, JDBC and ODBC.

Ocient, founded in 2016, also develops analytics software written for specific use cases such as geospatial and operational IT and for vertical industries including financial services, government and telecommunications. Ocient is based in Chicago.

Okera

Top Executive: Co-Founder, CEO Nong Li

DataOps rising star Okera offers a universal data authorization system that audits and authorizes access to data, allowing businesses and organizations to take control of their data security, privacy and regulatory compliance efforts.

The Okera Dynamic Access Platform includes software for building and enforcing data access policies, metadata management, and centralized auditing and reporting tasks.

Founded in 2016, Okera is based in San Francisco.

PlanetScale

Top Executive: CEO Sam Lambert

PlanetScale has developed a highly scalable, serverless, SQL database for deploying and managing large clusters of database instances. The database, which targets developers, uses “sharding,” a technique for scaling a database by spreading data tables across multiple database instances.
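As a rough illustration of sharding, the Python sketch below routes each row to one of several shards by hashing its key. It is not PlanetScale's implementation; hash-based routing is just one common scheme, and the names shard_for and distribute, the "id" field and the use of SHA-256 are assumptions made for this example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo
    # the shard count so the result is always in range(num_shards).
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(rows, num_shards):
    """Spread rows across num_shards lists, keyed on each row's "id"."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["id"], num_shards)].append(row)
    return shards


# Hypothetical rows, used only to exercise the sketch.
ROWS = [{"id": f"user-{n}", "name": f"name-{n}"} for n in range(100)]
SHARDS = distribute(ROWS, 4)
```

Because the hash is stable, the same key always lands on the same shard, so a read for a single row only needs to contact one database instance. A simple modulo scheme like this does force data to move when the shard count changes, which is one reason production systems often use more elaborate routing.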

The company also offers PlanetScaleDB Cloud, a fully managed, multi-cloud, multi-region Database as a Service.

In November 2021 PlanetScale, founded in 2018 and based in Mountain View, Calif., raised $50 million in Series C funding.

Prophecy.io

Top Executive: Co-Founder, CEO Raj Bains

Prophecy.io provides a low-code data engineering platform for developing and deploying data pipelines used to manage streams of data for business analytics and machine learning tasks. The system combines visual drag-and-drop development with Agile software engineering practices.

In February Prophecy.io debuted a SaaS-based version of the platform built on Apache Spark, the open-source analytics engine, and the Kubernetes container management system, and running on the Databricks system on AWS, Microsoft Azure and Google Cloud Platform.

Prophecy.io, founded in 2017 and based in Palo Alto, Calif., raised $25 million in Series A funding in January.

Promethium

Top Executive: Founder, CEO Kaycee Lai

Promethium addresses the challenges of self-service data discovery and business analytics with the Promethium Data and Analytics Acceleration offering, which includes augmented analytics, data fabric and data catalog software. The company targets its next-generation collaborative analytics platform toward data teams and business users.

Natural language processing capabilities are a key component of the platform, and in November Promethium CEO Kaycee Lai was awarded a patent for the company’s NLP data search technology.

In February the Menlo Park, Calif.-based company, launched in 2018, raised $26 million in Series A funding.

Rivery

Top Executive: CEO Itamar Ben Hemo

Rivery offers a fully managed Software-as-a-Service data operations platform that includes data ingestion, transformation, orchestration, reverse ETL and other capabilities. The Rivery technology helps organizations quickly build automated data pipelines using hundreds of prebuilt connectors and pipeline templates.

Founded in 2018 and based in New York and Tel Aviv, Israel, Rivery raised $16 million in Series A funding in March 2021.

Rockset

Top Executive: Co-Founder, CEO Venkat Venkataramani

Rockset offers real-time analytics in the cloud, including the company’s database platform for developing high-performance data analytics applications and dashboards.

The company’s technology indexes structured, unstructured, geographical and time-series data—pulled from OLTP databases, streaming data and data lakes—to process sub-second queries at massive scale.

In September Rockset, founded in 2016 and based in San Mateo, Calif., launched a new release of its database with enhanced enterprise-grade security and compliance capabilities.

Starburst

Top Executive: CEO Justin Borgman

Starburst Data develops Starburst Enterprise, the company’s commercial offering of the Trino open-source, distributed SQL query engine that locates and queries data across distributed data sources. The technology, promoted as an alternative to conventional data warehouses, is capable of analyzing data where it resides without having to move it—a major advantage in cloud and hybrid IT environments with increasingly dispersed data sources.

In 2021 the company launched Starburst Galaxy, a cloud-based, fully managed edition of its analytics platform, and Starburst Stargate, an add-on for Starburst Enterprise for cross-cloud analytics.

In February Boston-based Starburst, founded in 2017, raised $250 million in Series D funding, boosting its valuation to $3.35 billion.

Syncari

Top Executive: Co-Founder, CEO Nick Bonfiglio

Syncari’s no-code data automation platform helps data professionals unify, clean, manage and distribute trusted customer data across an enterprise. The system utilizes a range of data synchronization, unification, governance and access capabilities to perform its tasks.

In June 2021 the company unveiled the addition of sophisticated workflow automation capabilities to help sales and marketing teams make more effective use of customer data.

Syncari, based in San Francisco, was founded in 2019 by former executives from Marketo, Mulesoft and Zendesk. In May 2021 the company announced a $17.3 million Series A round of funding.

Yugabyte

Top Executive: CEO Bill Cook

Yugabyte has been getting a lot of attention with YugabyteDB, a next-generation, distributed relational database designed to handle huge amounts of data spanning multiple geographic regions and availability zones. The database supports global, business-critical applications—such as in cybersecurity and financial services—that require low query latency and extreme resilience against failures.

In September the company launched Yugabyte Cloud, a fully managed Database as a Service for building cloud-based applications and moving legacy applications to cloud platforms.

In October Yugabyte, founded in 2016, raised $188 million in a Series C round of funding that put the Sunnyvale, Calif.-based company’s valuation at more than $1.3 billion.