The Coolest Emerging Companies Of The 2023 Big Data 100

Part 7 of CRN’s 2023 Big Data 100 includes a look at the big data startups that solution providers should know.

Big Data, Big Plans

Startup companies are often the source of the most innovative technologies. Unencumbered by legacy products – or legacy thinking – startups can move quickly to develop new, leading-edge products when they identify a specific need in big data management, processing or analytics.

This week CRN is running the Big Data 100 list in a series of slide shows, organized by technology category, spotlighting vendors of business analytics software, database systems, data warehouse and data lake systems, data management and integration software, data observability tools, and big data systems and cloud platforms.

As part of the 2023 Big Data 100, CRN has compiled a list of 14 startup big data technology companies founded between 2017 and the present. This list cuts across most technology categories of this year’s Big Data 100 and includes companies developing new technology in business analytics, databases, data warehouse and data lake platforms, data management and integration tools, and data observability software.

Some vendors market big data products that span multiple technology categories. They appear in the slideshow for the technology segment in which they are most prominent.

Ahana

Co-Founder and CEO Steven Mih

Founded: 2020

Ahana develops Ahana Cloud for Presto, a software-as-a-service data analytics offering based on Presto, the open-source SQL query engine used to query data across a range of sources, including database and data lakehouse systems.

Venture-backed Ahana, founded in 2020 and based in San Mateo, Calif., was acquired by IBM earlier this month. With the acquisition IBM joined the Presto Foundation, part of the Linux Foundation.

Airbyte

CEO Michel Tricot

Founded: 2020

Startup Airbyte is one of the more recent entries in the data integration and transformation arena, developing a data integration engine and ETL (extract, transform and load) platform for replicating data between applications, databases, data warehouses and other systems.

Airbyte provides an open-source edition of its software along with the commercial Airbyte Enterprise and Airbyte Cloud editions. The San Francisco-based company also offers out-of-the-box connectors to hundreds of data sources and targets, a connector development kit, and several sales and marketing analytics tools.

Astera Software

CEO Ibrahim Surani

Founded: 2017

Astera develops Astera Data Stack, a code-free, unified data management platform that provides data integration, transformation and management capabilities along with tools for data warehousing, EDI (electronic data interchange) management and API lifecycle management.

Based in Westlake Village, Calif., Astera works with reseller and system integrator partners.

Bigeye

CEO Kyle Kirwan

Founded: 2019

Bigeye’s data observability system monitors the health of data pipelines and the quality of the data they contain to maintain data reliability and trustworthiness. The platform automates data quality management tasks by instrumenting data sets and data pipelines, applying metrics to monitor and measure data quality, detecting data anomalies and alerting data managers when problems arise.

In addition to the core data monitoring capability, the Bigeye platform performs actionable root cause and impact analysis for data lineage issues, and measures data deltas during data migration and replication operations.

San Francisco-based Bigeye raised $17 million in Series A and $45 million in Series B funding rounds in 2021.

CelerData

CEO James Li

Founded: 2020

Startup CelerData markets a high-performance unified analytics platform based on the StarRocks massively parallel processing SQL database for real-time analytics. CelerData’s founders developed StarRocks in 2020 and earlier this year contributed it to the Linux Foundation.

In March, CelerData, headquartered in Menlo Park, Calif., took aim at the fast-growing data lakehouse space with a new release of its software featuring a cloud-native architecture, real-time streaming analytics, and support for the open table formats Hudi, Iceberg and Delta Lake.

Cribl

CEO Clint Sharp

Founded: 2017

Cribl develops its products for IT operations and security managers whose systems generate machine data that can be collected, monitored and analyzed to maintain system and application performance and troubleshoot problems.

The company’s flagship Cribl Stream is a vendor-agnostic observability pipeline that collects, reduces, enriches, normalizes and routes data from any source to any destination within an IT environment. The company also offers Cribl Edge for collecting log, metric and application data, and Cribl Search for searching data in place without the need to collect and store it first.

Cribl raised $150 million in Series D funding in May 2022. In April of this year the company launched a major expansion of its partner program with new MSSP and professional service specializations, deal and revenue protection, and a revamped partner portal with new self-service capabilities.

EdgeDB

CEO Yury Selivanov

Founded: 2019

Startup EdgeDB develops a next-generation graph-relational database that the company says is designed as “the spiritual successor” to the SQL and relational paradigm.

At its core EdgeDB is a relational database with an object-oriented data model, a strict graph schema and a modern query language. The database is designed to address some of the ergonomic limitations that developers face with traditional SQL and relational schema modeling.

EdgeDB, founded in 2019 and headquartered in San Francisco, raised $15 million in Series A funding in November 2022.

Firebolt

CEO Eldad Farkash

Founded: 2018

Cloud data warehouse startup Firebolt focuses its services on developers and data engineers who need extreme data warehouse speed and elasticity as they build data-intensive applications.

The company is boldly challenging cloud data warehouse giants like Amazon Web Services, Google Cloud and Snowflake that provide cloud data warehouse systems for a broad range of tasks.

Firebolt, based in Tel Aviv, Israel, exited stealth in late 2020 and raised $100 million in Series C funding in January 2022.

Monte Carlo

CEO Barr Moses

Founded: 2019

How can a business guarantee the validity of data flowing through data pipelines? Monte Carlo, one of the leading companies in the data observability space, says the five pillars of observability needed to improve data reliability and eliminate data downtime are data quality, data freshness, data schema, data lineage and data volume.

The company’s Monte Carlo Data Observability platform provides a range of capabilities including machine learning-enabled data anomaly detection and alerting, data lineage problem resolution, and the ability to see data dependencies to prevent broken data.

San Francisco-based Monte Carlo has raised four rounds of funding including a $135 million Series D round in May 2022.

More recently the company unveiled the Data Reliability Dashboard and brought data observability to the data orchestration level through integration with Fivetran’s data movement and transformation platform.

Onehouse

CEO Vinoth Chandar

Founded: 2021

Touting itself as “the new bedrock for your data,” startup Onehouse is developing a foundation for an open-source, cloud-native, fully managed data lakehouse service.

The company’s service is based on Apache Hudi, an open-source transactional data lake project that brings database and data warehouse capabilities to a data lake. The goal is to serve as a data integration layer between different data repositories, according to the company.

In February Onehouse, based in Menlo Park, Calif., raised $25 million in Series A funding.

Promethium

CEO Kaycee Lai

Founded: 2018

Startup Promethium says that its collaborative data virtualization platform accelerates data analytics projects by eliminating data management and analytics complexity.

Promethium touts its software as a key element for data fabric initiatives that connect an organization’s data for analytics, machine learning and other tasks without the need for traditional ETL tools and approaches.

Rivery

CEO Itamar Ben Hemo

Founded: 2018

Rivery offers a cloud-based data operations platform that provides ELT, data pipeline and data integration capabilities. The technology aggregates, transforms and models data directly inside a cloud data warehouse.

New York-based Rivery raised $30 million in a Series B funding round in May 2022.

Starburst

CEO Justin Borgman

Founded: 2017

Starburst develops a data analytics platform, Starburst Enterprise, that can analyze huge volumes of data distributed across multiple locations – an alternative to the traditional approach of collecting and consolidating data in a central data warehouse. The company also offers Starburst Galaxy, a fully managed data lake analytics platform for handling petabyte-scale datasets.

The company’s software is built on the open-source Trino distributed SQL query engine. In June 2022 Starburst acquired Varada, an Israeli developer of data lake acceleration technology.

In early 2022 Starburst raised $250 million in a Series D funding round that at the time put the startup’s valuation at $3.35 billion.

Syncari

CEO Nick Bonfiglio

Founded: 2019

The Syncari Data Automation Platform helps businesses manage, integrate, clean and distribute customer data throughout an enterprise. The system combines data management, workflow automation and multi-directional data synchronization.

In October, San Francisco-based Syncari launched Syncari Embed, a set of APIs that extends Syncari's functionality into application ecosystems. It provides more than 50 intelligent connectors, a custom connector SDK and a unified data model.