The 10 Hottest Big Data Startups Of 2023

Here’s a look at the 10 hottest big data startups of 2023, including Airbyte, Hex, MotherDuck and Starburst.

Big Data, Big Ambitions

Data has become a valuable asset for many businesses and organizations. They are analyzing it to gain insights about markets, customers and their own operations. They are using data to fuel digital transformation initiatives and support new data-intensive services. And data – lots of it – is also a critical component of AI and machine learning initiatives.

But wrangling, managing and analyzing data is a major challenge today. The total amount of data created, captured, replicated and consumed is growing at more than 20 percent a year and is forecast to reach approximately 291 zettabytes in 2027, according to market researcher IDC.

That’s why there is a steady stream of big data startup companies developing leading-edge technologies to help businesses access, collect, manage, move, transform, analyze, understand, measure, govern, maintain and secure data.

Here's a look at 10 big data startups that caught our attention in 2023 and that we think solution providers should know about.


Airbyte

Top Executive: Co-Founder and CEO Michel Tricot

Moving data from operational applications and databases into data warehouses, data lakes and other analytical systems is one of the most challenging steps in data analytics.

There are a lot of commercial data movement and integration tools on the market, but Airbyte continues to attract attention with its open-source data movement/data integration engine and connectors for setting up and running data movement operations.

In September Airbyte said that in just three months its user community had built more than 1,500 data connectors using the no-code connector builder the company debuted in June. And in October the company announced additional vector database connectors critical for connecting data sources to AI applications.

Founded in 2020 and based in San Francisco, Airbyte raised $150 million in Series B funding in December 2021.


Astronomer

Top Executive: CEO Andy Byron

Astronomer develops the Astro unified data orchestration platform for centralizing visibility and control over data flows and streamlining data pipeline deployment. The system helps businesses and organizations scale up for large data integration, data analytics, and AI and machine learning tasks, and meet the data demands of critical financial services, retail and ecommerce applications.

Astro is based on the open-source Apache Airflow workflow management technology – originally developed at Airbnb – for data engineering pipelines.
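For readers unfamiliar with workflow orchestration, the core idea is running pipeline tasks in an order that respects their dependencies. The sketch below illustrates that idea with Python's standard library only. It does not use the Airflow or Astro APIs, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"transform", "quality_check"},
}

def run_order(dag):
    """Return an execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(pipeline)
print(order)
```

A real orchestrator layers scheduling, retries and monitoring on top of this kind of dependency ordering.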

On December 6 Astronomer launched the latest release of Astro with simplified connection management capabilities, new system upgrade utilities and a new system deployment feature to reduce operational costs.

Astronomer, with headquarters in Cincinnati and San Francisco, was founded in 2018 and raised $213 million in Series C funding in March 2022. The company laid off staff in early 2023 but reported in September that revenue grew 206 percent year-over-year in the first half of the year.


Hex

Top Executive: Co-Founder and CEO Barry McCardel

The big data industry is packed with companies that have developed sophisticated technologies for managing, integrating, transforming, analyzing and visualizing data. But sharing and publishing the results of analytical tasks remains a challenge.

Hex Technologies develops the Hex platform, a modern data workspace system for collaborative analytics and data science tasks. The platform includes AI-powered tools, collaborative data notebooks, tools for building applications with data visualizations, and data integration technology – all making it possible to connect and analyze data and share work using interactive data applications and stories.

Hex, based in San Francisco, was founded in 2019 by McCardel, CTO Caitlin Colgrove and Chief Architect Glen Takahashi, who previously worked together at Palantir. The company raised $52 million in Series B funding in March 2022.

In October, Hex launched Hex 3.0 with new AI capabilities, a new compute engine, a new metadata engine and the App Builder tool for turning insights into interactive experiences. Earlier in the year the company debuted Hex Magic tools that bring the power of large language models directly into the Hex workspace.


Momento

Top Executive: Co-Founder and CEO Khawaja Shams

Momento emerged from stealth in November 2022 with its Momento Serverless Cache offering that optimizes and accelerates any database running on Amazon Web Services or the Google Cloud Platform.

A cache accelerates database response time by delivering commonly or frequently used data faster. But Momento’s founders argue that existing caching technology wasn’t designed for the modern cloud stack. The highly available Momento cache technology can serve millions of transactions per second, according to the company, and operates as a backend-as-a-service platform, meaning there is no infrastructure to manage.

Momento, headquartered in Seattle, was co-founded by CEO Khawaja Shams and CTO Daniela Miao, who previously worked at AWS, where they were the engineering leadership behind DynamoDB, Amazon’s proprietary NoSQL database service.


MotherDuck

Top Executive: Co-Founder and CEO Jordan Tigani

On June 22, startup MotherDuck launched the first release of its serverless MotherDuck Cloud Analytics Platform that combines cloud and embedded database technology to make it easy to analyze data no matter where it resides.

MotherDuck is based on DuckDB, an open-source, embeddable database. The cloud system makes it easy to analyze data of any size by combining the speed of an in-process database with the scalability of the cloud, according to the company.

MotherDuck argues that most advances in data analysis in recent years have been geared toward large businesses and organizations with more than a petabyte of data while neglecting small and mid-size companies with correspondingly smaller data volumes.

MotherDuck, based in Seattle, was co-founded in 2022 by Google BigQuery founding engineer Jordan Tigani, who today is the company’s CEO. In September the company raised $52.5 million in Series B funding, boosting its total financing to $100 million.


Onehouse

Top Executive: Founder and CEO Vinoth Chandar

Touting itself as “the new bedrock for your data,” startup Onehouse has developed a foundation for a cloud-native, fully managed data lakehouse service.

The company’s service is based on Apache Hudi, an open-source transactional data lake project that brings database and data warehouse capabilities to a data lake. The goal is to serve as a data integration layer between different data repositories, according to the company.

Founded in 2021, Onehouse, headquartered in Menlo Park, Calif., emerged from stealth in early 2022.

In February 2023 the startup raised $25 million in Series A funding. It also unveiled its new Onetable technology that lets users take advantage of the benefits of a Hudi-based data lakehouse while fully leveraging the native performance accelerations in Databricks and Snowflake.


Starburst

Top Executive: Co-Founder and CEO Justin Borgman

Data lake analytics platform developer Starburst is among the more established startups in the big data space, having been founded in 2017. But it continues to gain momentum with its offerings based on the company’s core MPP SQL query engine (built on the Trino open-source technology) that makes it possible to query large datasets distributed across multiple data sources.

The company’s product portfolio includes the Starburst Enterprise platform and Starburst Galaxy fully managed cloud service. In September the company expanded both with new cloud migration capabilities including on-premises connectivity in Starburst Galaxy. That was followed up in November with new functionality for building interactive applications on Starburst data lakes including streaming ingestion for near-real-time analytics and automated data governance.

Boston-based Starburst raised $250 million in Series D funding in February 2022, putting its total funding at $414 million and boosting its valuation at the time to $3.35 billion.


Telmai

Top Executive: Co-Founder and CEO Mona Rakibe

Data observability is one of the most active segments of the big data space with a number of startups that have launched in the last five years offering technology for monitoring data flows to improve data quality and reliability.

Telmai, founded in 2020 and headquartered in San Francisco, is one of the more recent startups. Telmai’s AI-driven data observability platform helps data teams automate the process of monitoring data pipelines, using a range of data quality metrics and KPIs, and proactively detect and investigate data anomalies in real time.
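To make the idea of anomaly detection on a data quality metric concrete, here is a minimal sketch that flags a pipeline run whose row count strays far from recent history. It is a generic z-score check, not Telmai's method. The function name, the threshold of 3 standard deviations and the sample row counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold

# Made-up daily row counts for a hypothetical table.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940]

print(is_anomalous(daily_row_counts, 10_100))  # within the normal range
print(is_anomalous(daily_row_counts, 2_300))   # sharp drop in volume
```

Commercial observability tools track many such metrics at once and typically learn thresholds rather than hard-coding them.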

Telmai released a new edition of its software in September with a number of new features designed to simplify and accelerate data observability adoption including “time travel” retrospective analysis of historical data, private cloud options across the three major public clouds, and end-to-end observability for heterogeneous data pipelines.

The company raised $5.5 million in seed funding in June.


Tessell

Top Executive: Co-Founder and CEO Bala Kuchibhotla

Tessell takes a different approach from traditional cloud databases. Rather than incorporate its own underlying proprietary database engine, Tessell’s cloud-native, managed Database-as-a-Service supports Oracle, Microsoft SQL Server, Postgres and MySQL databases.

With the unique design of its data infrastructure and management platform (running on Azure or AWS cloud platforms), Tessell says it can run heavy-duty transactional database workloads at higher performance and lower cost.

Tessell, headquartered in San Ramon, Calif., was founded in 2021 by CEO Bala Kuchibhotla and VP/engineering head Kamal Khanuja, both previously with Nutanix and Oracle. The company raised $34 million in Series A funding from Lightspeed Venture Partners in November 2022.


Vendia

Top Executive: Co-Founder and CEO Tim Wagner

Vendia develops a data collaboration platform based on blockchain technology that helps organizations overcome “data sprawl” by automating real-time data sharing and workflows across companies, clouds, systems and business networks.

Vendia (the company’s name comes from “Venn diagram,” a chart that displays overlapping datasets) was founded in 2020 and is based in San Francisco. The company raised $30 million in Series B funding in May 2022, bringing its total financing to $50 million.