The Coolest Stellar Startups Of The 2024 Big Data 100

Part 7 of CRN’s Big Data 100 takes a look at the startup companies solution providers should know in the big data arena.

Big Data, Big Plans

The majority of the companies on the CRN 2024 Big Data 100 are either major IT vendors like Amazon Web Services, Microsoft and Oracle, or younger – albeit well-established – companies like Snowflake, Databricks, Qlik and Informatica.

But startup companies are often a source of the most innovative technologies. Unencumbered by legacy products, startups can move quickly to develop new, leading-edge products when they identify a specific need in big data management, processing or analytics.

This week CRN is running the 2024 Big Data 100 list in slide shows, organized by technology category.

As part of the Big Data 100, CRN has compiled a list of 16 startup big data technology companies founded between 2018 and the present. This list cuts across most technology categories of this year’s Big Data 100 and includes startups developing new technology in business analytics, data warehouse platforms, database systems, data management and integration tools, and data observability and DataOps software.

Some vendors have big data product portfolios that span multiple technology categories. They appear in the Big Data 100 slideshow for the technology segment in which they are most prominent.


Acceldata

Top Executive: CEO Rohit Choudhary

Acceldata says its all-in-one observability platform provides insights into data stacks to improve data quality, pipeline reliability, platform performance and spending efficiency across enterprise data systems in the cloud, on premises and in hybrid systems.

In February Acceldata boosted its data observability offerings with the launch of new AI technology that uses advanced machine learning algorithms to analyze huge volumes of data for automated anomaly detection, root cause analysis and predictive analytics.

That followed Acceldata’s September acquisition of AI engine developer Bewgle, a move Acceldata said would allow it to expand its observability capabilities into AI and large language models.

Campbell, Calif.-based Acceldata was founded in 2018.


Airbyte

Top Executive: CEO Michel Tricot

Airbyte is one of the younger companies in the data integration/ELT (extract, load, transform) tools space with its open-source data movement/data integration platform and connectors for setting up and running data movement operations.

Last month Airbyte said more than 5,000 connectors had been developed using the company’s no-code builder (which launched in June 2023) and were in active use. The fast-growing company also said revenue had grown four-fold over the previous six months.

In October 2023 the company unveiled additional vector database connectors that are critical for connecting data sources to AI applications.

Founded in 2020 and based in San Francisco, Airbyte raised $150 million in Series B funding in December 2021.


Anomalo

Top Executive: CEO Elliot Shmukler

Anomalo’s automated data quality monitoring platform provides anomaly detection, data governance, data validation and data observability to help businesses and organizations ensure data integrity. The product incorporates AI functionality for rapid detection, root cause analysis and resolution of data quality issues.

In January Anomalo, founded in 2018 and based in Palo Alto, Calif., raised $33 million in Series B funding in a round that included strategic investor Databricks.

At the same time Anomalo said it had grown its annual recurring revenue nearly three-fold in the first three quarters of its fiscal year with adoption by Fortune 500 customers in the financial services, insurance, retail and technology industries.


Astronomer

Top Executive: CEO Andy Byron

Astronomer’s Astro data orchestration platform unifies data across clouds, teams and deployments, according to the company, and ensures that data is delivered to critical applications on time, securely and accurately.

Astro is built on the open-source Apache Airflow software that’s used to author, schedule and manage data workflows. Airflow was created at Airbnb in 2014 and brought into the Apache Software Foundation’s incubator program in 2016.

In February Astronomer, which was founded in 2018 and recently moved its headquarters to New York, said Astro sales grew 292 percent year over year. And in March the company unveiled the latest Astro update with enhanced security and accelerated development capabilities, and new reporting dashboards for data governance operations.


Bigeye

Top Executive: CEO Kyle Kirwan

Bigeye’s data observability and data monitoring platform provides enterprise-grade data observability, including AI-driven anomaly detection and comprehensive data lineage, for both modern and legacy data stacks.

Bigeye was founded in 2019 by CEO Kyle Kirwan and CTO Egor Gryaznov, who both worked at Uber on the data pipelines for the company’s in-house A/B testing tool.

In December Bigeye received a strategic investment from data analytics platform company Alteryx. The amount was undisclosed, but Bigeye said it brought its total funding to $68.5 million.


CelerData

Top Executive: CEO James Li

CelerData offers a high-performance data lakehouse analytics system, based on the StarRocks SQL query engine, through its on-premises CelerData Enterprise software and CelerData Cloud managed service.

CelerData’s founders developed the StarRocks technology in 2020 and started the company that year, originally with the StarRocks name. But the company changed its name to CelerData in late 2022 and in 2023 contributed the StarRocks technology to the Linux Foundation where it resides as an open-source project.


EdgeDB

Top Executive: CEO Yury Selivanov

EdgeDB describes its offering as “an open-source database designed as a spiritual successor to SQL and the relational paradigm.”

The database is powered by the Postgres query engine with a data schema model that the company calls “graph-relational” and a query language, EdgeQL, that “blends the best” of GraphQL and SQL.

EdgeDB was founded in 2019 and EdgeDB 1.0 launched in February 2022. The company raised $15 million in a Series A funding round in November 2022.

In November 2023 the San Francisco-based company debuted EdgeDB 4.0 and a cloud edition of the database.

Hex Technologies

Top Executive: CEO Barry McCardel

Startup Hex has been getting a lot of attention with the Hex platform, a modern data workspace system for collaborative analytics and data science tasks.

The company’s software includes AI-powered tools, collaborative data notebooks, tools for building applications with data visualizations, and data integration technology – all making it possible to connect and analyze data and share work using interactive data applications and stories.

Hex was founded in 2019 by McCardel, CTO Caitlin Colgrove and Chief Architect Glen Takahashi, who previously worked together at Palantir. The company raised $52 million in Series B funding in March 2022.

In October Hex launched Hex 3.0 with new AI capabilities, a new compute engine, a new metadata engine and the App Builder tool for turning insights into interactive experiences. Earlier in the year the company debuted Hex Magic tools that bring the power of large language models directly into the Hex workspace.

Monte Carlo

Top Executive: CEO Barr Moses

Monte Carlo’s Data Observability Platform is an end-to-end system for monitoring data stacks and providing alerts for data issues across data warehouses, data lakes, ETL (extract, transform and load) systems and business analytics tools.

The system automatically and immediately identifies the root cause of data problems using machine learning-based incident monitoring and resolution capabilities.

Monte Carlo and Fivetran, developer of a cloud-based automated data movement platform, recently collaborated to integrate their software, allowing organizations that use both products to better monitor data quality at time of ingestion.

Monte Carlo, founded in 2019 and based in San Francisco, raised $135 million in Series D funding in May 2022.


MotherDuck

Top Executive: CEO Jordan Tigani

Startup MotherDuck launched the first release of its serverless MotherDuck Cloud Analytics Platform in June 2023, combining cloud and embedded database technology to make it easy to analyze data no matter where it resides.

MotherDuck’s software is based on DuckDB, an open-source, embeddable database. The cloud system simplifies the analysis of data of any size by combining the speed of an in-process database with the scalability of the cloud, according to the company.

MotherDuck argues that most advances in data analysis in recent years have been geared toward large businesses and organizations with more than a petabyte of data, while neglecting small and midsize companies with correspondingly smaller data volumes.

MotherDuck, based in Seattle, was co-founded in 2022 by Google BigQuery founding engineer Jordan Tigani, who today is the company’s CEO. In September the company raised $52.5 million in Series B funding, boosting its total financing to $100 million.


Onehouse

Top Executive: CEO Vinoth Chandar

Onehouse develops the Universal Data Lakehouse, a fully managed cloud data lakehouse service that can ingest data from all of a customer’s data sources in minutes and supports all query engines.

The service is built on the Apache Hudi open-source data management framework that brings database and data warehouse capabilities to data lakes.

Onehouse, headquartered in Menlo Park, Calif., was founded in 2021 and emerged from stealth in 2022. The company raised $25 million in Series A funding in February 2023.


Promethium

Top Executive: CEO Kaycee Lai

Promethium describes its software as the industry’s “first AI-native data fabric platform” that provides a single, unified, consistent view of – and access to – all data from across multiple sources.

Earlier this month the company shipped Promethium Revision18, a new release with new features and enhancements that streamline workflows, improve data governance and provide deeper insights for data engineers and chief data officers.

Promethium was founded in 2018 and is headquartered in Menlo Park, Calif.


Rivery

Top Executive: CEO Itamar Ben Hemo

Rivery provides a cloud-based ELT data operations platform for building and automating complex, end-to-end data pipelines, transforming data, and integrating data across a wide range of sources using fully managed data replication and more than 200 native connectors.

Founded in 2019, Rivery has headquarters in New York and Tel Aviv. In May 2022 the company raised $30 million in a Series B funding round led by Tiger Global.


Syncari

Top Executive: CEO Nick Bonfiglio

Syncari develops a low-code/no-code data automation platform used to synchronize, unify, clean, manage, analyze and distribute trusted customer and revenue data for sales, marketing and other go-to-market operations.

The company’s SyncAI suite, including InsightsGPT, PipelineGPT and ActionGPT, adds generative AI capabilities to the company’s data workflows, enabling revenue teams to analyze customer data using conversational queries and execute data automation with natural language prompts.

Syncari was founded in 2019 and is headquartered in Newark, Calif.


Telmai

Top Executive: CEO Mona Rakibe

Telmai, founded in 2020 and headquartered in San Francisco, is one of the more recent startups in the data observability arena. Telmai’s AI-driven data observability platform helps data teams automate the process of monitoring data pipelines, using a range of data quality metrics and KPIs, and proactively detect and investigate data anomalies in real time.

Telmai released a new edition of its software in September 2023 with a number of features designed to simplify and accelerate data observability adoption. New functionality included “time travel” retrospective analysis of historical data, private cloud options across the three major public clouds, and end-to-end observability for heterogeneous data pipelines.

The company raised $5.5 million in seed funding in June 2023.


Tessell

Top Executive: CEO Bala Kuchibhotla

Tessell’s database-as-a-service platform is used to set up, manage, secure and scale databases in the cloud, including Oracle Database, Microsoft SQL Server, MySQL, PostgreSQL, Milvus and MongoDB.

Tessell says its platform, running on AWS and Microsoft Azure, can accelerate the migration of database workloads to cloud platforms and provide a unified control plane for managing databases across multiple cloud systems.

In January the company unveiled Tessell Database Lifecycle Management for Exadata@Azure, the Oracle-Microsoft collaboration that makes the Oracle Database and Exadata services available directly within the Azure platform.

Tessell was founded in 2021 and is based in San Ramon, Calif.