The 10 Hottest Big Data Startups Of 2021 (So Far)

Businesses are looking to next-generation databases, data management tools and big data analytics software to help them leverage huge volumes of data to gain a competitive edge. Here’s a look at 10 hot startups developing leading-edge technologies that help solution providers and customers meet their big data challenges.

Startups Offer Next-Generation Tools For Big Data Management And Analytics

Businesses and organizations are overwhelmed with big data and struggling to effectively manage data that’s growing in volume, expanding in variety and accelerating in speed – never mind efforts to organize and analyze all that data to gain valuable insights that can lead to competitive advantages.

Here’s a look at 10 big data startups with ground-breaking technologies that have caught our attention – so far – in 2021. The list includes companies developing leading-edge products in data operations, data management and automation, data quality, data transformation and integration, data analytics, and databases and data warehouses.



Airbyte

Top Executive: Michel Tricot, Co-Founder and CEO

Headquarters: San Francisco

Airbyte has developed an open-source data integration/ELT (extract, load and transform) engine that businesses and organizations use to quickly build data pipelines, using both provided and custom connectors, that replicate data between databases, data warehouses and data lakes.
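For readers unfamiliar with the ELT pattern, the following is a minimal sketch of the idea, not Airbyte’s code or API. It assumes a hypothetical in-memory source and uses Python’s built-in sqlite3 module as a stand-in for a data warehouse; the table and function names are illustrative only. The point is the ordering: raw data is loaded first, then transformed inside the destination.

```python
import sqlite3


def extract():
    # Hypothetical source records, all delivered as untyped strings.
    return [("1", " Alice ", "42.50"), ("2", "Bob", "17.00")]


def load(conn, rows):
    # Load step: copy the raw records into the destination unchanged.
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    # Transform step: runs inside the destination, after loading.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(id AS INTEGER) AS id,
               TRIM(customer)      AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT id, customer, amount FROM orders ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping the raw table alongside the cleaned one is what distinguishes this from classic ETL, where data is reshaped before it reaches the destination.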

The company is challenging established data management tech vendors like Informatica and Talend as well as younger ELT vendors including Fivetran and Matillion. Airbyte currently offers a free community edition of its software and is developing commercial cloud and enterprise editions with extended capabilities.

Launched last year, Airbyte in May closed a $26 million Series A funding round led by Benchmark. That came on the heels of a $5.2 million seed round of funding in March.


Bigeye

Top Executive: Kyle Kirwan, Co-Founder and CEO

Headquarters: San Francisco

Delayed, missing, duplicated and damaged data can hinder big data projects and digital transformation initiatives. Bigeye offers a data quality engineering platform that helps data management teams identify and fix data quality problems.

The platform automates data quality management tasks by instrumenting data sets and data pipelines, applying metrics to monitor and measure data quality, detecting data anomalies and alerting data managers when issues occur.
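As a rough illustration of the metric-monitor-alert loop described above, here is a minimal sketch. It is not Bigeye’s implementation: the null-rate metric, the z-score rule, the threshold and all function names are assumptions chosen for brevity.

```python
from statistics import mean, pstdev


def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` when it sits far from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No historical variation: any change at all counts as an anomaly.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def check_column(rows, column, history):
    """Return an alert message if the null rate looks anomalous, else None."""
    rate = null_rate(rows, column)
    if is_anomalous(history, rate):
        return f"ALERT: null rate for '{column}' is {rate:.0%}"
    return None


if __name__ == "__main__":
    batch = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
        {"id": 3},
        {"id": 4, "email": "d@example.com"},
    ]
    # Hypothetical null rates observed for this column on previous days.
    print(check_column(batch, "email", [0.01, 0.02, 0.01, 0.02]))
```

A production system would track many such metrics per data set and learn its thresholds rather than hard-coding them.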

Founded in 2019, Bigeye raised $17 million in Series A funding in April, financial resources the company plans to devote to accelerating its product development and expanding its go-to-market efforts.


Cardagraph

Top Executive: Bentley Wilson, Co-Founder and CEO

Headquarters: Lehi, Utah

Cardagraph, founded in 2019, officially launched in March with its business productivity analytics software following two years of development and beta customer use.

The SaaS-based Cardagraph Platform is designed to provide business data and analytical insights to business managers, particularly those in operations, financial and marketing roles, with the intention of replacing legacy business reporting systems that no longer make the cut.

The Cardagraph software connects to systems such as Salesforce, Slack, Google, HubSpot, Workfront, Jira and others, then applies proprietary algorithms, AI and machine learning to the collected data to provide managers with information about what the company calls “areas of focus, opportunity and improvement.”

At the launch in March Cardagraph also announced what it called a “prominent round of funding” from a number of individual investors including John Pestana, ObservePoint CEO and Omniture co-founder, and John Richards, founder and CEO of Startup Ignition. The amount of the funding was not disclosed.


Dgraph

Top Executive: Gary Hagmueller, CEO

Headquarters: Palo Alto, Calif.

Dgraph develops a native “GraphQL” graph database, part of a new generation of database technologies challenging the predominance of traditional relational database systems. Graph databases store not only data but also information about the relationships among the data, using graph structures to represent the data for semantic queries.
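To make the idea of storing relationships alongside data concrete, here is a toy in-memory sketch. It is not Dgraph’s data model or query language; the `TinyGraph` class, its methods and the sample names are hypothetical. Relationships are stored as labeled edges, so a question like “who do the people I follow follow?” becomes a short traversal rather than a chain of table joins.

```python
from collections import defaultdict


class TinyGraph:
    """Toy graph store: labeled, directed edges between named nodes."""

    def __init__(self):
        # Maps (source, relation) to the set of target nodes.
        self._edges = defaultdict(set)

    def add_edge(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def neighbors(self, source, relation):
        """Nodes one `relation` step away from `source`."""
        return set(self._edges.get((source, relation), set()))

    def two_hops(self, source, relation):
        """Nodes two `relation` steps away that are neither the source
        nor one of its direct neighbors."""
        direct = self.neighbors(source, relation)
        reached = set()
        for node in direct:
            reached |= self.neighbors(node, relation)
        return reached - direct - {source}


if __name__ == "__main__":
    g = TinyGraph()
    g.add_edge("alice", "follows", "bob")
    g.add_edge("alice", "follows", "dave")
    g.add_edge("bob", "follows", "carol")
    g.add_edge("bob", "follows", "alice")
    g.add_edge("dave", "follows", "carol")
    print(g.two_hops("alice", "follows"))
```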

While graph databases are especially effective for social media and other tasks where data relationships are critical, Dgraph promotes its database for effectively turning siloed data across complex data structures into real-time intelligence. Last month the company said it was the No. 1 graph database on GitHub.

In April Dgraph hired Gary Hagmueller, previously CEO of Clara Analytics, a provider of AI technology to the commercial insurance industry, as its new CEO. Company founder Manish Jain moved to the role of chief technology officer.


Firebolt

Top Executive: Eldad Farkash, Co-Founder and CEO

Headquarters: Tel Aviv, Israel

Firebolt develops a cloud data warehouse that boldly competes with giants such as Snowflake and AWS Redshift (while running on AWS, no less), pitching what the startup describes as its technology’s speed at scale, ease of use and more affordable operating model.

Firebolt’s system was designed to decouple storage and compute, which the company says allows for granular elasticity and scalability in a shared-nothing architecture – while relying on S3 shared storage. System performance also gets a boost from the ability to query semi-structured data using standard SQL, without complicated ETL (extract, transform and load) practices, and faster data updates with the Firebolt File Format.

Firebolt was founded in 2019 by Sisense veterans Eldad Farkash and Saar Bitner.


Imply

Top Executive: Fangjin Yang, Co-Founder and CEO

Headquarters: Burlingame, Calif.

Imply, which calls itself the pioneer of “analytics-in-motion,” develops a multi-cloud, real-time big data analytics platform that provides self-service analytics capabilities. The platform, used for building analytics-driven applications, is powered by the open-source Apache Druid real-time analytics database that was developed by Imply’s founders.

This week Imply, founded in 2015, closed a $70 million Series C round of financing led by Bessemer Venture Partners, bringing its total financing to more than $116 million.


Rivery

Top Executive: Itamar Ben Hemo, Co-Founder and CEO

Headquarters: New York

Rivery is gaining attention in the expanding area of DataOps or data operations management.

The startup offers what it calls an “intuitive” data integration and preparation platform that simplifies the process of aggregating and transforming both internal and external data into a single stream for loading into cloud-based analytics systems such as Amazon Redshift, Google BigQuery and Snowflake.

The company’s platform includes a no-code ETL (extract, transform, load) tool, software for automatically migrating data from on-premises systems to cloud data warehouses, and data orchestration tools to connect and orchestrate all in-house and third-party data sources.

Founded in 2018, Rivery raised $16 million in a Series A round of funding led by Entrée Capital and State Of Mind Ventures. The company received $5 million in a seed round of funding in November 2019.


Syncari

Top Executive: Nick Bonfiglio, Co-Founder and CEO

Headquarters: San Francisco

Syncari’s no-code data automation platform helps data professionals unify, clean, manage and distribute trusted customer data across an enterprise. The system relies on a range of data synchronization, unification, governance and access capabilities to perform its tasks.

Last week the company announced the addition of sophisticated workflow capabilities to help sales and marketing teams make more effective use of customer data.

Syncari was founded in June 2019 by former executives from Marketo, Mulesoft and Zendesk. In May, the company announced a $17.3 million Series A round of funding.


Varada

Top Executive: Eran Vanounou, CEO

Headquarters: Tel Aviv, Israel

Varada develops data lake query acceleration software that’s designed to help businesses and organizations get more value out of their data lakes – huge stores of unorganized data. The key is the company’s autonomous indexing technology that leverages machine learning capabilities to dynamically accelerate queries.

Varada was founded in 2017 and the company’s Varada Data Platform became generally available in December 2020.

In May, Varada unveiled the addition of interactive text analytics to its system, integrated with the open-source Apache Lucene search engine. The company said the new functionality works directly with data lakes to assist consumers of SQL data. The company also recently enhanced the platform to help cybersecurity teams analyze data lake stores for threat detection purposes.


Yugabyte

Top Executive: Bill Cook, CEO

Headquarters: Sunnyvale, Calif.

Yugabyte develops YugabyteDB, a next-generation distributed relational database designed to handle huge amounts of data spanning multiple geographic regions and availability zones. The database supports global, business-critical applications – such as in cybersecurity and financial services – that require low query latency and extreme resilience against failures.

Yugabyte’s founders, including President Kannan Muthukkaruppan, CTO Karthik Ranganathan and Software Architect Mikhail Bautin, started the company in 2016 after developing business-critical database technology at Oracle and Facebook.

In March, Yugabyte raised $48 million as part of a Series B round of funding.