The Coolest Big Data Management And Integration Tool Companies Of The 2024 Big Data 100

Part 4 of CRN’s 2024 Big Data 100 includes a look at the vendors solution providers should know in the big data management and integration tools space.

Managerial Prerogative

By 2025 the total amount of digital data generated, gathered, copied and consumed globally is expected to be in the range of 175 to 180 zettabytes. And more of that data is spread across distributed, hybrid-cloud and multi-cloud networks.

That creates major challenges for businesses and organizations trying to make valuable use of their data assets. They need advanced tools to identify and inventory the data they have and where it resides. They need software to collect, manage, integrate and transform data, moving it from operational systems into data warehouses and data lakes – even in real time – for analytical tasks. And they need tools to improve and maintain data quality and to govern data to ensure its usage meets privacy and security compliance requirements.

Data management and data integration software is one of the most dynamic segments of the big data universe with hundreds of vendors providing software products for specific data management tasks or more complete suites of integrated tools for performing a range of data management chores.

This week CRN is running the CRN 2024 Big Data 100 in a series of slideshows, organized by technology category, spotlighting vendors of business analytics software, database systems, data warehouse and data lake systems, data management and integration software, data observability tools, and big data systems and cloud platforms.

Here's a look at 30 companies with products in the big data management and data integration space.

Some vendors have big data product portfolios that span multiple technology categories. They appear in the Big Data 100 slideshow for the technology segment in which they are most prominent.

Airbyte

Top Executive: CEO Michel Tricot

Airbyte is one of the younger companies in the data integration/ELT (extract, load, transform) tools space with its open-source data movement/data integration platform and connectors for setting up and running data movement operations.

Last month Airbyte said more than 5,000 connectors had been developed using the company’s no-code builder (which launched in June 2023) and were in active use. The fast-growing company also said revenue had grown four-fold over the previous six months.

In October 2023 the company unveiled additional vector database connectors that are critical for connecting data sources to AI applications.

Founded in 2020 and based in San Francisco, Airbyte raised $150 million in Series B funding in December 2021.

Alation

Top Executive: CEO Satyen Sangani

The Alation Data Intelligence Platform includes a number of tools for creating a unified view of metadata across all data and AI assets, providing trustworthy data for analysts and business users, and ensuring that all privacy, risk and compliance requirements are met.

The Data Intelligence Platform includes tools for data catalog, data governance, data lineage and other tasks. The company unveiled the most recent release of the platform in March with a next-generation user interface and new data lineage capabilities for data management and analytics teams.

In June 2023 both Databricks and Snowflake named Alation, headquartered in Redwood City, Calif., their data governance partner of the year (Snowflake for the third consecutive year).

Alluxio

Top Executive: CEO Haoyuan Li

Alluxio develops data orchestration software – a virtual distributed file system – that brings data closer to compute systems to accelerate heavy-duty data analysis workloads.

The company’s flagship product, Alluxio Enterprise Data platform, resides between compute and data storage systems and allows organizations to manage data workloads across diverse infrastructure environments including on-premises, cloud, hybrid cloud and multi-cloud.

In October the Foster City, Calif.-based company launched Alluxio Enterprise AI, a new data management system specifically for AI and machine learning tasks such as deep learning and large-scale model training and deployment.

Anomalo

Top Executive: CEO Elliot Shmukler

Anomalo’s automated data quality monitoring platform provides anomaly detection, data governance, data validation and data observability to help businesses and organizations ensure data integrity. The product incorporates AI functionality for rapid detection, root cause analysis and resolution of data quality issues.

In January Anomalo, based in Palo Alto, Calif., raised $33 million in Series B funding in a round that included strategic investor Databricks.

At the same time Anomalo said that it had grown its annual recurring revenue nearly three-fold in the first three quarters of its fiscal year with adoption by Fortune 500 customers in the financial services, insurance, retail and technology industries.

Aparavi

Top Executive: CEO Adrian Knapp

Unstructured data can account for up to 80 percent of an organization’s data – much of it hidden “dark data” that can be difficult to find.

Aparavi’s data intelligence and automation platform provides data profiling, data governance and data value functionality to identify, classify and optimize unstructured data wherever it resides. That helps businesses and organizations mitigate risk, reduce infrastructure costs and storage needs, and better utilize data assets for analytics, machine learning and collaborative tasks.

Aparavi, originally founded in Switzerland, is headquartered in Santa Monica, Calif.

Astera Software

Top Executive: CEO Ibrahim Surani

The Astera Data Stack is a code-free, unified data management platform that provides tools for data integration, unstructured data management, data warehousing, EDI (electronic data interchange) management and API lifecycle management.

In March Astera, based in Westlake Village, Calif., launched Astera 10.3 with the new Astera Dataprep data preparation tool and new generative AI capabilities built into the platform’s data extraction, data integration and data warehousing tools.

Astronomer

Top Executive: CEO Andy Byron

Astronomer’s Astro unified data orchestration platform unifies data across clouds, teams and deployments, according to the company, and ensures that data is delivered to critical applications on time, securely and accurately.

Astro is built on the open-source Apache Airflow software that’s used to author, schedule and manage data workflows. Airflow was created at Airbnb in 2014 and brought into the Apache Software Foundation’s incubator program in 2016.

In February Astronomer, which recently moved its headquarters to New York, said Astro sales grew 292 percent year over year. And in March the company unveiled the latest Astro update with enhanced security and accelerated development capabilities, and new reporting dashboards for data governance operations.

Ataccama

Top Executive: CEO Mike McKee

The Ataccama ONE unified data management and governance platform offers a range of functionality including data catalog, data quality, data integration, data observability, master data management and reference data management.

The company released Ataccama ONE v15 in February with generative AI-enhanced features, including the ability to extract information using plain-language questions rather than code and to review AI-augmented suggestions. The redesigned data lineage visualization diagram better enables data lineage analysis and faster resolution of data quality issues.

Ataccama, headquartered in Toronto, hired Mike McKee as CEO in August 2023. McKee succeeded Michal Klaus, who had led the company since its 2008 founding. McKee joined Ataccama from Dotmatics and previously worked at ObserveIT, Proofpoint, Rapid7 and PTC.

Coalesce

Top Executive: CEO Armon Petrossian

The Coalesce data transformation automation platform is specifically designed for managing large-scale data transformation workloads on the Snowflake Data Cloud platform.

Data transformation tasks are often a bottleneck for building data warehouses and performing data analytics. The Coalesce platform accelerates data transformation by helping data teams visually build, adjust and deploy data pipelines and dynamic data tables in Snowflake.

Coalesce, headquartered in San Francisco, was founded in 2020 and exited stealth in early 2022. Just this month the company raised $50 million in Series B funding, bringing its total funding to $81 million.

Collibra

Top Executive: CEO Felix Van de Maele

Collibra’s Data Intelligence Platform includes data catalog, data governance, data lineage, data privacy, and data quality management and observability functionality.

In February the company launched Collibra AI Governance, built upon the Collibra Data Intelligence Platform, which helps data, AI and legal teams collaborate to ensure that AI systems comply with legal and privacy policies, mitigate data risk, improve model performance and ROI, and accelerate time to production.

In September 2023 Collibra, which has dual headquarters in Brussels and New York, acquired Husprey, the developer of a popular integrated SQL data notebook system. Collibra has integrated the Husprey collaborative data workspace technology into its platform.

Confluent

Top Executive: CEO Jay Kreps

Confluent develops a streaming data system that enables businesses to process and manage data in continuous, real-time streams for data-intensive applications, AI tasks and data analytics.

Confluent’s software, offered as an on-premises platform and a cloud service, is based on the open-source Apache Kafka streaming data platform that was originally developed by Confluent’s founders. Confluent’s platform goes beyond the Kafka core with additional enterprise-grade features including development tools, data governance, data connectors and support services.

Confluent has been steadily expanding its presence in the channel, this year alone launching initiatives to help partners migrate customers from Kafka to Confluent and to boost service opportunities for systems integrators.

Confluent, based in Mountain View, Calif., reported revenue of $777.0 million in 2023, up 33 percent from $585.9 million one year earlier.

Datadobi

Top Executive: CEO Ian Leysen

Datadobi’s StorageMAP platform manages unstructured data by synchronizing data between multiple- and hybrid-cloud systems. The platform is based on Datadobi’s patented unstructured data mobility engine technology.

In November Datadobi, headquartered in Leuven, Belgium, debuted StorageMAP 6.6 with the ability to analyze object data stored on any S3-compliant platform, providing users with a complete view of both File (SMB and NFS) and Object (S3) data. The new release also provided richer file copy and file movement functionality.

dbt Labs

Top Executive: CEO Tristan Handy

Dbt Labs develops a data transformation framework and workflow tool used by data analysts and data engineers to effectively execute SQL data transformation tasks, and transform, test and document data in a cloud data warehouse.

Dbt is rapidly growing in popularity and has become a must-have tool within many organizations’ big data operations. The company was Snowflake’s 2023 Data Integration Partner of the Year and Databricks’ 2023 Customer Impact Partner of the Year.

In October dbt Labs debuted the next generation of the dbt Semantic Layer, based on technology from the acquisition of Transform in February 2023. The company also unveiled significant enhancements to dbt Cloud that boost its scalability.

Denodo

Top Executive: CEO Angel Vina

Denodo develops a logical data integration, management and delivery platform with a centralized data access layer that enables users to find, query, integrate and securely share data assets – both on-premises and in the cloud.

Denodo says the platform provides all the capabilities necessary to build a logical data fabric across an organization for use in data warehouses, data lakes, data hubs and enterprise applications.

In October Denodo unveiled significant enhancements to its platform that the company said help organizations “democratize data usage” using generative AI, enforce consistent security and cost management policies, and enable self-service for business users.

In September Denodo said that TPG Growth had invested $336 million in the company’s Series B preferred equity.

Fluree

Top Executive: CEO Brian Platz

Fluree develops its product portfolio to provide organizations with trusted, linked and composable data.

Fluree’s flagship technology, Fluree Core, is a semantic graph database that combines blockchain technology, semantic graph query capability and data-centric security policy controls. The system serves as a foundation for developing data-centric Web3 applications.

The company, based in Winston-Salem, N.C., also provides Fluree Sense, an AI-powered structured data integration and transformation pipeline system; Fluree Cam, a content auto-tagging manager for unstructured data; and the Fluree ITM taxonomy manager.

Fivetran

Top Executive: CEO George Fraser

Fivetran provides an automated data replication and ELT (extract, load and transform) platform for moving data into, out of, and across all sorts of cloud data systems for both operational and analytical tasks.

The company’s system, which also includes built-in automated data governance and security, now has more than 550 data source connectors and counting. The growing list of available connectors got a boost in November when Fivetran unveiled two software developer kits that third-party vendors use to develop data source and target destination connectors.

In May 2023 the Oakland, Calif.-based company raised $125 million in new financing from Vista Credit Partners.

Immuta

Top Executive: CEO Matthew Carroll

The Immuta Data Security Platform helps businesses and organizations quickly discover sensitive data, secure it and monitor its usage, scaling across large cloud ecosystems to simplify data security workflows.

That, according to the Boston-based company, makes it easier to derive full value from data and use it to spur innovation and growth.

Just this month Immuta added new domain policy enforcement capabilities to its platform, providing additional controls for data owners to implement a data mesh architecture with data access policies for specific domains such as business units, geographic regions or job functions.

In October the company launched Immuta Discover for automated tagging and classification on cloud data platforms, enabling data teams to establish and maintain accurate metadata for data access control purposes.

Informatica

Top Executive: CEO Amit Walia

Founded in 1993, Informatica is a long-time player in the big data space and a pioneer of data integration and ETL technology.

Today the company’s extensive product portfolio, led by the AI-powered Intelligent Data Management Cloud platform, includes data catalog, data integration, data quality and observability, master data management, data governance and privacy, and application and API integration capabilities.

In February the company launched Informatica Cloud Data Access Management, integrated with the Intelligent Data Management Cloud platform. CDAM is based on technology the company acquired when it bought Privitar last year.

Last week published reports said that Informatica, based in Redwood City, Calif., was in talks for a possible deal to be acquired by cloud application giant Salesforce for more than $11 billion. But this week Informatica issued a statement saying that “it is not currently engaged in any discussions to be acquired.”

For all of 2023 Informatica reported revenue of $1.60 billion, up 6 percent from $1.51 billion in 2022.

Matillion

Top Executive: CEO Matthew Scullion

Matillion is a major player in the data ELT space with its cloud-native Matillion Data Productivity Cloud system for automating data movement, transformation and orchestration tasks.

Last month Matillion launched a cloud data integration platform that unifies pushdown ELT and pushdown generative AI to enable data engineers to build analytics and AI data pipelines faster on platforms such as Snowflake, Databricks and AWS. (Pushdown is an alternative computing method for running data quality jobs where all processing is submitted to a SQL data warehouse.)

This month Matillion, which has dual headquarters in Manchester, U.K., and Denver, named AWS and VMware veteran Eric Benson as the company’s new chief revenue officer. Benson was vice president of Americas Cloud Sales at VMware for nearly four years and before that served for more than 10 years in multiple senior executive roles at AWS.

Nextdata

Top Executive: CEO Zhamak Dehghani

Startup Nextdata exited stealth in early 2023 and has been developing its NextdataOS “data mesh” technology since then. The company’s mission, according to its web site, “is to make the experience of creating, sharing, discovering and using data connected, fast and fair.”

Data mesh is a new approach to building a distributed data architecture that supports domain-specific data consumers (such as sales or marketing, for example), rather than traditional centralized, monolithic approaches like data warehouses or data lakes.

In September Nextdata, founded by Dehghani and based in San Francisco, raised $12 million in seed funding, led by Greycroft and Acrew Capital. That is being used to advance development of the company’s proprietary data mesh technology and “expand the hiring of critical talent across product, engineering and go-to-market teams,” the company said.

Nexla

Top Executive: CEO Saket Saurabh

With its automated data engineering platform, Nexla looks to unify and simplify multiple DataOps chores such as data integration, data transformation, data monitoring and more.

Key to the platform’s functionality are its universal bi-directional connectors, Nexsets “logical data units” or data containerization building blocks, and the Nexla data fabric architecture.

Nexla was founded in 2016 and is headquartered in San Mateo, Calif.

Octopai

Top Executive: CEO Yael Ben Arie

Under the moniker “the automated data intelligence company,” Octopai develops a platform with automated data lineage, data discovery and data catalog capabilities that helps data analytics teams to quickly find and understand their data and ensure the trustworthiness of their data assets.

A more recent addition to the platform is Octomize AI, an AI agent that provides data teams with a real-time, unified workspace that automates, optimizes and interprets SQL scripts while providing insights into data lineage.

Octopai is headquartered in Kefar Sava, Israel. In October the company gained Preferred Solution status on the Microsoft Azure Marketplace.

Precisely

Top Executive: CEO Josh Rogers

The Precisely Data Integrity Suite is a portfolio of software tools for improving data to make better decisions.

The suite includes applications for high-speed data integration, data quality, data integrity, data governance, master data management, data enrichment, location intelligence and customer engagement, among others.

In December Precisely, headquartered in Burlington, Mass., announced that the Precisely Data Integrity Suite was available to run on the Snowflake Data Cloud.

Reltio

Top Executive: CEO Manish Sood

The Reltio Connected Data Platform is a cloud-native data unification and management system that uses multidomain master data management, entity resolution and other technologies to cleanse and unify complex, multi-source data into a single source of real-time information for operational, analytical and AI tasks.

In February Reltio debuted the Reltio Connected Data Platform 2024.1 with improved data unification automation through “flexible entity resolution networks” technology that uses LLM-powered, pre-trained machine learning models. The new release also included the Reltio Intelligent Assistant that uses generative AI and natural language technology to search digital content.

Reltio is headquartered in Redwood Shores, Calif.

Rivery

Top Executive: CEO Itamar Ben Hemo

Rivery provides a cloud-based ELT data operations platform for building and automating complex, end-to-end data pipelines, transforming data, and integrating data across a wide range of sources using fully managed data replication and more than 200 native connectors.

Founded in 2019, Rivery has headquarters in New York and Tel Aviv. In May 2022 the company raised $30 million in a Series B funding round led by Tiger Global.

Striim

Top Executive: President and CEO Ali Kutay

Striim’s “intelligent integration platform,” offered as both an on-premises system and a fully managed cloud service, uses change data capture to unify data across clouds, applications and databases in real time. Connecting legacy systems with modern cloud applications is a key use case for the AI-powered Striim system.

Striim, based in Palo Alto, Calif., recently launched Striim Cloud for Application Integration, a fully managed, SaaS service for application connectors through which users can stream real-time CRM, ERP, billing and payment data from cloud applications to data warehouses with zero coding.

In December Striim announced that its system had been integrated with the Microsoft Fabric data analytics platform, providing data for real-time analytics and AI-driven insights.

Syncari

Top Executive: CEO Nick Bonfiglio

Syncari develops a low-code/no-code data automation platform used to synchronize, unify, clean, manage, analyze and distribute trusted customer and revenue data for sales, marketing and other go-to-market operations.

The company’s SyncAI suite, including InsightsGPT, PipelineGPT and ActionGPT, adds generative AI capabilities to the company’s data workflows, enabling revenue teams to analyze customer data using conversational queries and execute data automation with natural language prompts.

Syncari is headquartered in Newark, Calif.

Tamr

Top Executive: CEO Anthony Deighton

Tamr is focused on helping businesses develop “golden records” – clean, accurate, enriched and continuously updated data on customers, vendors, products, contacts, company spending and more.

Tamr offers AI-powered data products for B2B and B2C customers, such as providing 360 views of customers for cross-sell/upsell opportunities. Another product provides data on suppliers for optimizing sourcing strategies and minimizing supply chain risk. And the company’s Market Data Linkage product provides data needed for investment decisions.

Tamr promoted Anthony Deighton to CEO in February. He has been with the company for nearly four years, originally hired as chief product officer in June 2020. He replaces CEO Andy Palmer, who remains on Tamr’s board. Palmer had been CEO since co-founding the Cambridge, Mass.-based company with database luminary Mike Stonebraker in 2013.

Vast Data

Top Executive: CEO Renen Hallak

Vast Data has its roots in the data storage system side of the IT industry with its Universal Storage software. Today the company’s Vast Data Platform stores, catalogs, enriches and secures structured and unstructured data for data-intensive computing, deep learning and AI workloads.

The platform is built on a distributed systems architecture called DASE (Disaggregated and Shared-Everything). The platform includes VAST DataStore for unstructured data management, VAST DataSpace for edge-to-cloud data access, VAST DataBase for structured data management, and (available this year) the VAST DataEngine for providing actionable data insights.

In December Vast Data raised $118 million in Series E funding, boosting the New York-based company’s valuation to $9.1 billion.

WekaIO

Top Executive: CEO Liran Zvibel

WekaIO is another of several companies on this year’s Big Data 100 list that span the data storage and data management sectors of the IT industry. The company’s hybrid cloud Weka Data Platform, built on a distributed file system and object storage technology, is positioned for data-intensive workloads in AI, machine learning, high-performance computing and more.

In March WekaIO unveiled WEKApod, a new data appliance based on Nvidia DGX H100 processors that’s certified for Nvidia DGX SuperPOD deployments. The system combines WekaIO’s AI-native data platform software with storage hardware to provide a purpose-built environment for getting AI projects into production more quickly.