The Coolest Data Management And Integration Software Companies Of The 2023 Big Data 100
Part 4 of CRN’s 2023 Big Data 100 takes a look at the vendors solution providers should know in the data management and integration software space.
Just about every business today is a data company. Whether they use data to manage day-to-day operations, analyze it for planning purposes and competitive advantage, or even provide it as part of product and service offerings, most businesses now treat data as a valuable asset.
But with growing data volumes, increased data complexity and hybrid IT environments that maintain critical data across multiple systems and locations, the job of managing, maintaining, protecting and analyzing all that data has become a major challenge.
To make productive use of – and derive value from – the ever-growing volumes of data, businesses and organizations need leading-edge technologies to identify and unify siloed data, manage and control data pipelines, prepare and transform data for analysis and machine learning tasks, improve and maintain data quality, and govern and secure data against misuse.
As part of the CRN 2023 Big Data 100, CRN has compiled a list of big data management and integration software vendors that solution providers should be familiar with. They include established vendors such as Confluent, Informatica and Matillion as well as more recent startups like Airbyte, Coalesce and Nextdata.
This week CRN is running the Big Data 100 list in a series of slideshows, organized by technology category, spotlighting vendors of business analytics software, database systems, data warehouse and data lake systems, data management and integration software, data observability tools, and big data systems and cloud platforms.
Some vendors market big data products that span multiple technology categories. They appear in the slideshow for the technology segment in which they are most prominent.
CEO: Michel Tricot
Startup Airbyte is one of the more recent entries in the data integration and transformation arena, developing a data integration engine and ETL (extract, transform and load) platform for replicating data between applications, databases, data warehouses and other systems.
Airbyte provides an open-source edition of its software along with the commercial Airbyte Enterprise and Airbyte Cloud editions. The San Francisco-based company also offers out-of-the-box connectors to hundreds of data sources and targets, a connector development kit, and several sales and marketing analytics tools.
CEO: Satyen Sangani
Alation is a pioneer in data catalog and data governance – what the company collectively calls “data intelligence” – critical technology for creating an inventory of data assets across an organization and ensuring those assets are properly managed and used.
In January Alation, headquartered in Redwood City, Calif., launched the Partner Program for Open Connector Framework, part of the company’s Alation Partner Network channel program, which provides partners with assistance and incentives to develop links between Alation’s platform and third-party data sources.
In February the company debuted Alation Marketplaces, an online service where customers and channel partners can find third-party data sets to augment and enrich data in the Alation Data Catalog for data analytics and other data-related tasks.
CEO: Haoyuan Li
Alluxio developed the Alluxio Data Orchestration Platform, a virtual distributed file system that connects big data workloads, including analytical and AI/ML applications, with data sources across clusters, regions, clouds and countries, eliminating the need to copy data to a single data lake or data warehouse.
The Alluxio technology grew out of the Tachyon open-source project at the U.C. Berkeley AMPLab. Alluxio, based in San Mateo, Calif., offers both open-source and commercial editions of its software.
CEO: Elliot Shmukler
Anomalo develops a next-generation data quality management platform designed to help businesses and organizations catch potential data quality issues before they create problems in operational processes, data analysis tasks or business models.
The Anomalo technology monitors enterprise data, automatically detects data issues and determines their root causes. That allows data management teams to resolve data quality issues before data is put into production. The platform makes use of machine learning to rapidly assess a wide range of data sets with minimal human input.
Anomalo was founded in 2018 by CEO Elliot Shmukler and CTO Jeremy Stanley, who worked together on data quality at Instacart. Anomalo, based in Palo Alto, Calif., formally launched in October 2021 with $33 million in financing.
CEO: Adrian Knapp
Unstructured data can account for up to 80 percent of an organization’s data – much of it hidden “dark data.” Aparavi has been gaining attention with its Data Intelligence and Automation Platform that identifies, classifies and optimizes unstructured data wherever it resides, helping businesses mitigate risk, reduce infrastructure costs and storage needs, and better utilize data assets for analytics, machine learning and collaborative tasks.
Aparavi, headquartered in Santa Monica, Calif., struck a partnership with distributor TD Synnex in July 2022 to offer the Aparavi product to the distributor’s customers.
CEO: Ibrahim Surani
Astera develops Astera Data Stack, a code-free, unified data management platform that provides data integration, transformation and management capabilities along with tools for data warehousing, EDI (electronic data interchange) management and API lifecycle management.
Based in Westlake Village, Calif., Astera works with reseller and system integrator partners.
CEO: Michael Klaus
The Ataccama One unified data management and governance platform offers a range of functionality including data catalog, data quality, master data management, data integration, data stories and reference data.
In February Ataccama, headquartered in Toronto, said it had deepened integration between its platform and Snowflake to provide data observability functionality and pushdown processing for joint customers. In March Ataccama said its software was available in the Microsoft Azure Marketplace for use on Azure.
Ataccama received $150 million in growth funding from Bain Capital Tech Opportunities in June 2022.
CEO: Armon Petrossian
Coalesce develops a next-generation data transformation automation platform built for large-scale data transformation workloads.
Data transformation tasks are often a bottleneck for data analytics and other data-intensive tasks. The Coalesce platform is designed to support huge data warehouses, such as those running on the Snowflake Data Cloud, with its automated data transformation capabilities, flexible code and intuitive user interface.
Coalesce, headquartered in San Francisco, was founded in 2020 and exited stealth in early 2022 with $5.92 million in seed funding, followed in September by a $26 million Series A funding round.
CEO: Felix Van de Maele
Collibra’s Data Intelligence Cloud includes data catalog, data governance, data lineage, data privacy, and data quality management and observability functionality. In November Collibra enhanced the platform with new search, collaboration, business process automation and analytics capabilities.
Collibra raised $250 million in a Series G funding round in late 2021 and last year the New York-based company announced an additional investment from Snowflake Ventures, the venture arm of data cloud giant Snowflake.
CEO: Jay Kreps
Confluent calls itself the “data in motion” company with its Confluent Platform software and Confluent Cloud streaming data pipeline system for managing real-time data processing and data flows between applications and data sources.
Other products in the Confluent portfolio include Stream Designer, Stream Governance and ksqlDB.
Confluent, based in Mountain View, Calif., was launched in 2014 to commercialize the Apache Kafka open-source event streaming platform that Confluent’s founders created while working at LinkedIn. Confluent went public in June 2021 and in 2022 the company reported revenue of $585.9 million, up 51 percent from 2021.
In January, Confluent struck a deal to acquire Immerok, which developed a fully managed cloud service based on Apache Flink, the popular open-source technology for building stream processing applications.
CEO: Tristan Handy
Dbt Labs has been gaining momentum with its cloud-based data transformation workflow tools that help businesses and organizations build data pipelines and transform, test and document data within cloud data warehouse systems.
Dbt, based in Philadelphia, sees its technology playing a central role in the cloud data analytics “stack.” The Dbt platform includes a development framework that combines SQL development and software engineering best practices such as modularity, portability, CI/CD and documentation.
In February 2022 the company raised $222 million in Series D financing with data cloud giant Snowflake and data lakehouse developer Databricks among the investors.
Last year a number of players in the big data space established alliances with Dbt and/or linked their products with the Dbt platform and tools, including Databricks and cloud data analytics provider Starburst.
Dbt launched a technology partner program in August 2022.
CEO: Angel Vina
Denodo develops a data virtualization platform that through its data integration, management and delivery capabilities makes it possible to connect data scattered across disparate systems, including data warehouses and data lakes.
The Palo Alto, Calif.-based company markets its technology in several iterations including the Denodo Platform, Denodo Platform for Mid-Market, Denodo for Cloud and the free Denodo Express.
CEO: Guy Eilon
Equalum offers its namesake Continuous Data Integration Platform, which uses enterprise-grade change data capture technology to support data replication, data ETL and real-time data streaming use cases in cloud and hybrid environments.
In November Equalum launched its CDC Connect OEM program through which technology partners can build the Equalum change data capture tool into their platform or workflow.
In August the Sunnyvale, Calif.-based company raised $14 million in Series C funding.
CEO: Brian Platz
Fluree’s flagship technology, Fluree Core, is a semantic graph database that combines blockchain technology, semantic graph query capability and data-centric security policy controls. The system, either on-premises or the hosted Fluree Nexus platform-as-a-service, serves as a foundation for developing data-centric, Web3 applications.
The company also provides Fluree Sense, an AI-powered data integration and transformation pipeline system.
Earlier this month Fluree, based in Winston-Salem, N.C., raised $10 million in a Series A funding round.
CEO: George Fraser
Fivetran’s data transformation and movement platform automates the process of moving data into, out of and across cloud platforms, including databases and data warehouses, according to the company. One of the company’s strengths is the number of fully managed pre-built connectors it provides to speed up data integration initiatives.
In March the company expanded the capabilities of the Fivetran Data Movement Platform with new high-volume change data capture replication capabilities.
In February Fivetran, based in Oakland, Calif., said it had achieved a $200 million annual revenue run rate. Last September Fivetran launched an enhanced partner program to boost reseller sales and international expansion.
CEO: Matthew Carroll
As the data sources maintained by businesses and organizations proliferate across the cloud and the number of people who access that data grows, making sure that the right people are accessing the right data – and enforcing data use rules and regulations – becomes a huge challenge.
Immuta’s Data Security Platform discovers sensitive data, secures data access and monitors data usage, helping to eliminate insider threats by ensuring that only the right people have data access and providing data management and security teams with the ability to manage and ensure compliance with data usage policies.
In January the Boston-based company debuted Immuta Detect, the latest addition to the Data Security Platform, providing continuous data security monitoring capabilities that alert data and security teams about risky behavior, enabling faster and more accurate risk remediation, and improving data security management across cloud data platforms.
CEO: Amit Walia
Informatica is a long-time player in the big data space and a pioneer of data integration and ETL technology. Today the company’s broad product portfolio, integrated into the company’s Intelligent Data Management Cloud platform, includes data catalog, data integration, data quality and observability, master data management, data governance and privacy, and application and API integration capabilities.
While Informatica has many current customers with on-premises deployments, it is taking on net-new business only through its cloud offerings. And the company is counting heavily on its service and systems integration partners to help customers migrate to the cloud: In June 2022 it expanded its Global Channel Partner Program, including offering advanced certifications for partners to help their customers make the jump to cloud.
Informatica, based in Redwood City, Calif., reported that in 2022 total annual recurring revenue grew 12 percent year over year to $1.52 billion.
CEO: Matthew Scullion
Matillion is one of the leading companies in the data transformation/ETL technology arena with its Data Productivity Cloud lineup.
The company’s technology portfolio includes Matillion ETL for building data pipelines, and integrating and transforming data in the cloud for data analytics and data science initiatives. It also includes Matillion Data Loader for batch loading and change data capture operations.
In June 2022, Matillion launched an upgraded Matillion Partner Network offering new incentives and resources to accelerate the development of partner-led solutions around Matillion’s technology. It also expanded the program’s reach to include systems integrators, consultants and technology partners.
Matillion is headquartered in Manchester, U.K., with its U.S. base in Denver. The company’s investors include Snowflake Ventures and Databricks Ventures.
CEO: Zhamak Dehghani
“Data mesh” has become a hot buzz phrase in the big data arena. Data mesh is a new approach to building a distributed data architecture that supports domain-specific data consumers (such as sales or marketing, for example), rather than traditional centralized, monolithic approaches like data warehouses or data lakes.
The data mesh concept was originally defined by Zhamak Dehghani in 2019 while working as a principal consultant at Thoughtworks, a global technology consulting firm.
Nextdata, of which Dehghani is the founder and CEO, has been generating some buzz of its own after the startup exited stealth in January. The company’s goal with its NextdataOS, according to the company’s website, is to do for data what containers and web APIs did for software.
The core of Nextdata’s technology is a data product container “that bundles data with everything needed to make it independently usable,” such as transformations, guarantees and policies, according to the company. Other components include analytical data product APIs, embedded computational policies and data product dynamic discovery.
CEO: Saket Saurabh
With its data engineering automation platform, Nexla looks to unify and simplify DataOps chores such as data integration, transformation and monitoring. Key to the platform’s functionality are its universal bi-directional connectors, Nexsets “logical data units” or data containerization building blocks, and the Nexla data fabric architecture.
Last month Nexla, based in San Mateo, Calif., launched the Data Product Marketplace, a new offering to help businesses develop a private data product marketplace to share and repurpose ready-to-use data products within their organizations.
CEO: Yael Ben Arie
Calling itself the automated data intelligence company, Octopai develops a platform with automated data lineage, data discovery and data catalog capabilities to help businesses ensure the trustworthiness of their data assets.
Octopai, headquartered in Rosh Haayin, Israel, just introduced an automated native connector for Google BigQuery, supporting a broader range of data discovery, lineage and migration capabilities around the Google Cloud data warehouse system.
Co-Founder and CEO: Nong Li
With businesses and organizations maintaining so much data today, enforcing data protection and privacy policies to ensure that only the right people see the right data has become a significant challenge.
Okera’s secure data access control technology is used to discover and classify sensitive data, maintain proper data access and management, and provide intelligence about sensitive data usage for audit, security and privacy compliance teams.
The company’s software also helps ensure compliance with data protection regulations including the European Union’s GDPR privacy laws and California’s CCPA/CPRA data privacy requirements.
Okera is based in San Francisco.
CEO: Josh Rogers
The Precisely Data Integrity Suite is a portfolio of software tools for high-speed data sorting, data transformation and ETL, data integration, data quality, data enrichment and location intelligence.
In January Precisely, headquartered in Burlington, Mass., acquired India-based Transerve, which provides a cloud-native location intelligence system and a data library with curated datasets. The acquisition gives Precisely access to Transerve’s expertise in spatial data handling, processing and analysis.
CEO: Manish Sood
The Reltio Connected Data Platform is a cloud-native, software-as-a-service master data management system that is used to cleanse and unify complex, multi-source data into a single source of real-time information.
Reltio offers its MDM software tailored for vertical industries. In February the Redwood Shores, Calif.-based company debuted “velocity packs” with industry-specific data models, configurations and integrations for the life sciences and healthcare industries.
CEO: Itamar Ben Hemo
Rivery offers a cloud-based data operations platform that provides ELT, data pipeline and data integration capabilities. The technology aggregates, transforms and models data directly inside of a cloud data warehouse.
New York-based Rivery raised $30 million in a Series B funding round in May 2022.
President and CEO: Ali Kutay
Striim develops a unified data integration and streaming platform for building data pipelines for ingesting, processing and delivering data in real time for data analytics and business intelligence tasks.
The Striim Platform is available for on-premises and private cloud deployments. Striim Cloud, a fully managed software-as-a-service offering, launched in February 2022. The company also offers fully managed cloud editions that specifically work with Google Cloud BigQuery and Snowflake and provides integrations for a number of other platforms including Microsoft Azure Synapse, Amazon Web Services and Databricks.
Striim is headquartered in Palo Alto, Calif.
CEO: Nick Bonfiglio
The Syncari Data Automation Platform helps businesses manage, integrate, clean and distribute customer data throughout an enterprise. The system combines data management, workflow automation and multi-directional data synchronization.
In October San Francisco-based Syncari launched Syncari Embed, a platform of APIs that extend the Syncari functionality to application ecosystems. It provides more than 50 intelligent connectors, a custom connector SDK and a unified data model.
Interim CEO: Kristin Nimsger Weston
In its early days Talend focused on developing next-generation data ETL (extract, transform and load) software. It has since expanded its big data product portfolio to include technology for data integration, application and API integration, and data integrity and governance – all under the Talend Data Fabric unified platform.
Talend is in the process of being acquired by data analytics and integration software company Qlik under a deal announced in January. (Both companies are owned by private equity giant Thoma Bravo where interim CEO Kristin Nimsger Weston is an operating partner.) The acquisition is slated to close by midyear.
Originally founded in Paris, France, Talend today is headquartered in San Mateo, Calif.
CEO: Andy Palmer
Tamr develops a next-generation data mastering and enrichment platform that businesses use to transform messy source data into clean, consolidated and curated data.
In addition to its flagship mastering and enrichment software, Tamr develops data products for specific industry use cases including B2B and B2C customer data mastering, healthcare provider data mastering and supplier data mastering.
Tamr, headquartered in Cambridge, Mass., was co-founded in 2013 by computer scientist and database luminary Michael Stonebraker.
In November the company was awarded its 16th patent for “method and system for large-scale data curation.”
CEO: Liran Zvibel
WekaIO is one of several companies on this year’s Big Data 100 list that span the data storage and data management sectors of the IT industry. The company’s hybrid cloud Weka Data Platform, built on distributed file system and object storage technology, is positioned for data-intensive workloads in AI, machine learning, high-performance computing and more.
In February WekaIO launched a new channel partner program looking to recruit VARs, systems integrators and MSPs to build AI, ML and other high-performance solutions and services around the Weka Data Platform.
In November WekaIO, based in Campbell, Calif., raised $135 million in Series D funding.