The Coolest Data Management And Integration Software Companies Of The 2022 Big Data 100
Part 3 of CRN’s Big Data 100 includes a look at the vendors solution providers should know in the big data management and data integration software space.
There is an increasingly popular saying that every company today is a data company. That’s because just about every business of any size is striving to collect, manage and leverage data for competitive advantage—either operationally or through the use of increasingly sophisticated data analysis technology.
The problem is finding the right tools and processes to collect, process, integrate and manage all that data. The total amount of data created and replicated worldwide reached 64.2 zettabytes in 2020, according to market researcher IDC, and is continuing to grow at a compound annual growth rate (CAGR) of 23 percent. At that rate, the total “global datasphere” will reach 181 zettabytes in 2025.
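A quick worked check, assuming the 23 percent rate applies uniformly in each of the five years from 2020 to 2025, shows how the 2025 figure follows from the 2020 total (the small gap between 180.7 and 181 is rounding):

$$V_{2025} = V_{2020}\,(1 + r)^{5} = 64.2\ \text{ZB} \times 1.23^{5} \approx 64.2\ \text{ZB} \times 2.815 \approx 180.7\ \text{ZB}$$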
To make productive use of the ever-growing volumes of data businesses and organizations find themselves wrestling with today, they need leading-edge technologies to identify and unify siloed data, manage and control data pipelines, prepare and transform data for analysis and machine learning tasks, improve and maintain data quality, and govern and secure data against misuse.
As part of the 2022 Big Data 100, CRN has compiled a list of data management and data integration software companies that solution providers should be aware of. They include established vendors such as Informatica, Confluent and Talend, as well as hot startups like Airbyte, Bigeye and Syncari.
This week CRN is running the Big Data 100 list in slide shows, organized by technology category, with vendors of business analytics software, database systems, data management and integration software, data warehouse systems, big data systems and cloud platforms, and data science and machine learning tools. Another slide show will highlight the startup companies within each category.
Some vendors market big data products that span multiple technology categories. They appear in the slide show for the technology segment in which they are most prominent.
Airbyte
Top Executive: Co-Founder, CEO Michel Tricot
Airbyte develops an open-source data integration and ETL platform for replicating and synchronizing data from APIs, files and databases to data warehouses, data lakes and other destinations.
Founded in 2020, startup Airbyte is challenging established data management tech vendors and says its goal is to “make data integrations a commodity” through open-source extensibility and transparent, predictable compute-based pricing.
Based in San Francisco, Airbyte raised $150 million in Series B funding in December 2021, bringing its total funding to $181.2 million.
Alation
Top Executive: CEO Satyen Sangani
Alation develops enterprise data catalog and data governance tools—what the company collectively calls “data intelligence”—that help businesses find, understand, trust and collaborate around the data they have within their complex IT environments. The Alation Data Catalog is used to develop an organized inventory of an organization’s data assets, providing support for data discovery and governance tasks.
In April 2021 Alation, based in Redwood City, Calif., debuted the Alation Cloud Service, a cloud edition of the Alation Data Catalog that’s available through AWS Marketplace. In October the company acquired Lyngo Analytics, a startup developer of AI-based data intelligence software that will help Alation market its software to a wider audience of business users.
Alluxio
Top Executive: Founder, CEO Haoyuan Li
Alluxio’s Data Orchestration Platform manages large-scale, distributed data workloads by linking data-driven applications, such as machine learning and business analytics software, with data sources such as Hadoop-based data lakes, Amazon S3 and Google Cloud Storage that are increasingly dispersed across hybrid and multi-cloud IT environments.
The company’s software is a virtual distributed file system that makes all data appear local no matter where it’s stored.
Alluxio, based in San Mateo, Calif., raised $50 million in a Series C round of funding in November.
Ataccama
Top Executive: CEO Michal Klaus
The Ataccama One system offers an AI-powered platform that carries out numerous data management tasks including data integration, data governance, data cataloging, data quality management and master data management. The system, which works across hybrid and cloud environments, is targeted for use by data stewards, data analysts, data engineers and data scientists.
Ataccama is based in Toronto.
Bigeye
Top Executive: Co-Founder, CEO Kyle Kirwan
Delayed, missing, duplicated and damaged data can hinder big data projects and digital transformation initiatives. Bigeye offers a data observability platform that helps data management teams identify and fix data quality problems.
The platform automates data quality management tasks by instrumenting data sets and data pipelines, applying metrics to monitor and measure data quality, detecting data anomalies and alerting data managers when issues occur.
Bigeye, founded in 2019 and based in San Francisco, raised $17 million in Series A funding in April 2021 and then another $45 million in Series B funding in September, financial resources the company is using to accelerate its product development and expand its go-to-market efforts.
Collibra
Top Executive: Founder, CEO Felix van de Maele
The Collibra Data Intelligence Cloud gives businesses visibility into their data ecosystem—from data warehouses and data lakes to master data repositories and operational databases—for a range of tasks including data cataloging, data governance, data privacy, data quality and observability, and data lineage.
In November Collibra, with dual headquarters in New York and Brussels, Belgium, raised $250 million in a Series G funding round that boosted the company’s valuation to $5.25 billion. That was followed in January by an investment of an undisclosed amount from Snowflake Ventures, the investment arm of data cloud company Snowflake.
Confluent
Top Executive: Co-Founder, CEO Jay Kreps
Confluent calls itself the “data in motion” company with its Confluent Platform and Confluent Cloud systems that organize and manage massive volumes of streaming data and make it available to operational applications, data analysis tools and business users.
The company’s products are based on the open-source Apache Kafka stream processing technology, which was originally developed by Confluent’s founders while working at LinkedIn.
Confluent, based in Mountain View, Calif., went public on the Nasdaq exchange in June 2021. The company reported 2021 revenue of $387.9 million, up 64 percent from $236.6 million one year before.
Cribl
Top Executive: Co-Founder, CEO Clint Sharp
Cribl’s observability data engineering software, including its flagship Cribl Stream system, is used to build pipelines for routing high volumes of telemetry data, including machine log, instrumentation, application and metric data, between operational, storage, analytical and security systems.
In October Cribl launched Cribl Stream Cloud Enterprise Edition, a cloud service for securely managing globally distributed observability data pipelines. The service makes it possible for businesses and organizations to centrally configure, manage, monitor and orchestrate data observability pipeline infrastructure anywhere in the world, according to the company.
Cribl, founded in 2017 and based in San Francisco, raised $200 million in a Series C round of funding in August 2021, resources the company is using to expand its go-to-market efforts—including channel initiatives.
Datometry
Top Executive: Founder, CEO Mike Waas
Datometry brings virtualization to the database stack with its Hyper-Q virtualization platform, which the company says makes applications and databases interoperable without the need to change SQL or APIs.
Datometry’s software allows enterprises to run existing applications directly on cloud databases without the need for complex database migrations. That frees Datometry customers from vendor lock-in on their on-premises database technology and accelerates moves to the cloud.
San Francisco-based Datometry recently began offering the Hyper-Q platform through the Microsoft Azure Marketplace.
dbt Labs
Top Executive: Founder, CEO Tristan Handy
dbt Labs markets a development framework and tools that data engineers and data analysts use to transform, test and document data in cloud data warehouse systems.
dbt Labs raised $222 million in Series D funding in February—investors included Snowflake and Databricks—bringing the Philadelphia-based company’s valuation to $4.2 billion. The company said the financing was needed to support its rapid growth, including a six-fold increase in revenue over the previous year.
Denodo
Top Executive: Founder, CEO Angel Vina
Denodo’s data virtualization platform uses data integration and data abstraction to create a holistic view of enterprise information across a wide range of data sources including databases, data warehouses and cloud applications.
The platform provides a way for an organization to build a “logical data fabric” that allows business users to consume data, regardless of data format, through reports, dashboards, applications and portals.
In March, Palo Alto, Calif.-based Denodo launched Denodo Standard, a cloud-based, quick-start data integration solution that leverages the company’s data virtualization engine. The company also offers the free Denodo Express and Denodo for Cloud for hybrid/multi-cloud environments.
Equalum
Top Executive: CEO Guy Eilon
Equalum, based in Sunnyvale, Calif., develops a fully managed data ingestion platform that provides streaming change data capture and modern data transformation capabilities.
The system is built on the open-source Apache Kafka real-time data framework and can be deployed in on-premises, hybrid or cloud environments.
The recently launched Equalum Continuous Data Integration Platform 3.0, according to the company, natively supports all data integration, ingestion and transformation use cases, including batch ETL, streaming ETL, replication and multi-modal change data capture, in a single unified platform.
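Change data capture (CDC), one of the capabilities Equalum lists, streams row-level inserts, updates and deletes from a source database's log instead of repeatedly copying whole tables. The sketch below is a simplified, hypothetical illustration of the idea, not Equalum's implementation: the `ChangeEvent` format and the `apply_changes` helper are invented for this example, and real CDC systems must also handle ordering guarantees, schema changes and delivery failures.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional

Row = Dict[str, Any]


@dataclass
class ChangeEvent:
    """One row-level change read from a source database's log (hypothetical event format)."""
    op: Literal["insert", "update", "delete"]
    key: int
    row: Optional[Row] = None  # full new row image for inserts/updates; None for deletes


def apply_changes(target: Dict[int, Row], events: List[ChangeEvent]) -> Dict[int, Row]:
    """Replay change events, in log order, against a keyed replica of the source table."""
    for ev in events:
        if ev.op == "delete":
            target.pop(ev.key, None)  # tolerate deletes for keys the replica never saw
        else:
            # Inserts and updates both carry the full new row, so one upsert covers both.
            target[ev.key] = dict(ev.row or {})
    return target


replica: Dict[int, Row] = {}
change_log = [
    ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "shipped"}),
    ChangeEvent("delete", 2),
]
apply_changes(replica, change_log)
print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```

Only the changed rows travel from source to target, which is why CDC suits the continuous, low-latency pipelines described here.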
Fivetran
Top Executive: CEO George Fraser
Fivetran’s cloud-based data integration automation service and its broad (and growing) portfolio of data connectors are used to build data pipelines from operational applications and databases to on-premises and cloud data warehouses for analytical tasks.
In September 2021 Fivetran struck a deal to buy HVR, a developer of data replication technology, in a move to strengthen its position in the data integration and management arena. At the same time Fivetran announced a $565 million Series D funding round, much of which was used to cover the $700 million price tag for the HVR acquisition.
In February Fivetran, based in Oakland, Calif., said that in 2021 it more than doubled its revenue and grew its customer base by 75 percent.
Immuta
Top Executive: Co-Founder, CEO Matthew Carroll
As data volumes expand, governing the legal and ethical use of data is becoming a major challenge. Immuta’s universal cloud data access control platform provides automated, granular and dynamic control over who can access data and for what purposes, helping ensure data security and privacy compliance.
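The idea of controlling both who can see data and for what purpose can be illustrated with a small, hypothetical policy check. This is not Immuta's API or policy model: the `Policy` class, the `apply_policies` function and the choice to mask rather than drop denied columns are assumptions made for illustration, and real platforms enforce such rules inside the database or query engine rather than in application code.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List

MASK = "***"


@dataclass(frozen=True)
class Policy:
    """Hypothetical column-level rule: which roles may see a column, and for which purposes."""
    column: str
    allowed_roles: FrozenSet[str]
    allowed_purposes: FrozenSet[str]


def apply_policies(row: Dict[str, Any], role: str, purpose: str,
                   policies: List[Policy]) -> Dict[str, Any]:
    """Return a copy of `row`, masking each governed column unless role AND purpose are allowed."""
    governed = {p.column: p for p in policies}
    result: Dict[str, Any] = {}
    for column, value in row.items():
        policy = governed.get(column)
        if policy is None:
            result[column] = value  # no policy on this column: pass it through
        elif role in policy.allowed_roles and purpose in policy.allowed_purposes:
            result[column] = value  # both the user's role and stated purpose are permitted
        else:
            result[column] = MASK  # denied: mask the value rather than drop the column
    return result


policies = [Policy("email", frozenset({"support"}), frozenset({"customer_service"}))]
row = {"id": 7, "email": "a@example.com"}
print(apply_policies(row, "analyst", "marketing", policies))
# {'id': 7, 'email': '***'}
print(apply_policies(row, "support", "customer_service", policies))
# {'id': 7, 'email': 'a@example.com'}
```

Because the decision depends on the request's role and purpose at query time, the same row can be returned differently to different users, which is what "dynamic" access control refers to.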
Immuta’s software works with database systems and data warehouses including Snowflake, Azure Synapse and Amazon Redshift. It also provides data governance for big data systems such as Starburst and Databricks.
In May 2021 Boston-based Immuta raised $90 million in Series D funding. The company recently said that in 2021 it grew annual recurring revenue by more than 100 percent and doubled its customer base.
Informatica
Top Executive: CEO Amit Walia
Informatica was a pioneer in the data ETL (extract, transform and load) space and remains an industry leader with its comprehensive portfolio of enterprise cloud data management technologies including data integration, data governance, master data management and data quality management products.
The Informatica Intelligent Data Management Cloud is the Redwood City, Calif.-based company’s flagship platform. In March the company launched Intelligent Data Management Cloud for Retail.
Informatica, which was acquired and taken private in 2015, went public in October 2021 on the New York Stock Exchange. CEO Amit Walia has set a goal of reaching $1 billion in annualized recurring revenue this year.
Matillion
Top Executive: CEO Matthew Scullion
Matillion provides cloud-native data integration, replication and ETL tools for moving data into cloud-based data warehouses and data lakes for business analytics.
Matillion ETL and Matillion Data Loader are the company’s core products.
Monte Carlo
Top Executive: Co-Founder, CEO Barr Moses
Monte Carlo’s data observability software is used to monitor data across IT systems, including in databases, data warehouses and data lakes, to gauge and maintain data quality, reliability and lineage—what the company calls “data health.”
The startup’s platform evaluates data according to its freshness (how up to date it is), the volume or completeness of data tables, the data schema (how the data is organized), data lineage including sources and usage, and the data’s distribution (whether the data’s values are within an accepted range).
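Three of those checks (freshness, volume and distribution) can be made concrete with a small, hypothetical sketch. It is not Monte Carlo's product logic: the `check_table_health` function, the 24-hour staleness window, the three-standard-deviation volume rule and the accepted value range are all illustrative assumptions. Schema and lineage checks, which need access to table metadata and query history, are left out.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev
from typing import List, Tuple


def check_table_health(
    last_loaded_at: datetime,
    row_count_history: List[int],
    todays_row_count: int,
    values: List[float],
    now: datetime,
    max_staleness: timedelta = timedelta(hours=24),
    z_threshold: float = 3.0,
    accepted_range: Tuple[float, float] = (0.0, 1000.0),
) -> List[str]:
    """Run simplified freshness, volume and distribution checks; return a list of issues found."""
    issues: List[str] = []

    # Freshness: has the table been loaded recently enough?
    if now - last_loaded_at > max_staleness:
        issues.append("freshness: last load is older than the allowed window")

    # Volume: is today's row count an outlier relative to recent history (z-score rule)?
    if len(row_count_history) >= 2:
        mu = mean(row_count_history)
        sigma = stdev(row_count_history)
        if sigma == 0:
            outlier = todays_row_count != mu
        else:
            outlier = abs(todays_row_count - mu) / sigma > z_threshold
        if outlier:
            issues.append(f"volume: {todays_row_count} rows vs. a recent mean of {mu:.0f}")

    # Distribution: do any values fall outside the accepted range?
    low, high = accepted_range
    out_of_range = [v for v in values if not low <= v <= high]
    if out_of_range:
        issues.append(f"distribution: {len(out_of_range)} value(s) outside [{low}, {high}]")

    return issues


now = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)
# Prints one freshness, one volume and one distribution issue for this deliberately unhealthy table.
for issue in check_table_health(
    last_loaded_at=now - timedelta(hours=30),
    row_count_history=[1000, 1020, 990, 1010],
    todays_row_count=400,
    values=[10.0, 250.0, -5.0],
    now=now,
):
    print(issue)
```

A fixed z-score threshold is only one simple option; tables with weekly or seasonal load patterns would need a more adaptive baseline.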
Monte Carlo, founded in 2017 and based in San Francisco, raised $60 million in Series C funding in August 2021, financing the company is using to accelerate product development, fuel its go-to-market efforts and promote the data observability concept.
Nexla
Top Executive: Co-Founder, CEO Saket Saurabh
Nexla has developed a unified data operations platform for creating scalable, repeatable and predictable data flows throughout an organization. The software is used to integrate, automate and monitor incoming and outgoing data for data use cases including data science and business analytics.
Nexla’s product portfolio includes Nexsets, which automates manual, time-consuming data engineering tasks, making it easier to access, integrate and transform data that may be scattered across disparate systems and creating what the company calls a “converged data fabric.” Nexsets works by creating logical views of data without the need to copy or duplicate data.
Nexla is based in San Mateo, Calif.
Octopai
Top Executive: CEO Yael Ben Arie
Octopai’s automated data lineage and discovery software helps data managers and data analysts quickly find and understand the data they need for business analytics and other tasks.
The company’s metadata management technology helps identify and locate data, wherever it resides throughout an organization. It’s also used to determine data lineage for maintaining data consistency and meeting regulatory and compliance requirements such as the European Union’s GDPR or California’s CCPA.
In September 2021 Octopai, based in Tel Aviv, Israel, launched a data catalog product that is integrated with the company’s data lineage platform.
Okera
Top Executive: Co-Founder, CEO Nong Li
DataOps rising star Okera offers a universal data authorization system that audits and authorizes access to data, allowing businesses and organizations to take control of their data security, privacy and regulatory compliance efforts.
The Okera Dynamic Access Platform includes software for building and enforcing data access policies, metadata management, and centralized auditing and reporting tasks.
Founded in 2016, Okera is based in San Francisco.
Precisely
Top Executive: CEO Josh Rogers
The Precisely Data Integrity Suite includes software tools for data governance, data quality, data integrity, data integration and data enrichment, as well as for using data to engage with customers.
Precisely also offers a series of location intelligence products, including Precisely MapInfo and Precisely Spectrum Spatial. In January the company struck a deal to buy PlaceIQ, whose software provides location-based consumer insight for marketing and business decisions.
Precisely, based in Burlington, Mass., was rebranded from Syncsort in May 2020. On April 21, 2022, Precisely disclosed that private equity firms Insight Partners and Partners Group had signed a definitive agreement to make “a strategic investment” in the company, joining Clearlake Capital Group, TA Associates and Centerbridge Partners as institutional investors. Terms of the deal were not disclosed.
Prophecy.io
Top Executive: Co-Founder, CEO Raj Bains
Prophecy.io provides a low-code data engineering platform for developing and deploying data pipelines used to manage streams of data for business analytics and machine learning tasks. The system combines visual drag-and-drop development with Agile software engineering practices.
In February Prophecy.io debuted a SaaS version of the platform that is built on Apache Spark, the open-source analytics engine, and the Kubernetes container management system, and that runs on Databricks on AWS, Microsoft Azure and Google Cloud Platform.
Prophecy.io, based in Palo Alto, Calif., raised $25 million in Series A funding in January.
Reltio
Top Executive: CEO Chris Hylen
The Reltio Connected Data Platform is a cloud-native master data management system that the company markets as a key component for digital transformation and data compliance initiatives.
The company also develops a number of tools that work with the platform including Connected Customer 360, Enterprise 360 and Identity 360, the latter a free cloud service for developing a “single source of truth” for customer profile data.
In October 2021 the company launched Reltio Integration Hub, new low-code/no-code software for quickly integrating data sources and data consumers with the company’s data management platform. Integration Hub debuted as part of the Reltio Connected Data Platform 2021.3 release.
In November Reltio, based in Redwood Shores, Calif., raised $120 million in new funding from investors.
Rivery
Top Executive: CEO Itamar Ben Hemo
Rivery offers a fully managed Software-as-a-Service data operations platform that includes data ingestion, transformation, orchestration, reverse ETL and other capabilities. The Rivery technology helps organizations quickly build automated data pipelines using hundreds of prebuilt connectors and pipeline templates.
Based in New York and Tel Aviv, Israel, Rivery raised $16 million in Series A funding in March 2021.
Striim
Top Executive: President, CEO Ali Kutay
Striim’s unified real-time data integration and streaming platform enables continuous data ingestion, in-flight data processing and data delivery for analytics and business intelligence tasks.
The Striim technology uses change data capture to continuously ingest high-volume, high-velocity data from enterprise databases, along with data generated by log files, messaging systems, cloud applications, IoT devices and more, and processes it in real time.
In February the company introduced Striim Cloud, a fully managed Software-as-a-Service platform for streaming data integration and analytics.
Striim, based in Palo Alto, Calif., raised $50 million in Series C funding in March 2021.
Syncari
Top Executive: Co-Founder, CEO Nick Bonfiglio
Syncari’s no-code data automation platform helps data professionals unify, clean, manage and distribute trusted customer data across an enterprise. The system utilizes a range of data synchronization, unification, governance and access capabilities to perform its tasks.
In June 2021 the company unveiled the addition of sophisticated workflow automation capabilities to help sales and marketing teams make more effective use of customer data.
Syncari, based in San Francisco, was founded in 2019 by former executives from Marketo, MuleSoft and Zendesk. In May 2021 the company announced a $17.3 million Series A round of funding.
Talend
Top Executive: CEO Christal Belmont
The Talend Data Fabric system provides data integration, data integrity and governance, and application and API integration capabilities in cloud, multi-cloud and hybrid-cloud environments. The company, based in Redwood City, Calif., also offers the Stitch data pipeline system for moving data into data warehouse systems for business analytics tasks.
On April 7 Talend acquired Gamma Soft, a developer of change data capture technology.
In 2021 Talend, which had been a publicly traded company, was acquired by private equity firm Thoma Bravo and taken private in a deal valued at $2.4 billion.
Tamr
Top Executive: Co-Founder, CEO Andy Palmer
Tamr’s cloud-native master data management system connects internal and external data sources to provide complete, consolidated, analytics-ready data for a range of tasks. The software runs on the AWS, Azure and Google Cloud platforms.
Tamr Cloud, a cloud service purpose-built for customer data, is based on the Tamr technology.
The company also offers data mastering software for specific applications including customer data mastering for B2B and B2C companies, clinical trial data management, data mastering for product rationalization and more.
Boston-based Tamr was co-founded by database luminary and Tamr CTO Michael Stonebraker.
Unravel Data
Top Executive: Co-Founder, CEO Kunal Agarwal
The Unravel Platform is an AI-powered DataOps and data observability system that helps businesses and organizations manage data pipelines for on-premises and cloud-based data-driven applications, including business analytics and machine learning systems.
The Unravel system provides tools for monitoring and managing data pipeline performance, correlating and analyzing resource and application data, and troubleshooting and optimizing data pipelines with AI-generated recommendations.
Unravel Data is based in Palo Alto, Calif.