The Coolest Data Management And Integration Software Companies Of The 2021 Big Data 100

Part 3 of CRN’s Big Data 100 includes a look at the vendors solution providers should know in the big data management and data integration software space.

Managerial Prerogative

To make productive use of the ever-growing volumes of data businesses and organizations find themselves wrestling with today, they need the right tools to identify and unify siloed data, manage and control data pipelines, prepare and transform data for analysis and machine learning tasks, and govern and secure data against misuse.

As part of the 2021 Big Data 100, CRN has compiled a list of data management and data integration software companies that solution providers should be aware of. They include established vendors such as Informatica, Confluent and Sumo Logic, as well as younger startups like Aparavi and Okera.

This week CRN is running the Big Data 100 list in slide shows organized by technology category: business analytics software, database systems, data management and integration software, data science and machine learning tools, and big data systems and platforms.

(Some vendors market big data products that span multiple technology categories. They appear in the slide show for the technology segment in which they are most prominent.)


Actian

Top Executive: CEO Lewis Black

Actian develops data warehouse, data integration, hybrid data management and edge data management software. The Actian DataConnect hybrid integration system, incorporating the company’s UniversalConnect technology, can connect any data source in any location, regardless of the data format or protocol.

In February, Palo Alto, Calif.-based Actian launched its Customer 360 analytical offering running on the company’s Avalanche cloud data warehouse system.


Alation

Top Executive: CEO Satyen Sangani

Alation offers enterprise data catalog and data governance tools—what the company collectively calls “data intelligence”—that help businesses find, understand, trust and collaborate around the data they have within their complex IT environments. The Alation Data Catalog is used to develop an organized inventory of an organization’s data assets, providing support for data discovery and governance tasks.

On April 7 Alation, based in Redwood City, Calif., debuted the Alation Cloud Service, a cloud edition of the Alation Data Catalog, available through the AWS marketplace.


Alluxio

Top Executive: Founder, CEO Haoyuan Li

Alluxio’s Data Orchestration Platform links data-driven applications, including machine learning and business analytics software, with data sources such as Hadoop-based data lakes, Amazon S3 and Google Cloud Storage that are increasingly dispersed across on-premises, hybrid cloud and multi-cloud IT environments.

On April 16 Alluxio, based in San Mateo, Calif., released new Community and Enterprise 2.5 editions of its software with extended API support that boosts the system’s performance and expands its data connectivity range.


Aparavi

Top Executive: Founder, CEO Adrian Knapp

Startup Aparavi’s cloud-based Digital Intelligence and Automation Platform is used to find, classify, automate and govern distributed, unstructured data across on-premises and cloud systems for a range of tasks including data discovery and access, data retention and protection, and data governance, risk and compliance requirements.

Based in Santa Monica, Calif., Aparavi initially focused on data management for data backup tasks, but realized the potentially broader applications for its technology.


Ataccama

Top Executive: CEO Michael Klaus

The Ataccama One system offers a single, AI-powered platform that carries out numerous data management and data governance tasks including data discovery and profiling, metadata management, data cataloging, data quality management, master data management and data integration.

In November 2020, the Toronto-based company launched Ataccama One Gen2 with increased emphasis on automated, self-driving data management capabilities.


Cazena

Top Executive: Founder, CEO Prat Moghe

Cazena, based in Waltham, Mass., provides a Data Lake as a Service for quickly building cloud-based data lake systems on AWS, Microsoft Azure or Cloudera.

The comprehensive list of service capabilities offered by the Cazena platform includes cloud infrastructure and storage, data and analytics workload engines, data ingestion and integration, production operations, workload SLA management and optimization, security and encryption, governance and compliance, and support for analytics, data science and machine learning applications.


Collibra

Top Executive: Founder, CEO Felix van de Maele

The Collibra Data Intelligence Cloud gives businesses visibility into their data ecosystem—from data warehouses and data lakes to master data repositories and operational databases—for a range of metadata management, data governance and information asset protection tasks.

Other software in Collibra’s portfolio includes the Collibra Platform for using data assets for business decisions and individual data governance, data catalog and data quality tools.

In February New York-based Collibra acquired OwlDQ, a developer of predictive data quality software.


Confluent

Top Executive: Co-Founder, CEO Jay Kreps

Confluent’s flagship products, the Confluent Platform and Confluent Cloud (the latter launched in July 2020), organize and manage massive volumes of streaming data and make it available to operational applications, data analysis tools and business users.

The company’s products are based on the open-source Apache Kafka stream processing technology, which was originally developed by Confluent’s founders while working at LinkedIn.

Confluent, based in Mountain View, Calif., raised $250 million in Series E funding in April 2020, putting its pre-money value at $4.5 billion, and some industry observers anticipate a Confluent IPO in the near future.


Datameer

Top Executive: CEO George Shahid

The Datameer Cloud Data Platform is used to manage complete data life cycles from data discovery, access, transformation and governance to cataloging of data assets for business analytics.

The company, based in San Francisco, also provides data management tools such as the Datameer Spectrum ETL (extract, transform and load) for building data pipelines and the Datameer Spotlight data catalog.


Denodo

Top Executive: Founder, CEO Angel Vina

Denodo’s data virtualization technology uses data integration and data abstraction to provide a holistic view of enterprise information across a wide range of data sources including databases, data warehouses and cloud applications. That allows business users to consume data, regardless of data format, through reports, dashboards, applications and portals.

In March Palo Alto, Calif.-based Denodo launched Denodo Standard, a cloud-based, quick-start data integration offering that leverages the company’s data virtualization engine. The tool enables real-time data analytics and data services without the need to replicate data into other repositories.


Dremio

Top Executive: CEO Billy Bosworth

Dremio develops a next-generation data lake query engine that establishes views into stored data, allowing data scientists and analysts to manage, curate and share data and enabling easy analytics for data consumers.

The company’s software is based on the Apache Arrow open-source technology for developing analytical applications that can process in-memory columnar data.

Dremio, based in Santa Clara, Calif., raised $135 million in Series D funding in January.


Fivetran

Top Executive: CEO George Fraser

Fivetran’s cloud-based data integration automation service and its broad (and growing) portfolio of data connectors are used to build data pipelines from operational applications and databases to on-premises and cloud data warehouses for analytical tasks.

In February Fivetran said it doubled its revenue and customer base in 2020. In June 2020 the company raised $100 million in Series C financing, boosting the Oakland, Calif.-based company’s total financing to $163 million and its market value to $1.2 billion.


HVR

Top Executive: CEO Anthony Brooks-Williams

HVR develops technology for high-volume, real-time data replication chores. The company’s software replicates data from on-premises and cloud systems, such as operational SAP applications or Oracle databases, to data warehouse and data lake systems such as Snowflake or Amazon Redshift.

The San Francisco-based company’s software is often used for big data projects that require data migration, data lake consolidation and real-time analytics.

Last week HVR launched HVR Agent-as-a-Service for Microsoft Azure, an Azure Managed Application offered through the Azure Marketplace.


Immuta

Top Executive: Co-Founder, CEO Matthew Carroll

Enabling the legal and ethical use of data is becoming a major challenge. Immuta’s automated data governance platform provides an organization with granular, dynamic control over who can access data and for what purposes, ensuring security and privacy compliance.

The Immuta software works with database systems and data warehouses including Snowflake, Azure Synapse and Amazon Redshift. It also provides data governance for big data systems such as Starburst and Databricks.

Boston-based Immuta said it achieved record annual growth in 2020 and increased its share of the DataOps technology market.


Informatica

Top Executive: CEO Amit Walia

Informatica was a pioneer in the data ETL (extract, transform and load) space and remains an industry leader with its comprehensive portfolio of enterprise cloud data management technologies including data integration, data governance, master data management and data quality management products.

The Informatica Intelligent Data Management Cloud is the company’s flagship platform.

In March Informatica, based in Redwood City, Calif., unveiled a new consumption-based pricing model for its software and new low-code/no-code application development capabilities.


Infoworks

Top Executive: CEO Buno Pati

Digital transformation initiatives often require lots of data. The Infoworks DataFoundry data operations and orchestration platform can be a key tool for large-scale, data-intensive projects with its ability to rapidly on-board, prepare and operationalize data within cloud, hybrid cloud and multi-cloud environments.

Infoworks is based in Palo Alto, Calif.

Magnitude Software

Top Executive: CEO Jeffrey Shoreman

Magnitude Software develops data integration and connectivity tools to integrate and deliver data across an enterprise. The integration and connectivity products incorporate technology Magnitude acquired when it bought Kalido in 2014 and Simba Technologies in 2016.

Magnitude also develops analytical tools for deep analysis of data generated by enterprise applications, specifically ERP applications from SAP and Oracle. Last week Magnitude, based in Austin, Texas, launched the first cloud-native editions of its Angles software for analyzing operational process data from SAP and Oracle ERP systems.


Matillion

Top Executive: CEO Matthew Scullion

Matillion provides cloud-native data integration, replication and ETL (extract, transform and load) tools for moving data into cloud-based data warehouses and data lakes for business analytics. Matillion ETL and Matillion Data Loader are the company’s core products.

Matillion, based in Manchester, U.K., raised $100 million in Series D funding in February.


Octopai

Top Executive: Co-Founder, CEO Amnon Drori

Octopai’s automated data lineage and discovery software helps data managers and data analysts quickly find and understand the data they need for business analytics and other tasks.

The Rosh Haayin, Israel-based company’s metadata management technology helps identify and locate data, wherever it resides throughout an organization. It’s also used to determine data lineage for maintaining data consistency and meeting regulatory and compliance requirements like the European GDPR or California’s CCPA.


Okera

Top Executive: CEO Nick Halsey

Okera, a rising star in the DataOps arena, markets a universal data authorization system that audits and authorizes access to data, allowing businesses and organizations to take control of their data security, privacy and regulatory compliance efforts.

The Okera Dynamic Access Platform includes software for building and enforcing data access policies, metadata management, and centralized auditing and reporting tasks.

In March Okera, based in San Francisco, expanded its platform with the ability to delegate data access policy management—an important function for enabling distributed data stewardship.


Panoply

Top Executive: Co-Founder, CEO Yaniv Leven

Panoply’s cloud data system offers a fast, no-coding-required path to business analytics. The company’s platform performs data connection and integration, data storage management and data access tasks to deliver analytics-ready data where it’s needed.

In October 2020 Panoply, based in Tel Aviv, Israel, and San Francisco, raised $10 million, bringing its total financing to $24 million.


Reltio

Top Executive: CEO Chris Hylen

The Reltio Connected Data Platform is a cloud-native master data management system that the company markets as a key component for digital transformation and data compliance initiatives. The company also develops several tools that work with the platform including Connected Customer 360 and Enterprise 360.

In January Reltio introduced Reltio Identity 360, a free cloud service for developing a “single source of truth” for customer profile data.


StreamSets

Top Executive: Co-Founder, CEO Girish Pancha

The StreamSets DataOps Platform is used to execute data engineering and data integration jobs that are key for continuous data delivery for data analysis and other tasks. The company’s technology manages what StreamSets calls “data drift”—constantly changing data that can disrupt data flows.

StreamSets is based in San Francisco.


Striim

Top Executive: President, CEO Ali Kutay

Striim’s platform provides real-time data ETL (extract, transform and load) integration that enables continuous data ingestion, in-flight data processing and delivery. The technology continually ingests a variety of high-volume, high-velocity data from enterprise databases and uses change data capture to process data generated by log files, messaging systems, cloud applications, IoT devices and more in real time.

Earlier this month Striim, based in Palo Alto, Calif., raised $50 million in Series C funding.

Sumo Logic

Top Executive: President, CEO Ramin Sayar

Sumo Logic’s machine data analytics platform provides real-time visibility into cloud applications and infrastructure systems running on AWS, Microsoft Azure and Google Cloud platforms. The system is most frequently used to identify and analyze operational IT and cybersecurity issues.

In March Sumo Logic acquired Security Orchestration, Automation and Response (SOAR) technology developer DFLabs to bolster the company’s threat detection, analysis, incident response and forensic investigation capabilities.

Sumo Logic, based in Redwood City, Calif., went public in September 2020.


Talend

Top Executive: CEO Christal Bemont

The Talend Data Fabric system provides data integration, data integrity and governance, and application and API integration capabilities in cloud, multi-cloud and hybrid cloud environments. The company, based in Redwood City, Calif., also offers the Stitch data pipeline system for moving data into data warehouse systems for business analytics tasks.

In March Talend, which had been traded on the Nasdaq exchange, agreed to be acquired by private equity firm Thoma Bravo and taken private in a deal valued at $2.4 billion. The companies expect to complete the acquisition later this year.


Tamr

Top Executive: Co-Founder, CEO Andy Palmer

Tamr’s cloud-native master data management system connects internal and external data sources to provide complete, consolidated, analytics-ready data for a range of tasks. The software runs on the AWS, Microsoft Azure and Google Cloud platforms.

The company also offers data mastering software for specific applications including customer data mastering for B2B and B2C companies, clinical trial data management, data mastering for product rationalization and more.

Boston-based Tamr was co-founded in 2012 by database luminary and Tamr CTO Michael Stonebraker.


Trifacta

Top Executive: CEO Adam Wilson

Trifacta develops what the company calls a “data wrangling” platform, data preparation software used to explore, transform and enrich raw data into clean and structured data that can be used for business analytics, data visualization, machine learning and other tasks.

Earlier this month Trifacta, based in San Francisco, unveiled an expansion of its platform to create what it calls the “data engineering cloud,” adding full support for SQL and Python to create no-code capabilities for data engineers who apply software development and DevOps practices to accelerate data preparation and ETL (extract, transform and load).

The company also boosted the data integration capabilities of its platform with universal data connectivity, adding prebuilt data connectors that support more than 180 data sources.

Unravel Data

Top Executive: Co-Founder, CEO Kunal Agarwal

The Unravel DataOps system helps businesses manage data pipelines to provide data for on-premises and cloud-based data applications, including business analytics and machine learning systems.

The Unravel system provides tools for monitoring and managing data pipeline performance, correlating and analyzing resource and application data, and troubleshooting and optimizing data pipelines with AI-generated recommendations. The company develops editions of its software for AWS, Microsoft Azure, Google Cloud Platform, Cloudera and Hewlett Packard Enterprise Ezmeral.

In December Unravel, based in Palo Alto, Calif., joined the AWS Partner Network Global Startup Program.


Zaloni

Top Executive: CEO Susan Cook

The Zaloni Arena end-to-end DataOps platform for managing data pipelines provides metadata management, data catalog, data governance and self-service data access capabilities. The software is offered for a range of use cases including customer data management, cloud migration projects, and data compliance and risk management tasks.

In March Zaloni, based in Durham, N.C., unveiled Arena 6.2 with a new feature that accelerates data transformation by extending cataloging capabilities beyond just data to include data-related assets such as AI/ML models, code repositories, API endpoints and reports.