The Coolest Data Management And Integration Tool Companies Of The 2020 Big Data 100

Part 4 of CRN’s Big Data 100 looks at the vendors solution providers need to know in the data management and integration software space.

The Right Tool For The Right (Big Data) Job

To make productive use of the ever-growing volumes of data businesses and organizations find themselves wrestling with today, they need the right tools to identify and unify siloed data, manage and control data pipelines, prepare and transform data for analysis and machine learning tasks, and govern and secure data against misuse.

As part of the 2020 Big Data 100, we’ve put together a list of data management and data integration tool companies – from established vendors to those in startup mode – that solution providers should be aware of. They offer software and tools for data ETL (extract, transform and load), data catalogs, real-time data capture and management, metadata management, data virtualization, and data quality and data governance management, among others.

This week CRN is running the Big Data 100 list in slideshows with vendors of business analytics software, big data systems and platforms, database systems, data management and integration tools, and data science and machine learning tools. (Some vendors offer big data products that span multiple technology categories: They appear in the slideshow for the technology segment in which they are most prominent.)


Actian

Top Executive: CEO Rohit De Souza

Actian offers a portfolio of hybrid cloud data management, integration and analytic software and services, including the Actian Avalanche cloud-based data warehouse system.

Earlier this month the company, based in Palo Alto, Calif., debuted the Avalanche Real-Time Connected Data Warehouse Solution, which incorporates the company’s DataConnect hybrid data integration software into Actian Avalanche. That speeds integration with SaaS and on-premises applications and incorporates real-time streaming data from Kafka and Spark systems.


Actifio

Top Executive: CEO Ash Ashutosh

Actifio helps businesses become data-driven enterprises with its Virtual Data Pipeline platform that manages copied data scattered across multiple cloud and on-premises systems. The software makes data available for a wide range of applications and tasks, such as analytics, data governance and compliance, application development and testing, and backup and recovery – all while reducing unnecessary data duplication.

Actifio, based in Waltham, Mass., also offers Actifio Go, a SaaS implementation of its copy data management software.


Alation

Top Executive: CEO Satyen Sangani

Alation, based in Redwood City, Calif., develops a searchable data catalog system that helps businesses and organizations create an inventory of their data assets, making it easier for analysts, IT managers and information workers to find, understand, trust, use and reuse data. The Alation software works by crawling databases and other data repositories and indexing the data assets.


Alluxio

Top Executive: CEO Haoyuan Li

Alluxio develops a virtual distributed file system that began as the Tachyon research project at the University of California, Berkeley’s AMPLab. The technology provides hybrid cloud data orchestration – bringing data closer to compute systems – to improve data accessibility and speed up analytical and machine learning applications.

In March the San Mateo, Calif.-based company unveiled the Alluxio Structured Data Service, which includes data catalog and data transformation capabilities and provides just-in-time data transformation for compute-intensive applications.


Cazena

Top Executive: CEO Prat Moghe

Data lake systems are huge repositories of raw structured and unstructured data used for a broad range of applications. But they can be complex to build and manage.

Cazena, based in Waltham, Mass., offers a managed Data Lake-as-a-Service for building cloud-based data lakes on AWS or Microsoft Azure. The service includes cloud storage and infrastructure, data and analytics workload engines, workload SLA management and optimization, data ingestion and integration, security and encryption, governance and compliance, production operations, and support for analytics, data science and machine learning.


Circonus

Top Executive: CEO Bob Moul

Circonus provides a machine data intelligence platform for collecting, storing and analyzing huge volumes of data generated by sensors, systems and devices. The system is targeted for applications around IT infrastructure, IoT deployments and time-series databases, conducting real-time fault detection, anomaly detection and predictive analytics tasks.

In January Philadelphia-based Circonus raised $6.8 million in Series A1 funding and named technology entrepreneur Bob Moul as CEO.


Collibra

Top Executive: CEO Felix Van de Maele

Collibra has developed its namesake big data platform, with a data catalog system at its core, that provides businesses and organizations with full visibility of their data assets. The platform also includes data governance, data lineage, and privacy and risk management tools.

New York-based Collibra raised an impressive $112.5 million in funding earlier this month, bringing its total financing to $345.5 million and its valuation to $2.3 billion.


Confluent

Top Executive: CEO Jay Kreps

Confluent’s flagship product, the Confluent Platform, provides the capability to organize and manage the massive volumes of streaming data being generated by businesses today and make the data available to business applications and information workers. The company’s event stream processing software is based on Apache Kafka, an open-source stream processing system that was developed by Confluent’s founders when they worked at LinkedIn.

Mountain View, Calif.-based Confluent raised a stunning $250 million in Series E funding earlier this month, pushing the company’s market valuation to $4.5 billion.


Datical

Top Executive: CEO Derek Hutson

Datical develops database release automation software, speeding up the process of deploying database code changes. While DevOps and Agile practices have sped up application development and delivery, those applications often require changes in underlying database systems, and practices around database changes and code deployment have not kept pace.

Datical DB, the company’s commercial product, is based on the open-source Liquibase software, to which Datical contributes technology.


Denodo

Top Executive: CEO Angel Vina

Denodo’s data virtualization platform integrates data that’s in silos across disparate systems within an organization, regardless of its location or format, unifies the data for centralized security and governance tasks, and provides it to business users when needed.

Denodo, based in Palo Alto, Calif., offers a cloud service edition of its software running on AWS, Microsoft Azure and the Google Cloud Platform.


Diyotta

Top Executive: CEO Sanjay Vyas

Diyotta’s enterprise-scale, cloud-based data integration platform helps businesses and organizations integrate their data, whether data at rest, in motion, on-premises or in the cloud. The product links up with data warehouse and data lake systems to perform batch ETL and integration, bulk data migration and real-time data integration tasks.

Charlotte, N.C.-based Diyotta began offering its service through the Microsoft Azure Marketplace in February. The software was already available through the AWS and Google Cloud Platform marketplaces.


Dremio

Top Executive: CEO Billy Bosworth

Dremio’s data lake engine provides a self-service semantic layer that analysts and data scientists use to explore data and create virtual datasets out of the huge volumes of data often stored in data lake systems. It also provides a way to directly query data in data lakes running on Hadoop, AWS S3 and Azure Data Lake Store.

Dremio, headquartered in Santa Clara, Calif., raised $70 million in Series C funding in March.


Erwin

Top Executive: CEO Adam Famularo

Erwin’s flagship product, Erwin Data Modeler, is used to find, visualize, design, deploy and standardize an organization’s data assets. That makes it possible to discover and document data for large-scale data integration, data governance, master data management, business analytics and other big data initiatives.

Erwin, which has been an independent company since it was spun out of CA Technologies in 2016, also provides data catalog and data literacy software. The company is based in Melville, N.Y.


Fivetran

Top Executive: CEO George Fraser

Fivetran offers a fully managed Data Pipeline-as-a-Service with cloud-based data integration and connector software that creates pipelines to move data from cloud applications, databases and event logs into data warehouse systems such as AWS Redshift and Snowflake. The company is based in Oakland, Calif.


Fluree

Top Executives: Co-CEOs Brian Platz and Flip Filipowski

Fluree’s platform organizes blockchain-secured data in a highly scalable graph database. The company’s software is targeted toward “Web3” applications in supply chain management, MRO (maintenance, repair and operations), insurance, and credentials and identity.

Fluree just this month launched the Fluree Partner Network partner program for VARs, systems integrators and ISVs that partner with the Winston-Salem, N.C.-based startup.


Immuta

Top Executive: CEO Matthew Carroll

Immuta is focused on the legal and ethical use of data. The company’s Automated Data Governance software provides businesses and organizations with a way to automate their data governance, audit and compliance efforts, providing self-service data access with automated privacy controls.

In late 2019 Boston-based Immuta added new sensitive data detection and additional privacy-enhancement features to the Automated Data Governance platform.


Informatica

Top Executive: CEO Amit Walia

Informatica is one of the pioneers of data integration technology with its ETL tools that set the pace for the industry. The Redwood City, Calif.-based company has since expanded into new areas including enterprise cloud application and data integration, data engineering, data security, data quality and governance, data cataloging and master data management.

Last month Informatica updated its Intelligent Data Platform with new intelligence and automation functionality to boost the system’s cloud data warehousing and data lakes, master data management, and data governance and privacy capabilities.


Infoworks

Top Executive: CEO Buno Pati

Infoworks touts its DataFoundry enterprise data operations and orchestration system as a critical technology for enterprise digital transformation efforts. DataFoundry includes tools for data ingestion and preparation, data operations management and governance, data warehouse migration and data lake management, and data modeling and OLAP cube creation.

In February Infoworks, based in Palo Alto, Calif., made DataFoundry 3.0 generally available with native support for the Databricks Unified Analytics Platform, as well as new data onboarding, preparation and operations capabilities.

Magnitude Software

Top Executive: CEO Chris Ney

Magnitude Software’s product portfolio provides unified application data management capabilities including analytics and reporting, master data management, product information management and data connectivity. Many of the Austin-based company’s software products are focused on managing and analyzing data generated by SAP and Oracle applications.


Matillion

Top Executive: CEO Matthew Scullion

Matillion is one of the new generation of ETL software vendors, providing the data extract, transform and load capabilities needed for developing cloud-based data warehouse systems on AWS Redshift, Snowflake, Google BigQuery and Microsoft Azure Synapse platforms.

In September Matillion, headquartered in Manchester, U.K., launched its first channel program. In March the company announced the general availability of Matillion Data Loader, a code-free SaaS data loading system that data analysts use for quick and easy data integration tasks.


Naveego

Top Executive: CEO Katie Horvath

Naveego offers the Complete Data Accuracy Platform for managing data quality. The software helps data managers and analysts discover what data an organization has, collate and synchronize data across multiple sources, maintain data accuracy and create a single record of data that can be enforced across an organization.

Naveego is based in Traverse City, Mich.


Octopai

Top Executive: CEO Amnon Drori

Octopai develops an automated, centralized, cross-platform metadata search engine that business intelligence groups use to discover, govern and track shared metadata. The software is used to maintain company-wide data consistency and help business analysts find and understand available data.

Octopai is based in Rosh Ha’ayin, Israel.


Panoply

Top Executive: CEO Yaniv Leven

Panoply’s data management platform makes it possible to synchronize and store data from more than 100 sources for data analysis tasks. The system combines cloud data warehouse infrastructure, ETL capabilities, automated data integration and AI-driven automation.

Panoply is based in San Francisco.


Pepperdata

Top Executive: CEO Ash Munshi

Pepperdata’s software provides a way for IT managers to monitor and manage the big data analytics “stack,” the software and infrastructure that underlie an organization’s business analytics systems, to help maintain system performance. The company’s product portfolio includes Platform Spotlight for a 360-degree view of infrastructure and resource utilization, Capacity Optimizer for improving cluster performance, and Query Spotlight for tuning query workloads.

Earlier this month the Santa Clara, Calif.-based company launched Streaming Spotlight for monitoring the performance of mission-critical Kafka streaming applications.


Qubole

Top Executive: CEO Ashish Thusoo

Qubole’s Open Data Lake Platform is designed for machine learning, data streaming, real-time data analysis and ad-hoc analytics applications. Potential applications include customer 360-degree view analysis, customer sentiment analysis, customer micro-segmentation, clickstream analysis, multi-channel marketing and fraud detection.

In November Qubole, headquartered in Santa Clara, Calif., added new capabilities to the system to help users comply with data privacy and security regulations and implement financial governance and privacy controls.


Reltio

Top Executive: CEO Manish Sood

Reltio focuses on customer data management with its recently launched Reltio Connected Customer 360. The software combines customer data from multiple sources to build customer profiles and drive hyper-personalization at scale while complying with customer consent and privacy laws. The platform is used in industries where customer experience is critical, including retail, consumer packaged goods, insurance and healthcare.

Reltio is based in Redwood Shores, Calif.


StreamSets

Top Executive: CEO Girish Pancha

StreamSets develops its DataOps Platform for data ingestion, integration and ETL tasks. The San Francisco-based company brings DevOps practices and technology to data integration to avoid what it calls “data drift” – the constant and unexpected changes within data that disrupt dataflows.

The StreamSets DataOps Platform includes Control Hub, Data Collector and Transformer. StreamSets on Cloud builds data pipelines into any cloud system from any cloud system.


Striim

Top Executive: CEO Ali Kutay

Striim’s platform enables continuous, real-time data ingestion, integration, processing and delivery. The system utilizes low-impact change data capture to ingest high-volume, high-velocity data from databases, log files, messaging systems, Hadoop, cloud applications and IoT devices. The technology performs in-flight data processing, including transformations and aggregations, before delivering it to diverse on-premises and cloud environments.

Striim is based in Palo Alto, Calif.

Sumo Logic

Top Executive: CEO Ramin Sayar

Sumo Logic’s Continuous Intelligence Platform conducts machine data analysis for a range of operational, business intelligence and IT security applications. The technology automates the collection, ingestion and analysis of application, infrastructure, security and IoT data to provide analytical insights in seconds.

Sumo Logic, headquartered in Redwood City, Calif., has an active channel program and in November launched the App Intelligence Partner Program designed to help ISV partners integrate their applications, particularly in the areas of real-time operations and security intelligence, with the Sumo Logic Continuous Intelligence Platform.


Syncsort

Top Executive: CEO Josh Rogers

Syncsort offers a broad portfolio of big data software and services for high-speed data sorting and integration and database optimization.

In December Syncsort, based in Pearl River, N.Y., acquired Pitney Bowes’ software and data business. The acquisition adds location intelligence, data enrichment and customer information management capabilities to Syncsort’s data integration and optimization product lineup.


Talend

Top Executive: CEO Christal Bemont

Talend is a major developer of data integration, data integrity and data governance tools with its Talend Data Fabric suite of cloud software. The suite includes Stitch Data Loader, Talend Pipeline Designer and Talend Data Preparation; tools for data governance including Talend Data Quality, Talend Data Catalog, Talend Data Steward and Talend Data Inventory; and application and B2B integration tools.

Talend, headquartered in Redwood City, Calif., began offering the Talend Cloud edition of its software through the AWS marketplace in December and achieved both AWS Retail Competency status and Amazon Redshift Ready designation.


Tamr

Top Executive: CEO Andy Palmer

Tamr’s product lineup includes software for data mastering (helping large enterprises enhance data operations), data migration, data lake cleanup and analytics acceleration. The Cambridge, Mass.-based company targets its software toward applications in such areas as spending and procurement analytics and supply chain risk and planning.


Trifacta

Top Executive: CEO Adam Wilson

Trifacta offers “data wrangling” software for data preparation tasks. The technology sits between data storage systems and downstream processing systems and visualization and analysis tools, cleaning, structuring and enriching raw data into a needed format.

Trifacta, based in San Francisco, raised $100 million in Series E financing last September to support its rapid growth.

Unravel Data

Top Executive: CEO Kunal Agarwal

The Unravel DataOps Platform helps businesses manage their big data applications, including analytics, machine learning and IoT, by providing visibility into how data pipelines are performing, correlating application, resource and user data, and troubleshooting and optimizing data pipeline performance.

In addition to its flagship platform, Palo Alto, Calif.-based Unravel has tools for the leading public cloud systems: AWS, Microsoft Azure and Google Cloud Platform. In March the company debuted a toolset for the Cloudera system.


Zaloni

Top Executive: CEO Susan Cook

The Zaloni Arena data management system includes a data catalog and tools for metadata management, data quality control, data provisioning and data governance.

Former IBM and Oracle executive Susan Payne Cook was named CEO of the Research Triangle Park, N.C.-based Zaloni in November 2019, while co-founder and longtime CEO Ben Sharma transitioned to chief product officer.