10 Hot Big Data Companies To Watch In 2020

Data is not only growing in volume, it’s increasingly scattered across on-premises and cloud-based systems, complicating data management and governance tasks. Here are 10 companies with next-generation data management, data science and machine learning technology that solution providers should keep an eye on in 2020.

The Ones To Watch In 2020

Anyone working with big data today faces a number of tasks. Just finding and managing data that’s dispersed across hybrid-cloud and multi-cloud systems is a challenge. There’s also the need to prepare data and develop the models needed for analyzing data and deriving value from all that information.

The global big data and analytics software market reached $60.7 billion in 2018 and is expected to grow at a compound annual growth rate (CAGR) of 12.5 percent over the next five years, according to market researcher IDC.
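To put that growth rate in perspective, here is a minimal sketch of the compounding arithmetic. It assumes growth compounds annually from the 2018 base for exactly five years, which is our reading of the forecast rather than a figure IDC published; the function name is purely illustrative. Under that assumption the market would reach roughly $109 billion.

```python
def project_market_size(base_billion: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base_billion * (1 + cagr) ** years


# Base value and growth rate come from the IDC estimate cited above.
# Assumption (ours): annual compounding from the 2018 base for five years.
projected = project_market_size(60.7, 0.125, 5)
print(f"Projected market size: ${projected:.1f} billion")
```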

Here’s a look at 10 companies in the big data space that solution providers should keep an eye on. Some are startups while others are more established. Some have developed ground-breaking technologies for managing and processing huge volumes of data within increasingly complex IT environments. Others are focused on automating time-consuming data science and machine learning tasks, bringing businesses and organizations closer to the goal of self-service analytics for everyday business users.


Alluxio

CEO: Steven Mih

Running business analysis or other big data workloads becomes more complex within hybrid-cloud and multi-cloud environments. Alluxio has developed a data orchestration platform for managing data for analytics and machine learning tasks when compute and data storage functions are in different locations.

Alluxio’s virtual distributed storage system is based on a memory-centric, fault-tolerant architecture that enables the separation of storage and compute functions, bringing data closer to distributed compute operations and simplifying data access for cloud workloads.

Based in San Mateo, Calif., Alluxio was founded in 2015 at U.C. Berkeley’s AMPLab by the creators of the Tachyon open source project, upon which the Alluxio software is based.


Collibra

CEO: Felix Van de Maele

Tracking an organization’s information assets to maximize data value while managing data governance is a major challenge in the big data era. One company to keep an eye on in 2020 is Collibra, a developer of data governance and catalog software built to address the data stewardship, governance and management needs of data-driven businesses.

Collibra, a perennial leader in the Gartner Magic Quadrant for metadata management solutions, raised $100 million in Series E funding in January 2019, giving the New York-based company a $1 billion post-money valuation.


DataKitchen

CEO: Christopher Bergh

DataKitchen is a pioneer in the realm of DataOps, the concept of managing data analytics processes like an assembly line instead of the cumbersome, ad hoc processes found within many businesses.

The Cambridge, Mass., company’s platform manages the data pipeline through data engineering, data science and business analytics processes. DataOps combines concepts from Agile development, DevOps and statistical process control, among others, in a collaborative workflow.

Business processes are key to digital transformation initiatives and data flow is key to managing and changing business processes. Will the DataKitchen approach to improving data flows catch on in 2020?


DataRobot

CEO: Jeremy Achin

DataRobot is one of several developers of automated machine learning software on this list. The company markets a platform that enterprises use to rapidly build and deploy machine-learning models and create advanced AI applications.

In September, the Boston-based company debuted DataRobot MLOps, a machine learning operations system for deploying, monitoring and managing machine learning models in production. In recent months DataRobot has struck alliances with several business analytics software vendors, including MicroStrategy and Tableau Software, to link their products with DataRobot and apply AI technology to data analysis.

On Dec. 12, DataRobot struck a deal to acquire Paxata, a provider of self-service data preparation software. DataRobot said the Paxata technology can be used to automate data preparation processes such as creating datasets for training AI-driven predictive models.

DataRobot raised $206 million in Series E funding in September.


DotData

CEO: Ryohei Fujimaki

Startup DotData develops an end-to-end data science automation and operationalization platform that the company says “accelerates, democratizes and operationalizes” the entire data science process, shortening it from months to days. The company’s tools are targeted toward use cases in customer analytics and marketing, risk and governance, supply and demand management, asset management and business automation.

Based in San Mateo, Calif., DotData was spun off from NEC in 2018. In October it was selected as a qualified partner in Microsoft for Startups, Microsoft’s program to boost promising startups that develop innovative technology running on the Microsoft Azure cloud.


H2O.ai

CEO: Sri Ambati

Another company in the red-hot machine learning/AI space is H2O.ai, developer of the open-source H2O data science and machine learning platform and of H2O Driverless AI, its commercial automated machine learning platform, which is designed to let data scientists accomplish key machine learning tasks in minutes or hours instead of months.

In the new year, look for the company to launch its H2O Q AI platform, designed to help business users rapidly prototype AI-based applications.


Hammerspace

CEO: David Flynn

Hammerspace is a player in the growing Data-as-a-Service space with technology that provides access to data across a hybrid, multi-cloud IT system. The company's software-defined Hybrid Cloud Data Control Plane, which relies on metadata-driven machine learning, virtualizes and abstracts data from multiple storage systems – both on-premises and cloud-based – making it available to any application, service, container or developer.

Founded in 2018, Hammerspace is headquartered in Los Altos, Calif.


Komprise

CEO: Kumar Goswami

As businesses and organizations adopt hybrid cloud/on-premises systems, managing data across those hybrid environments becomes a challenge. Komprise has developed intelligent data management software for managing unstructured data across network-attached storage and cloud environments.

In September, Komprise, based in Campbell, Calif., added a feature called Deep Analytics to its software. The feature addresses one of the biggest problems in data analytics: searching across multiple data storage systems to identify the right data sets to analyze.


Matillion

CEO: Matthew Scullion

One of the biggest challenges to running – and deriving value from – data analytics systems and data warehouses is finding quick and effective ways to prepare data and make it available for analysis.

ETL (extract, transform and load) tools have been around for years. But Matillion, which develops cloud-native ETL software, has been gaining attention among businesses and organizations looking for effective ways to load data into cloud data warehouse systems such as AWS Redshift, Google BigQuery and Snowflake.

The Matillion ETL software provides advanced data location, curation, extraction, transformation and loading capabilities combined with data management, monitoring, security and other functionality.

In early December, Matillion expanded its product lineup with Matillion Data Loader, free software designed to provide simple data replication from popular data sources into cloud-based data warehouses.

Starburst Data

CEO: Justin Borgman

Presto, a distributed, high-performance SQL query engine for big data, has been getting a fair amount of attention lately. Originally developed at Facebook, the open-source software is used to run interactive queries against data sources of all sizes ranging up to petabytes without having to move the data.

Boston-based Starburst Data provides a commercial, enterprise-ready version of Presto combined with software tools, services, support and training. The company also offers Presto on AWS, Azure, the Google Cloud Platform and Kubernetes.

Whether Presto is widely adopted in 2020 and beyond remains a question mark. But Starburst Data is well positioned if it is.