The Coolest Data Warehouse And Data Lake Companies Of The 2024 Big Data 100

Part 5 of CRN’s Big Data 100 takes a look at the vendors solution providers should know in the data warehouse and data lake systems and services space.


In-House Analytics

Data warehouses have been at the center of data analytics systems as far back as the 1980s. Today cloud-based data warehouse services offered by the likes of AWS, Snowflake and Google Cloud have become popular alternatives to on-premises data warehouses.

Even more recently Databricks, Dremio, Starburst and other companies have championed data lakehouses as a more flexible, more cost-effective alternative to data warehouses.

As part of the CRN 2024 Big Data 100, CRN has compiled a list of data warehouse and data lake system and service vendors that solution providers should be familiar with. They include established vendors such as Databricks and Teradata, as well as more recent startups like Onehouse.

This week CRN is running the CRN 2024 Big Data 100 in a series of slide shows, organized by technology category, spotlighting vendors of business analytics software, database systems, data warehouse and data lake systems, data management and integration software, data observability tools, and big data systems and cloud platforms.

Here's a look at 10 companies with products in the data warehouse and data lakehouse space.

Some vendors have big data product portfolios that span multiple technology categories. They appear in the Big Data 100 slideshow for the technology segment in which they are most prominent.


Actian

Top Executive: CEO Marc Potter

The Actian Data Platform, previously known as Avalanche, is a unified system for connecting, managing and analyzing data, offering data integration, data warehouse and data analytics capabilities across hybrid computing environments. The platform was updated in November with new capabilities including hybrid integration-as-a-service.

Actian, a division of HCLSoftware based in Sunnyvale, Calif., also markets a number of relational (Ingres and Actian X), NoSQL (Zen, Actian NoSQL and HCL Informix) and analytical (Vector) database products.

The Actian product portfolio also includes the DataConnect data integration tool, DataFlow data streaming software, and OpenRoad and VoltMX application development and modernization tools.


Cloudera

Top Executive: CEO Charles Sansbury

The Cloudera Data Platform, Cloudera’s flagship system for public and private clouds, is a hybrid, multi-cloud platform with tools and capabilities for a broad range of big data tasks including operational database, data warehouse, data distribution, data analytics, data engineering, stream processing and machine learning operations.

In March Cloudera announced an expanded collaboration with Nvidia to create Cloudera Powered by Nvidia, which integrates Nvidia NIM Microservices, part of the Nvidia AI Enterprise platform, into Cloudera Machine Learning, the Cloudera Data Platform service for AI/ML workflows.

Cloudera, founded in 2008 and headquartered in Santa Clara, Calif., was one of the pioneers in the Hadoop big data platform space, along with then-rival Hortonworks, which merged with Cloudera in January 2019. Previously a public company, Cloudera was acquired in October 2021 by private equity firms Clayton, Dubilier & Rice and KKR and taken private.

There have been changes in Cloudera’s executive ranks over the last year. CEO Rob Bearden, who joined Cloudera with the Hortonworks deal, stepped down in June. In August the company hired Charles Sansbury, previously CEO of ASG Technologies, as Cloudera’s new CEO. That was followed in November with the hires of new chief financial, chief marketing and chief product officers.


Databricks

Top Executive: CEO Ali Ghodsi

Databricks, with its flagship Data Intelligence Platform, has been one of the fastest growing big data companies in the IT industry. While the company is best known for marketing its platform for assembling data lakes and data lakehouse systems, it has also become a major player in providing the data foundation for building and deploying machine learning and generative AI applications.

Furthering that AI momentum, Databricks in March launched DBRX, a general-purpose large language model that the company says “outperforms all established open-source models on standard benchmarks.” DBRX enables organizations to “cost-effectively build, train, and serve their own custom LLMs,” the company said.

Databricks made a number of acquisitions in the last year, including real-time data replication technology developer Arcion and data quality tool provider Lilac. But its biggest purchase was its $1.3 billion acquisition of generative AI platform startup MosaicML in July.

Databricks certainly has the resources for such acquisitions. In September 2023 Databricks raised a staggering $500 million in a Series I round of funding that boosted the company’s value to $43 billion.


Dremio

Top Executive: CEO Sendur Sellakumar

Dremio markets its Dremio Unified Lakehouse Platform for self-service analytics and AI applications. The platform incorporates the Apache Software Foundation’s Arrow framework for developing data analytics applications using columnar data, and the Apache Iceberg format for data analytics tables.

In September Dremio debuted its Reflections next-generation SQL query acceleration technology, which the company said paved the way for sub-second data analytics performance across an organization’s entire data ecosystem, regardless of where data resided.

Dremio, based in Santa Clara, Calif., hired Splunk Chief Cloud Officer Sendur Sellakumar in July 2023 to be the company’s new CEO.


Ocient

Top Executive: CEO Chris Gladwin

Ocient offers the Ocient Hyperscale Data Warehouse, a next-generation data warehouse system designed to perform real-time analysis of complex, hyperscale datasets that create compute-intensive workloads. The data warehouse is offered for on-premises and public cloud deployments and through the OcientCloud.

Geospatial data analysis, using huge volumes of multidimensional data, is a key use case for the Ocient system.

In March Chicago-based Ocient raised $49.4 million in an extension of the company’s Series B financing.


Onehouse

Top Executive: CEO Vinoth Chandar

Onehouse develops the Universal Data Lakehouse, a fully managed cloud data lakehouse service that can ingest data from all of a customer’s data sources in minutes and supports all query engines.

The service is built on the Apache Hudi open-source data management framework that brings database and data warehouse capabilities to data lakes.

Onehouse, headquartered in Menlo Park, Calif., was founded in 2021 and emerged from stealth in 2022. The company raised $25 million in Series A funding in February 2023.


Starburst

Top Executive: CEO Justin Borgman

Starburst has been getting a lot of attention with its full-featured data lakehouse analytics platform, based on the Trino SQL query engine, that provides the capabilities needed to discover, organize and analyze data without the need for time-consuming and costly data migrations.

The company’s system is offered in two editions: the self-managed Starburst Enterprise and the fully managed cloud Starburst Galaxy.

Earlier this month Starburst launched its Galaxy “Icehouse” managed data lake system that incorporates the Apache Iceberg data table format for analytic datasets. Starburst said the Trino and Iceberg combination provides users with high performance and scalability without the high costs of a custom system.

In March Starburst expanded its executive ranks, hiring former Delphix, PagerDuty and Demandware executive Steven Chung as the company’s president, as well as hiring a new chief product officer and a new senior vice president of engineering. CEO and co-founder Justin Borgman told CRN that the management expansion is part of the groundwork for a possible IPO.


SQream

Top Executive: CEO Ami Gal

SQream’s big data platform makes it possible to query extremely large, complicated data sets using a GPU-based engine.

The SQream product portfolio includes the SQreamDB SQL database that can perform complex analytics on petabyte-scale data volumes, the SQream Blue SQL data lakehouse (currently in beta) for data migration and transformation, and the Panoply all-in-one data platform. (SQream acquired Panoply in 2021.)

SQream is headquartered in New York.


Teradata

Top Executive: President and CEO Steve McMillan

Teradata could be considered the original big data company, founded in 1979 to develop a database specifically for large-scale data analytics. The company was originally a joint venture between California Institute of Technology researchers and Citibank’s advanced technology group, according to a company history.

Today the company offers VantageCloud, a complete multi-cloud analytics and data platform, and ClearScape Analytics, the in-database analytics component of the platform. The company also develops Teradata AI Unlimited, an on-demand AI/ML engine in the cloud.

In July 2023 Teradata acquired Stemma, which developed a cloud-native data catalog system that uses AI and ML to help users discover and use data and metadata more effectively.

San Diego-based Teradata reported revenue of $1.83 billion for all of 2023, up 2 percent from just under $1.80 billion in 2022.

Yellowbrick Data

Top Executive: CEO Neil Carson

The Yellowbrick Data Warehouse system is available for on-premises, public cloud and hybrid deployments. The system is designed to run complex queries at up to petabyte scale with guaranteed sub-second response times and at lower costs.

At the heart of the system is the company’s massively parallel SQL relational database system that’s built on Kubernetes for elasticity and portability. Patented Direct Data Accelerator technology removes bottlenecks as data moves from storage to the CPU and across the network. The data warehouse runs in the cloud on Amazon Web Services and Microsoft Azure, or on premises on the company’s own Andromeda hardware (based on AMD EPYC CPUs).

Yellowbrick, headquartered in Mountain View, Calif., has recently expanded beyond analytics into supporting data-driven applications.