The Coolest Data Warehouse And Data Lake Companies Of The 2023 Big Data 100
Part 5 of CRN’s 2023 Big Data 100 takes a look at the vendors solution providers should know in the data warehouse and data lake systems and services space.
You Can’t Beat The House
Data warehouses have been at the center of data analytics systems within many businesses and organizations for years, going as far back as the 1980s. With the growing adoption of cloud, data warehouse services offered by Amazon Web Services, Snowflake and Google Cloud have become popular alternatives to on-premises data warehouses.
More recently, Databricks and other proponents of the data lakehouse concept have championed it as a more flexible and cost-effective alternative to data warehouses.
The debate goes on. As part of the CRN 2023 Big Data 100, CRN has compiled a list of data warehouse and data lake system and service vendors that solution providers should be familiar with. They include established vendors such as Cloudera, Databricks and Dremio as well as more recent startups like Firebolt and Onehouse.
This week CRN is running the Big Data 100 list in a series of slideshows, organized by technology category, spotlighting vendors of business analytics software, database systems, data warehouse and data lake systems, data management and integration software, data observability tools, and big data systems and cloud platforms.
Some vendors market big data products that span multiple technology categories. They appear in the slideshow for the technology segment in which they are most prominent.
Actian
CEO: Marc Potter
Actian develops a portfolio of data management and analysis products led by its Avalanche Cloud Data Platform, which provides data integration, management and analytics in one system.
Other products include the DataConnect low-code integration platform, the Vector Analytics business intelligence and data analysis tool, the Zen Edge embeddable edge data management software, and the Ingres relational transactional database.
Actian, based in Sunnyvale, Calif., is owned by HCL Technologies.
Cloudera
President, CEO: Robert Bearden
The Cloudera Data Platform is the company’s flagship system, a hybrid, multi-cloud platform for data warehouse, data hub, data flow, operational database, stream processing, data engineering, machine learning and data analysis operations.
In August Cloudera launched Cloudera Data Platform One, an all-in-one data lakehouse Software as a Service that provides self-service analytics and exploratory data science capabilities, along with built-in enterprise security and machine learning.
In November Cloudera launched a revamped partner program in a move to recognize and reward the broader range of channel partners the Santa Clara, Calif.-based company is working with.
Databricks
CEO: Ali Ghodsi
Databricks is one of the fastest-growing companies in the IT industry with its Databricks Lakehouse Platform for data unification, data analytics, data warehouse, data engineering, AI and machine learning operations.
Over the last year Databricks, based in San Francisco, has developed Lakehouse packages targeted toward vertical industries such as financial services, health care and life sciences, and, just this month, manufacturing. The packages include pre-assembled software and applications developed by partners through the company’s Brickbuilder program.
Datometry
CEO: Mike Waas
Datometry develops Hyper-Q database virtualization technology that makes existing applications interoperable with modern cloud data warehouse systems such as Microsoft Azure Synapse, Google BigQuery and Amazon Redshift. The San Francisco-based company recently unveiled support for Datometry running on the Azure platform.
Other Datometry offerings include qInsight for analyzing data warehouse workloads and qShift for converting legacy data warehouse schemas when developing foundations for cloud migrations.
Dremio
Interim CEO: Edward Sharp
Dremio promotes its namesake data lakehouse platform, including Dremio Cloud, as providing self-service analytics with “data warehouse functionality and data lake flexibility.” The company’s data lake query engine is built using the Apache Arrow open-source development platform for in-memory analytics.
In March the company updated its data lakehouse platform with new functionality built around the Apache Iceberg format for analytic data tables. That followed November updates to the platform for writing and updating data, enhanced support for semi-structured data, and expanded business intelligence and data ecosystem integrations.
Dremio, based in Santa Clara, Calif., raised $160 million in Series E funding in January 2022.
Firebolt
CEO: Eldad Farkash
Cloud data warehouse startup Firebolt focuses its services on developers and data engineers who need extreme data warehouse speed and elasticity as they build data-intensive applications.
The company is boldly challenging giants like Amazon Web Services, Google Cloud and Snowflake that provide cloud data warehouse systems for a broad range of tasks.
Firebolt, based in Tel Aviv, Israel, exited stealth in late 2020 and raised $100 million in Series C funding in January 2022.
Ocient
CEO: Chris Gladwin
Ocient develops the Ocient Hyperscale Data Warehouse, a system that can transform, store and analyze datasets that are terabytes, petabytes or even exabytes in size.
The Ocient system is available for cloud or on-premises deployment, or as a hosted service in the company’s OcientCloud. Ocient began offering the system through the AWS Marketplace in October 2022.
In February Chicago-based Ocient said it had recorded 171 percent growth during the previous year.
Onehouse
CEO: Vinoth Chandar
Touting itself as “the new bedrock for your data,” startup Onehouse is developing a foundation for an open-source, cloud-native, fully managed data lakehouse service.
The company’s service is based on Apache Hudi, an open-source transactional data lake project that brings database and data warehouse capabilities to a data lake. The goal is to serve as a data integration layer between different data repositories, according to the company.
In February Onehouse, based in Menlo Park, Calif., said that it had raised $25 million in Series A funding.
Teradata
President, CEO: Steve McMillan
Teradata was founded in 1979 as a collaboration between the California Institute of Technology and the Citibank advanced technology group, developing a database management system for parallel processing systems. The company would go on to be a pioneer in what would become the data warehouse industry.
Today the company offers the Teradata Vantage and Teradata VantageCloud multi-cloud data platforms for high-performance enterprise analytics. The company also offers the Teradata VantageCore IntelliFlex massively parallel processing hardware platform and the Teradata IntelliSphere software portfolio of data management tools.
Yellowbrick
CEO: Neil Carson
Yellowbrick offers its Yellowbrick Data Warehouse, an elastic, cloud-native data warehouse that runs either on-premises or in the cloud—including in a customer’s virtual private cloud.
Yellowbrick aggressively competes against Oracle, IBM Netezza, Snowflake and Teradata, offering what it says is a more cost-effective alternative.
In March Yellowbrick, based in Mountain View, Calif., said that the U.S. Naval Supply Systems Command had chosen the Yellowbrick Data Warehouse as part of its data modernization strategy—including replacing legacy IBM Netezza data warehouse systems.