The 10 Coolest Big Data Tools Of 2025 (So Far)

Here’s a look at 10 new, expanded and improved big data platforms and tools that solution and service providers should be aware of.

Managing, moving, transforming and governing data for business applications and data analytics purposes has always been an important part of IT operations. But those chores have taken on a new level of priority as the surge of AI technology development pushes demands for data—lots of data—to new heights.

Complicating those chores is the fact that data is increasingly distributed across broad IT estates—both in the cloud and on premises. And the sheer volume of data businesses and organizations are working with continues to explode: More than 400 million terabytes of digital data are generated every day, according to market researcher Statista, including data created, captured, copied and consumed worldwide.

Wrangling all that data, including collecting and managing it and preparing it for analytical and AI tasks, is a challenge. That’s driving demand for new big data tools and technologies – from both established IT vendors and startups – to help businesses access, collect, manage, move, transform, analyze, understand, measure, govern, maintain and secure all this data.

What follows is a look at 10 cool big data tools designed to help customers more effectively carry out all these big data chores. They include next-generation databases, data management tools and data analytics software. Some are entirely new products recently introduced by startups or established vendors, while others are products that have undergone significant upgrades or offer ground-breaking new capabilities.

Alteryx One

In May, Alteryx debuted Alteryx One, a new unified platform that combines the company’s AI-powered analytics and data preparation capabilities with a centralized management portal and unified licensing, which the company said gives customers greater flexibility to automate and scale analytics across their data ecosystems.

Alteryx touted the new Alteryx One platform as the centerpiece of the company’s new vision of positioning its technology as an “AI Data Clearinghouse” that provides transformed, governed data for AI applications and agents.

“If you really take a step back and look at Alteryx, we’re a data workflow platform,” CEO Andy MacMillan said in an interview with CRN. “I think the way in which companies operate their infrastructure is going to change dramatically because of AI in the next … three to five years.”

Alteryx One is a centrally managed platform, with new tiered packaging and a unified licensing portal, that unifies multiple capabilities including analytics automation, low-code/no-code data preparation and blending, AI assistance, cloud flexibility and enterprise governance.

The platform includes the AI Control Center, a centralized portal for managing the entire Alteryx portfolio regardless of deployment model and ensuring consistent access and usage policies. AI Control Center provides unified orchestration functionality, combining license management with built-in security, governance and visibility into AI interactions, including with large language models.

Alteryx One also offers real-time data access through new Live Query tools for Databricks and Snowflake, and it introduces shared connectors for establishing reusable connections to cloud data sources. It also provides new and updated connectors for the Microsoft Azure Synapse, Qlik and Starburst platforms.

Astronomer Astro Observe

Astronomer markets Astro, a data orchestration and observability platform based on Apache Airflow, an open-source workflow management system for building data pipelines and scheduling and monitoring data workflows.

In February, Astronomer announced the general availability of Astro Observe, a comprehensive “single pane of glass” tool for managing Apache Airflow pipelines. (Astro Observe was originally announced in September 2024.)

Data engineers build connections or “pipelines” to move data between systems, such as from operational applications to data warehouses for analytical tasks. More recently, the need for data pipelines and workflow management tools has increased with the surge in development of AI models, applications and agents—all of which require huge volumes of data for training and production.
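To make the pipeline concept concrete, here is a minimal sketch of an Apache Airflow DAG of the kind Astro Observe is designed to monitor; the DAG name, tasks and data are illustrative placeholders, not Astronomer examples.

```python
# A minimal Apache Airflow pipeline sketch: extract, transform and load steps
# chained into a daily workflow. All names and data here are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def orders_to_warehouse():
    @task
    def extract():
        # Pull new rows from an operational system (stubbed here).
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def transform(rows):
        # Apply a simple business rule before loading.
        return [r for r in rows if r["amount"] > 0]

    @task
    def load(rows):
        # Write to the analytics warehouse (stubbed here).
        print(f"loading {len(rows)} rows")

    load(transform(extract()))


orders_to_warehouse()
```

Observability tools such as Astro Observe sit on top of pipelines like this one, tracking whether each scheduled run finishes on time and whether downstream data products stay healthy.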

Astro Observe provides visibility into the health and performance of business-critical data products, according to Astronomer. It offers an actionable view of an entire data supply chain and enables proactive optimization and rapid problem resolution.

Key features of Astro Observe include an SLA (service level agreement) dashboard, timeline views of SLA compliance and task execution, a data health dashboard, a dependency graph for visibility into upstream and downstream data relationships, a best practices insights engine, predictive alerting, AI log summaries, and a pre-built Snowflake cost management feature.

“Organizations need a more consolidated and streamlined approach to leverage their critical data assets in the age of advanced analytics and AI,” Astronomer CEO Andy Byron said in a statement. “With the addition of Astro Observe, Astro is now a unified DataOps platform that cuts through today’s chaos above the compute layer, replacing tool fragmentation with end-to-end visibility, control, and automation.”

Cube D3

In early June, Cube, a developer of semantic layer technology for modern data stacks, launched D3, an agentic analytics platform that automates and enhances data analytics tasks for both data stewards and data consumers.

Semantic layer software, including Cube’s foundational technology, provides a unified view of data and translates complex data structures into terms and concepts that allow business users to more easily access and analyze data.
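As an illustration of what querying in business terms looks like in practice, the sketch below requests a measure and a dimension from a semantic layer instead of writing raw SQL. The endpoint path and request shape follow Cube’s publicly documented REST API, but the host, token and model member names are assumptions.

```python
# Illustrative semantic-layer query: the caller asks for business-level
# measures and dimensions; the semantic layer generates the SQL and joins.
import requests

query = {
    "measures": ["orders.total_revenue"],      # defined once in the semantic model
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {"dimension": "orders.created_at", "granularity": "month"}
    ],
}

resp = requests.post(
    "https://example-cube-host/cubejs-api/v1/load",   # placeholder host
    json={"query": query},
    headers={"Authorization": "<api-token>"},          # placeholder token
    timeout=30,
)
print(resp.json()["data"])  # rows already aggregated by the semantic layer
```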

Cube says D3 is the first agentic analytics platform developed on a universal semantic layer and that the “agent-native and semantically grounded” software redefines the analytics experience by combining the productivity of agents with the precision of semantics.

D3 offers a suite of intelligent agents targeted toward data analysts and data engineers. AI Data Analyst provides self-service, natural language-driven analytics, generating semantic SQL queries, visualizations and interactive data applications, according to Cube, while AI Data Engineer automates semantic model development from cloud data sources, continuously optimizing definitions and removing data pipeline bottlenecks.

Databricks Lakebase

At its Data + AI Summit earlier this month, Databricks launched Databricks Lakebase, a fully managed Postgres database for building data-intensive applications and AI agents.

Lakebase adds an operational database layer to the Databricks Data Intelligence Platform, meeting what the company said is the need of today’s data applications, AI agents, recommendation engines and automated workflows for fast, reliable data.

In launching Lakebase, CEO and co-founder Ali Ghodsi said Databricks is taking aim at “traditional transaction” databases such as Oracle Database and Microsoft SQL Server. In his keynote, Ghodsi said Lakebase and other Databricks products unveiled at Data + AI Summit provide a way to “bring data and AI to as many people on the planet as possible. The reality on the ground is that it’s still really hard to succeed with data and AI.”

Lakebase is based on the open-source Postgres database technology Databricks acquired through its recent $1-billion acquisition of database startup Neon. It incorporates a data lakehouse architecture that Databricks says is more flexible than legacy databases and separates compute and storage for independent scaling.

Lakebase’s cloud-native architecture reduces latency and supports high concurrency and high availability needs, according to the company. It automatically syncs data to and from lakehouse tables and is integrated with Databricks Apps development capabilities and Databricks Unity Catalog.
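Because Lakebase is built on Postgres, standard Postgres clients should be able to talk to it. The sketch below uses the open-source psycopg2 driver with placeholder credentials and a hypothetical table; it is not a Databricks-provided example.

```python
# A typical operational (OLTP-style) write of the kind an app or AI agent
# might make against a Postgres-compatible database. All names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="<lakebase-host>",
    dbname="app",
    user="<user>",
    password="<password>",
    sslmode="require",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO recommendations (user_id, item_id, score) VALUES (%s, %s, %s)",
        (42, 1001, 0.87),
    )
conn.close()
```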

At the Data + AI Summit Databricks also launched Agent Bricks, a unified space for building high-quality AI agents, and Databricks One, an edition of its platform that gives nontechnical business users easier access to a number of Databricks data and AI capabilities.

dbt Labs Fusion

The popular dbt Labs data development and transformation platform got a major upgrade in May with the addition of the dbt Fusion engine that the company said dramatically boosts the system’s performance and scalability – and enhances the data developer experience – for building data pipelines and processing data at scale for AI and analytical applications.

With the new dbt Fusion engine, the company says its platform offers improved developer productivity, higher data velocity and cost savings through more efficient orchestration of data pipelines.

“AI is completely changing the way we interact with data. This is, seriously, the biggest launch in the history of the company and I am super, super excited to see the community response,” dbt Labs founder and CEO Tristan Handy said in an interview with CRN.

Fusion is written in the Rust programming language, which is seen as superior for building fast and reliable command line interface (CLI) tools – due, in part, to its ability to run multiple computations in parallel.

The Fusion engine powers the entire dbt platform, including the CLI, dbt Orchestrator, Catalog, Studio and other dbt Labs commercial products, parsing projects up to 30 times faster than the original dbt Core, according to the company. That enables faster analytics delivery, lower cloud costs and more trustworthy data pipelines for AI at scale.

Fusion is equipped with native SQL comprehension and other new capabilities that dbt Labs said collectively provide a “best-in-class developer experience.” Those capabilities stem from dbt Labs’ acquisition of SDF Labs, a startup developer of SQL code analyzer tools, in January.

Diliko

Startup Diliko emerged from stealth in November with its agentic AI-powered platform, which provides automated data management and governance capabilities that the startup says reduce operational complexity and costs.

The cloud-based Diliko platform optimizes data management performance and eliminates the need to deploy and manage costly infrastructure, according to the company. The service automates complex data management workflows using on-demand data integration, ETL (extract, transform, load) and data orchestration, and it can synchronize data in real time across internal and external systems.
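For readers less familiar with the ETL steps Diliko says it automates, here is a minimal hand-written extract-transform-load pass in Python; the source file, column names and warehouse connection are hypothetical and unrelated to Diliko’s actual service.

```python
# A minimal manual ETL pass: read from an operational export, clean the data,
# and append it to an analytics database. All names here are placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read records from an operational source (a CSV export stands in here).
raw = pd.read_csv("patients_export.csv")

# Transform: normalize column names and drop incomplete rows.
clean = (
    raw.rename(columns=str.lower)
       .dropna(subset=["patient_id", "visit_date"])
)

# Load: append the cleaned result into an analytics database.
engine = create_engine("postgresql://user:password@warehouse-host/analytics")
clean.to_sql("patient_visits", engine, if_exists="append", index=False)
```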

The Diliko platform also ensures data governance and security with cloud-native capabilities including zero trust architecture, end-to-end encryption and multi-factor authentication.

Diliko is targeting its offering toward mid-size enterprises in the data-heavy healthcare, finance and logistics industries. The company says its service provides benefits for C-level executives, including CIOs, CFOs and chief data officers, and for those who work with data including data engineers, data scientists and data analysts.

Earlier this month, Diliko launched its inaugural partner program seeking to recruit big data service providers and consultants.

Qlik Open Lakehouse

In May, data integration, management and analytics software developer Qlik unveiled Qlik Open Lakehouse, a fully managed data lakehouse system built into the Qlik Talend Cloud that delivers real-time data ingestion at enterprise scale—millions of records per second—from hundreds of sources.

Qlik says Open Lakehouse provides 2.5x to 5x faster query performance and up to 50 percent lower data storage infrastructure costs.

Qlik debuted the new product at its Qlik Connect conference in May, along with new advanced agentic AI capabilities for the Qlik Cloud platform and new embedded AI functionality in its Qlik Cloud Analytics service.

The new software and services are intended to help bridge what Qlik CEO Mike Capone called “the AI activation gap” by helping businesses and organizations more effectively collect and prepare data for AI and data analytics workloads and build workflows that act on AI results and analytical insights.

Qlik Open Lakehouse is based on Apache Iceberg – an open-source, high-performance data table format designed for large-scale datasets within data lakes. That provides the new data lakehouse with fully automated optimization capabilities including data compaction, clustering and pruning. The lakehouse also supports a number of Iceberg-compatible data processing engines including Snowflake, Amazon Athena, Amazon SageMaker, Apache Spark and Trino.
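To illustrate why the Iceberg table format matters for interoperability, the sketch below reads an Iceberg table with the open-source PyIceberg client; the catalog endpoint, warehouse path and table name are placeholders rather than Qlik-specific settings, and any Iceberg-compatible engine could query the same table.

```python
# Reading an Apache Iceberg table with PyIceberg. The catalog type, endpoint
# and table identifier below are assumed placeholders for illustration only.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "analytics",
    **{
        "type": "rest",                                  # assumes a REST catalog
        "uri": "https://iceberg-catalog.example.com",    # placeholder endpoint
        "warehouse": "s3://example-bucket/warehouse",
    },
)

# Load the table's metadata, then scan it into a pandas DataFrame.
table = catalog.load_table("sales.orders")
df = table.scan().to_pandas()
print(df.head())
```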

SAP Business Data Cloud

In February, SAP launched SAP Business Data Cloud, a new generation of the software giant’s data management platform for unifying data—both from SAP applications and third-party systems—for analytical and AI tasks.

In addition to incorporating and building on current SAP products, including Business Warehouse, Datasphere and Analytics Cloud, the new Business Data Cloud offers natively embedded data engineering, AI and machine learning technology through an OEM deal with Databricks.

Business Data Cloud includes new capabilities and pre-built content, such as packaged data products derived from SAP applications and what the company calls “insight applications,” which combine that data with AI models connected to real-time data for advanced analytics and planning across lines of business such as finance and human resource management.

The new data platform also boosts the operation of Joule, SAP’s AI copilot that launched in September 2023, for cross-functional workflows and business decision-making.

Through the strategic relationship with Databricks, every Business Data Cloud user can easily move SAP data into the Databricks Data Intelligence Platform and gain immediate access to its capabilities, such as AI-assisted data science and SQL data warehousing. SAP Business Data Cloud works by providing access to the serverless Databricks platform using integrated permissions, and it relies on the Delta Sharing open protocol developed by Databricks, which allows organizations to securely share data across different cloud platforms without replicating or copying the data.
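Because Delta Sharing is an open protocol with an open-source Python client, a recipient can load a shared table without copying the underlying data store. In this sketch the profile file and the share, schema and table names are placeholders, not actual SAP or Databricks artifacts.

```python
# Loading a table exposed over the Delta Sharing protocol with the open-source
# delta-sharing client. Profile and table coordinates are placeholders.
import delta_sharing

# The profile file contains the sharing server endpoint and a bearer token
# issued by the data provider.
profile = "config.share"

# "share.schema.table" coordinates identify the shared table to read.
table_url = f"{profile}#example_share.finance.gl_line_items"

# Load the shared table into a pandas DataFrame without replicating the source data.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```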

Snowflake Intelligence

At its Snowflake Summit 2025 conference earlier this month, AI data cloud company Snowflake debuted Snowflake Intelligence, a conversational data analytics tool powered by intelligent data agents that allows users to ask natural language questions and uncover actionable insights from both structured tables and unstructured documents.

Snowflake Intelligence runs inside existing Snowflake environments, inheriting all security controls, data masking and governance policies, according to the company. It is designed to unify data from Snowflake, Box, Google Drive, Workday, Zendesk and other sources.

Snowflake also previewed Data Science Agent, an agentic companion that the company says boosts data scientists’ productivity by automating routine machine learning model development tasks.

Starburst AI Agents and Starburst AI Workflows

In May, data platform developer Starburst launched Starburst AI Agent and AI Workflows, additions to the company’s Starburst Enterprise Platform and Starburst Galaxy systems that the company said will help organizations accelerate enterprise AI initiatives and support their transition to a data lakehouse-based architecture.

AI Agent is a built-in conversational interface within the Starburst environment for governed, natural language-driven data product documentation and insight generation.

Starburst also debuted Starburst Data Catalog, an enterprise-grade metastore with native Iceberg support that’s purpose-built to replace the Hive Metastore that’s currently used within Starburst Enterprise.

“Our agent has the ability to discover data, enrich the data, put it into the data product, and then allow you to get insights from the data,” said Toni Adams, Starburst senior vice president of partner and alliances sales, in an interview with CRN. “Data management, and how that [data] is being made available to AI agents, is hugely important to enterprises these days … If I’m not compliant, or if I cannot secure and govern my data, I’m not going to deploy it [for] AI.”

AI Workflows is a suite of capabilities that speeds the move from AI experimentation to production by unlocking governed, proprietary data, according to the company. The tools search unstructured data, orchestrate prompts and tasks with SQL, and govern model access. Capabilities within AI Workflows include AI Search, AI SQL Functions and AI Model Access Management.
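Since Starburst is built on Trino, the open-source trino Python client can issue the kind of SQL these features run. In the sketch below the cluster coordinates, the table, and the ai_summarize() call are hypothetical stand-ins; the article does not name Starburst’s actual AI SQL Functions.

```python
# Running SQL against a Starburst (Trino-based) cluster with the trino client.
# Host, credentials, table and the ai_summarize() function are placeholders.
from trino.dbapi import connect
from trino.auth import BasicAuthentication

conn = connect(
    host="starburst.example.com",
    port=443,
    user="analyst",
    http_scheme="https",
    auth=BasicAuthentication("analyst", "<password>"),
    catalog="lakehouse",
    schema="support",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT ticket_id,
           ai_summarize(ticket_body) AS summary  -- hypothetical AI SQL function
    FROM tickets
    LIMIT 10
    """
)
for ticket_id, summary in cur.fetchall():
    print(ticket_id, summary)
```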