The AI Data Problem: Bridging The ‘AI Activation Gap’

‘There’s a lot of hype around AI. But let’s face it, most companies haven’t really realized much of a return on their AI investments to date,’ said Qlik CEO Mike Capone.

One thing’s for sure about AI: It needs data—lots and lots of data.

As AI initiatives move from development and proof-of-concept stages to model training and operational production, many AI projects have stalled because organizations lack the data infrastructure needed to provide those AI systems with the proprietary data they require to carry out their tasks.

“There's this what we call ‘the AI activation gap.’ There's a lot of hype around AI. But let's face it, most companies haven't really realized much of a return on their AI investments to date. There's actually a reset going on right now,” Qlik CEO Mike Capone said in an interview with CRN in May.

Organizations are often missing key components of the “big data stack.” Critical technologies include ETL (extract, transform and load) tools for collecting data from operational and storage systems and preparing and formatting data for AI tasks. Tools for building and managing pipelines that keep data flowing to AI applications are key, as are data governance and data quality management tools. Other, more specialized products like vector databases can also be important.
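To make the stack concrete, the sketch below shows, in broad strokes, the kind of extract-transform-load step those tools automate: pulling raw records from an operational source, applying basic cleaning and data-quality rules, and loading the result into a store that an AI workload can query. It is a minimal illustration using only Python's standard library; the record fields, table name and quality rule are hypothetical, and production ETL and pipeline tools layer scheduling, lineage, governance and monitoring on top of this pattern.

```python
# Minimal ETL sketch: extract raw records, clean them, and load them into a
# store a downstream AI application can query. Field and table names are
# hypothetical; real ETL tools add scheduling, lineage and quality monitoring.
import sqlite3

# "Extract" step: stand-in for rows pulled from an operational system.
RAW_RECORDS = [
    {"customer_id": "001", "region": " East ", "revenue": "1200.50"},
    {"customer_id": "002", "region": "WEST", "revenue": None},
]

def transform(record):
    """Normalize text fields, coerce types and drop rows missing required values."""
    if record["revenue"] is None:
        return None  # basic data-quality rule: skip incomplete rows
    return (
        record["customer_id"],
        record["region"].strip().lower(),
        float(record["revenue"]),
    )

def load(rows, db_path="ai_ready.db"):
    """Write cleaned rows to a table an AI or analytics workload can read."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (customer_id TEXT, region TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    cleaned = [r for r in (transform(rec) for rec in RAW_RECORDS) if r is not None]
    load(cleaned)
```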

“We’re seeing a push from enterprises’ senior management to [adopt] AI,” said Daniel Avancini, chief data officer at Indicium, a New York-based AI and data services and consulting firm, in an interview with CRN. “But most companies understand that they can’t really do the more advanced AI use cases or applications they want because their data platforms are not ready. They don’t have the framework, the [data] governance, the security or the technology that will provide that AI-ready data for these applications. We believe that most companies are not 100 percent ready for large-scale AI applications.”

“We believe there’s a big opportunity for service providers like us to help develop the platforms that companies need to be ready for AI applications,” Avancini said.

In May Fivetran, a developer of data movement software, released the results of a survey of 401 data leaders and professionals in which 42 percent of respondents said that more than half of their AI projects had been delayed, underperformed or failed due to data readiness issues. The reason why AI projects so often fail to deliver, according to the report, is that needed data is not fully centralized, governed or made available in real time for AI models.

The need to supply data for AI has become a product driver for leading IT vendors. Dell Technologies’ Dell AI Factory, for example, incorporates data-crunching GPUs from Nvidia for AI processing tasks—which create a new challenge: “What do I do about the data? How do we feed these GPUs good, high-quality data?” said Dell product director Vrashank Jain in an interview.

The broader Dell AI Data Platform includes the company’s PowerScale and ObjectScale storage systems, along with the Dell Data Lakehouse—which incorporates a data analytics engine from Starburst and a data processing engine powered by Apache Spark—for data management, transformation and governance for AI applications and agents. “The more data capabilities you present to the agent tech workflow, the better outcomes there are going to be,” Jain said.

In May Boston-based Starburst added new AI Agent and AI Workflows capabilities to its data platform to help organizations accelerate AI initiatives by supporting an AI-ready data architecture built on a data lakehouse. And in June fast-growing data and AI platform giant Databricks, which has made providing data for AI applications a key focus of its Data Intelligence Platform, launched its new Lakebase database for managing data and serving it up for AI applications and agents.

The exploding need for data for AI applications and agents is also a factor behind a number of strategic acquisitions this year within the big data space.

A case in point is Salesforce’s pending $8 billion deal to buy Informatica. Salesforce is going big on agentic AI, and the company knows that requires a foundation of trusted data. Informatica is a leading developer of data integration, data catalog, master data management, and data quality and governance software.

Even though the acquisition isn’t expected to close until early next year, Salesforce and Informatica have already announced plans to link Informatica’s Intelligent Data Management Cloud platform with Salesforce Agentforce. The plan includes utilizing Informatica’s master data management software to integrate customer data from enterprise systems to ensure data quality and enrich AI agents for sales and service use cases.

“There is no AI without data,” Informatica CEO Amit Walia told CRN on the eve of the company’s Informatica World conference in May.

In February IBM announced a deal to acquire DataStax and that company’s database platform, data streaming technology and development tools for building data-intensive AI applications that use retrieval augmented generation. IBM said the acquisition would expand the data management capabilities of its IBM Watsonx AI portfolio, address the data needs of enterprise generative AI, and “bring the power of unstructured data” to enterprise AI applications. (The acquisition closed on May 28.)

Qlik, a King of Prussia, Pa.-based provider of data integration, data quality and analytics software, has taken steps to provide a comprehensive portfolio of data tools for AI, including the January acquisition of real-time data streaming and ingestion software developer Upsolver and the June launch of a cloud data lakehouse—all targeted to help businesses and organizations more effectively collect and prepare data for AI and data analytics workloads and build workflows that act on AI results and analytical insights.

IT vendors are also stepping up their efforts to work with the channel to make it easier for solution providers to utilize their big data technology for AI tasks. Last month Confluent said it would invest $200 million over the next three years to help systems integrators, ISVs, MSPs and others in the company’s global partner ecosystem incorporate the Confluent data streaming platform—a key technology for real-time AI—into their services and solutions. That’s in addition to initiatives Confluent has launched over the last 15 months to help systems integrators, ISVs and OEM partners adopt the Confluent platform.

There are plenty of data-related challenges to address. In the aforementioned survey of data leaders and professionals, Fivetran found that: 68 percent of organizations with less than half of their data centralized reported lost revenue tied to failed or delayed AI projects; 41 percent said a lack of real-time data access was preventing AI models from delivering timely insights; 74 percent were managing or planned to manage more than 500 data sources, creating significant integration complexity; 29 percent said data silos were blocking AI success; and 65 percent planned to invest in data integration tools as their primary strategy to enable AI.

Likewise, the 2025 State of Analytics Engineering Report, based on a survey of 459 data practitioners and leaders and recently issued by dbt Labs, found that 41 percent of respondents planned to increase their investment in AI tooling over the next 12 months and 38 percent planned to increase their spending on data quality and observability technology. (Philadelphia-based dbt Labs is a leading provider of data transformation and testing technology.)

Solution providers and strategic service providers see the need for data for AI initiatives as a significant opportunity. In June Indicium launched its AI Data Squads service designed to help businesses and organizations manage complex data modernization and migration projects. AI Data Squads leverages the IndiMesh software and services framework that Indicium debuted in April to help build, scale and sustain the company’s AI and data solutions.

“It’s hard to find a company that is 100 percent mature with their data stack,” Avancini said. “All these agents, they need good data, they need good data platforms. And we believe that there's a big opportunity for service companies like us to help develop the platforms that companies require to have AI ready in production for the business applications.”

Insight Enterprises, the giant Chandler, Ariz.-based solutions and services provider, has developed a successful AI data practice that is seeing big demand from customers—especially around generative AI projects.

“It’s a growing market from generative AI—all of that depends on data and so data has been an amazing market for us,” said Amol Ajgaonkar, CTO of product innovation at Insight Enterprises, in an interview with CRN.

Ajgaonkar said Insight has been helping its clients with a range of AI efforts “across the board” and across industries, including building predictive models, traditional machine learning models and large language models (LLMs). “Most of the conversations that we are having right now tend to be around generative AI, because right now that is the buzzword, that is where the curiosity lies for everyone and trying to see where they can leverage generative AI to help their business.”

Ajgaonkar said it’s critical to look first at the desired outcome of the AI project use case and then work backward to determine what data work needs to be done. Insight’s data experts examine a client’s situation and determine what data is available, how it is collected and managed, where and how it is stored, the data’s quality and refresh rate, and other parameters. They will even evaluate whether a client has the data center capacity to run the large language models their projects require.

Insight data teams will then recommend a course of action based on the anticipated outcomes of the AI initiative. “It’s a matter of process and consulting to get to that outcome faster,” Ajgaonkar said.

IT and professional services provider EY takes the position that 80 percent to 90 percent of spending for AI projects should be devoted to underlying data infrastructure, said Hugh Burgin, who heads up EY’s Data & AI consulting services business. “It’s a big opportunity for us,” he said.

Burgin, in an interview with CRN at the Databricks Data + AI Summit in June, said EY’s data scientists and AI engineers have assembled a checklist of the characteristics of AI-ready data. Data must be accessible at scale, be based on open standards for interoperability, and be fresh and up to date. It must be reliable and trusted, based on good data governance practices and policies, and be securely managed. And data must be “global,” Burgin said, in terms of covering a comprehensive landscape for the needed task.

Ajgaonkar at Insight said a challenge for solution providers right now is keeping up with the rapid pace of change in the AI world. “The AI landscape is evolving so rapidly. Every two weeks, something new is happening,” he said pointing to MCP (Model Context Protocol) servers and recent developments in agent-to-agent communications. Ensuring that Insight doesn’t lock clients into AI data frameworks that might quickly become outdated is always a consideration.

Another key part of Insight’s job is managing clients’ expectations about AI and educating them about what LLMs can and cannot do. “The first step is, we have to gauge whether we need to educate the customer, set the expectations right on what [an AI system’s] capabilities are and what they should expect out of it.”

For solution providers, having deep domain expertise for AI initiatives within vertical industries will be a critical component of helping clients develop supporting data systems for AI, said David Zember, senior vice president of worldwide channels and alliances at Qlik, in an interview with CRN.

“There’s a ton of opportunity to help customers get their hands around their data and build a great data strategy,” he said.