IBM, Nvidia, Databricks Back AI Startup Unstructured In $40M Round

New investors in Unstructured, which develops technology for ingesting and pre-processing unstructured data for use in developing large language models, or LLMs, include the funding arms of several AI-focused tech companies, among them Databricks Ventures, IBM Ventures, and Nvidia’s NVentures.

Unstructured, an upstart developing technology aimed at ingesting and pre-processing a wide range of unstructured data for use in developing large language models, or LLMs, for generative AI, Thursday said it has raised a B round of funding worth $40 million.

The new financing round brings total funding for San Francisco-based Unstructured to $65 million.

The B round was led by Menlo Ventures and supported by the funding arms of several AI-focused tech companies including Databricks Ventures, IBM Ventures, and Nvidia’s NVentures. Other investors in the round include Sacramento Kings Chairman Vivek Ranadivé, DataStax CEO Chet Kapoor, Allison Pickens of the New Normal Fund, Madrona, Bain Capital Ventures (BCV), and Mango Capital.

Madrona, BCV, and Mango previously invested in Unstructured.

Generative AI, or GenAI, has in just the last 12 months become one of the most important tech innovations.

Global IT consultant EY in February released its EY Reimagining Industry Futures Study, which found that 43 percent of the 1,405 enterprises surveyed are investing in GenAI.

EY also found that GenAI ranks third among the nine emerging technologies tracked in the study, with “Automation and AI” ranking first. Among the companies already investing in GenAI, 80 percent are working on proofs of concept for applications, while 20 percent have pilot projects underway, EY said.

Unstructured, founded in 2022, develops technology that makes unstructured data ready for use by LLMs for GenAI. Unstructured data, which includes emails, documents, images, video, and other data that is difficult to manage with traditional tools, needs to be pre-processed into formats that can be used by machine learning to build the LLMs on which GenAI depends.

Unstructured’s technology automates the transformation of unstructured data into formats needed for retrieval-augmented generation (RAG) and LLM fine-tuning. The company claims it can drive performance improvements of over 20 percent for LLMs without the need for any customization. Its open source library has also been downloaded over 6 million times.

Unstructured in January released its commercial SaaS API, which it said already has over 1,000 paying customers. In February, the company unveiled its enterprise platform, which it said continuously extracts raw unstructured data to significantly cut the time developers and data scientists need to prepare data.

Unstructured CEO and Founder Brian Raymond, who previously served in the CIA and worked at the White House’s National Security Council before winding up in the investment banking and startup world, told CRN that his company is the first and only company that can ingest and pre-process all unstructured data into the right format for LLMs.

“Data ready for use with foundation models has to be in the JSON format,” Raymond said. “But if the data is a PDF, a PowerPoint, a JPEG file, and so on, how do you get it into the JSON format? Until now, there was no answer. It’s always been done on a custom basis. It’s the reason we saw such explosive growth last year.”
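
To make the idea of getting a document “into the JSON format” concrete, here is a minimal sketch in Python. It is not Unstructured’s code or API. It handles plain text only, and the function name `text_to_elements`, the element types, and the title heuristic are all assumptions made up for illustration. Real PDFs, slide decks, and images would need far more work, which is the gap Raymond is describing.

```python
import json


def text_to_elements(text, source):
    """Split raw text into blank-line-separated blocks and tag each one.

    Hypothetical helper for illustration only; not part of any real library.
    """
    elements = []
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # collapse internal whitespace and newlines
        if not block:
            continue
        # Crude heuristic (an assumption): a short block with no final
        # period is treated as a title, everything else as body text.
        is_title = len(block) < 60 and not block.endswith(".")
        elements.append(
            {
                "type": "Title" if is_title else "NarrativeText",
                "text": block,
                "metadata": {"source": source},
            }
        )
    return elements


sample = "Quarterly Report\n\nRevenue grew in the third quarter.\n\n\nCosts were flat."
elements = text_to_elements(sample, "report.txt")
print(json.dumps(elements, indent=2))
```

Each block of the document becomes one JSON object carrying its text, a rough type, and where it came from, which is the kind of uniform structure a retrieval or fine-tuning pipeline can consume.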

By “explosive growth,” Raymond said, he means that his company will have seen 7 million downloads by the end of this month. The company now has over 1,000 paying customers, including over one-third of Fortune 500 enterprises.

Having AI-focused companies like Databricks, IBM, and Nvidia as investors is no coincidence, Raymond said.

“We have deep relationships with those companies,” he said. “We are a complementary technology for what they are building.”

Raymond said his company is not in a hurry to raise more funding.

“We’re in good shape now,” he said. “We raised $20 million in our A round six months ago. So we’re very well capitalized at this point.”

Unstructured currently has a self-service sales model, with customers able to come in off the street and grab an API without talking to anyone, Raymond said. The company also has account managers working with enterprise clients, and is working with hyperscalers and some database companies to sell its technology.

“For resellers, it’s still early,” he said. “And this is a new area for us. But we’re learning. If it makes sense, we’re open to working with the channel.”