Databricks Expands GenAI Data Capabilities With Latest Acquisition

The data lakehouse giant has bought Boston-based startup Lilac, a developer of tools used to improve the quality of data used by GenAI applications and LLMs.

Fast-growing data and AI platform developer Databricks has acquired Lilac, a Boston-based startup that provides data science tools for improving the quality of data for generative AI applications and the large language models (LLMs) that power them.

The integration of Lilac's tooling into Databricks “will help customers accelerate the development of production-quality generative AI applications using their own enterprise data,” said a company blog post written by Matei Zaharia, Databricks co-founder and CTO, Naveen Rao, Databricks vice president of Generative AI, and other executives.

The acquisition, announced Tuesday, is the latest by data lakehouse powerhouse Databricks to extend its capabilities in the AI space. Databricks bought generative AI startup MosaicML for $1.3 billion in June of last year, acquiring technology that developers use to build and train models using their own data.

Other Databricks acquisitions over the last year include natural language processing pioneer Einblick in February, data replication startup Arcion in October, and data governance tech provider Okera in May.

News of the acquisition came one day after Databricks and chip designer Nvidia unveiled an expanded alliance to deepen technical integration between their technologies and optimize data and AI workloads on the Databricks Data Intelligence Platform.

Nvidia, which is holding its GTC 2024 conference this week, was an investor in Databricks’ $500-million Series I funding round in September.

Lilac was founded last year by former Google engineers Daniel Smilkov and Nikhil Thorat. Terms of the acquisition were not disclosed, although the blog post indicated that Smilkov, Thorat and the Lilac team have joined Databricks.

Databricks described Lilac’s software as a “scalable, user-friendly tool for data scientists to search, cluster, and analyze any kind of dataset with a focus on generative AI.”

The Lilac technology can be used for a range of use cases, Databricks said, from evaluating the output of LLMs to understanding and preparing unstructured datasets for model training. Databricks said its own MosaicML team is among Lilac’s users.