Databricks Steps Up Data Governance With Okera Acquisition

The addition of Okera’s data governance technology to the Databricks Lakehouse Platform will boost data protection and governance for AI and machine learning tasks.


Data lakehouse platform developer Databricks is acquiring Okera, a developer of secure data governance and access control software, under an agreement the two companies disclosed Wednesday.

Databricks plans to incorporate the Okera technology into the Databricks Lakehouse Platform to extend that system’s data governance capabilities – an increasingly important requirement for AI, machine learning and large language models (LLMs) that utilize huge volumes of data.

“Every industry is now dealing with how [to] leverage data, and AI in particular, in a really secure and governed way,” said Jonathan Keller, Databricks senior director of product management, in an interview with CRN.



Terms of the acquisition were not disclosed. A Databricks spokesperson said the deal is expected to close within the next month.

Okera, founded in 2016 and based in San Francisco, develops an AI-centric data governance system used to discover and classify sensitive data, maintain proper data access and management, and provide intelligence about sensitive data usage for audit, security and privacy compliance teams.

The company’s software helps ensure compliance with data protection regulations, including the European Union’s GDPR privacy regulation and California’s CCPA/CPRA data privacy requirements.

A core strength of the Okera technology is its ability to extend data governance, including data discovery and data protection, to AI and machine learning data assets.

Data-Hungry AI And ML Projects

The rise of AI, particularly machine learning models and LLMs such as the GPT language models that underlie the ChatGPT chatbot, adds to the data governance challenge. That is because data volumes are growing explosively and many AI and machine learning systems rely on machine-generated data, according to a Databricks blog post announcing the Okera acquisition, written by several executives including CEO Ali Ghodsi.

One example is ensuring that customer data used to train and operate a machine learning system is protected and complies with data privacy regulations and policies.

“There’s really nobody else out there today that is holistically solving this governance problem across data, ML models, notebooks, dashboards, etcetera,” Keller said. “So the two [Databricks and Okera] together really add the unique thing that unlocks the use of AI and ML in a lot of different verticals, regulated industries, where it’s very, very hard – if not impossible – to do that.”

Databricks, also based in San Francisco and one of the IT industry’s fastest-growing companies, develops the Databricks Lakehouse Platform for a range of workloads, including data unification, data analytics, data lakehouse, data engineering, and AI and machine learning operations.

The Databricks platform includes the Unity Catalog, a unified governance system for data and AI assets, including files, tables and machine learning models.

With the acquisition, Databricks plans to build the Okera technology into the Unity Catalog and expand the data governance capabilities of the Databricks platform, Keller said.

Okera’s technology includes an intuitive, AI-powered interface that automatically discovers, classifies and tags sensitive data such as personally identifiable information, according to the blog post. Such capabilities are needed to assess data, develop policies for data access and use, and audit and analyze sensitive data.

Okera has also been developing data isolation technology, now in private preview, “that can support arbitrary workloads while enforcing governance control without sacrificing performance,” including AI workloads, the blog post said.

Fast Product Integration Timetable

Okera’s product is already integrated with the Databricks platform and the two are in use together at joint customer sites, according to Keller. The plan is to incorporate the Okera technology within the Unity Catalog rather than continue selling it as a separate product.

“Most data governance solutions are kind of bolted on top of the data platform [rather than] being natively integrated,” Keller said. “There’s a lot of amazing things you can do when you own the platform that’s executing the code and generating the data, and you are governance-aware and AI-aware at that level. There’s just a lot of power that you can bring to really ensure that you’re maintaining governance and compliance while really unlocking the value of that data.”

For Databricks’ solution provider and systems integrator partners, the Okera acquisition will allow them to sell a more complete data governance solution and develop intent-based data governance policies with the Databricks platform without the need to carry out complex integration work, Keller said.

Okera co-founder and CEO Nong Li, who was an engineer at Databricks in its early days, will be joining Databricks along with about a dozen people in Okera technical roles. Keller said bringing the additional data governance technical and domain expertise into Databricks is another goal of the acquisition.

In other news, Databricks announced Thursday that Databricks Ventures, the company’s venture capital arm, had invested an undisclosed amount in data security technology developer Immuta as part of Immuta’s Series E funding round. Databricks and Immuta have been partners for six years and have more than 50 joint customers.