For a decade, Databricks has focused on democratizing data and AI for organizations around the world. Since the debut of ChatGPT last November and the recent introduction of Dolly 2.0, every customer has been asking us how they can leverage the power of AI and large language models (LLMs) in their businesses. Immediately following those questions, they ask how they can protect the security and privacy of their data in this new world.
That’s why we’re excited to announce that we have entered into a definitive agreement to acquire Okera, the world’s first AI-centric data governance platform. Okera solves data privacy and governance challenges across the spectrum of data and AI. It simplifies data visibility and transparency, helping organizations understand their data, which is essential in the age of LLMs and for addressing concerns about their biases.
How does AI change data governance?
Historically, data governance technologies, regardless of sophistication, have relied on enforcing control at some narrow waist layer and required workloads to fit into the “walled garden” at this layer. For example, cloud data warehouses rely on SQL for access control, which is efficient as long as all the workloads fit into “SQL”. This was the case for a couple of decades, when the primary applications of data were SQL-centric, e.g. business intelligence reports that generate SQL queries.
The rise of AI, in particular machine learning models and LLMs, is making this approach insufficient. First, the number of data assets an enterprise has to govern increases exponentially, because many data sources used in AI are machine-generated instead of human-generated. Second, given the rapid pace of development of the AI landscape, no single company is capable of creating a walled garden expressive enough to capture the state of the art. A vendor can enforce access control for its own SQL-based data warehouse engine, but it cannot change every open source library to make sure each one adheres to the particular controls of a walled garden. This means that AI-specific governance concerns such as provenance and bias fall outside the reach of traditional data governance platforms.
Okera’s AI-centric governance technologies
Okera’s data governance platform offers two unique technologies that can address the challenges of data governance in this new world.
First, Okera offers an intuitive, AI-powered interface to automatically discover, classify, and tag sensitive data such as personally identifiable information (PII). These tags enable data governance stakeholders to easily assess compliance and create no-code access policies that improve visibility and control over data. Okera also provides a self-service portal to quickly audit and analyze sensitive data usage, giving organizations the ability to reliably monitor and track data usage patterns. This helps ensure that governance policies are applied consistently, even amid the explosion of data assets, many of which may be AI-generated.
Second, Okera has been developing a new isolation technology that can support arbitrary workloads while enforcing governance control without sacrificing performance. This technology is in private preview and has been tested by a number of joint customers specifically on their AI workloads. It is key to ensuring that enterprises can efficiently cover the whole spectrum of applications in this new world. We will be sharing more technical details of this new technology soon.
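To make the first of these two ideas more concrete, here is a minimal sketch of tag-driven governance: sample a table, tag columns that look sensitive, then mask columns based on those tags. This is not Okera's implementation or API. Simple regular expressions stand in for the AI-powered classifier, and the names `PII_PATTERNS`, `classify_columns`, and `apply_policy`, along with the 0.8 threshold, are assumptions made purely for illustration.

```python
import re

# Hypothetical detectors: tag name -> pattern a cell value must fully match.
# A real classifier would be far richer; regexes are only a stand-in here.
PII_PATTERNS = {
    "pii:email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "pii:us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_columns(rows, threshold=0.8):
    """Return {column: set of tags} for columns whose sampled values
    mostly match one of the detectors above."""
    tags = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r[col] is not None]
        tags[col] = set()
        for tag, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if values and hits / len(values) >= threshold:
                tags[col].add(tag)
    return tags


def apply_policy(rows, tags, allowed_tags):
    """Mask every column that carries a tag the caller is not cleared for."""
    def visible(col):
        return tags.get(col, set()) <= allowed_tags

    return [
        {col: (val if visible(col) else "***") for col, val in row.items()}
        for row in rows
    ]


sample = [
    {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"},
    {"name": "Bob", "email": "bob@example.org", "ssn": "987-65-4321"},
]
column_tags = classify_columns(sample)
masked = apply_policy(sample, column_tags, allowed_tags={"pii:email"})
```

In this sketch the policy is written against tags rather than against individual tables or columns, so a newly discovered asset is covered as soon as it is tagged. That is the property that matters when the number of assets grows faster than anyone can review them by hand.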
Unity Catalog with Okera
The lakehouse is the best place to develop data and AI applications together, and to build LLMs. Our lakehouse vision is centered around the unification of these workloads on one platform. At the foundation of our lakehouse vision lies Unity Catalog, the data governance layer for all data and AI workloads. We intend to integrate Okera’s AI-centric governance technologies into Unity Catalog.
Our customers will benefit from being able to use AI to discover, classify and govern all their data, analytics, and AI assets (including ML models and model features) with attribute-based and intent-based access policies. Additionally, they will benefit from end-to-end data observability on the lakehouse that allows them to centrally audit and report sensitive data usage across analytics and AI applications, and automatically trace data lineage down to the column level.
With these enhancements, our customers will have a holistic view of their data estate across clouds and can use a single permission model to define access policies, accelerating AI use cases and ensuring consistent governance across the lakehouse. This forthcoming acquisition will also enable us to expose APIs for richer policies that other data governance partners can use, providing seamless solutions for our customers.
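As a rough illustration of what a single, attribute-based permission model with central auditing could look like, consider the sketch below. It is not the Unity Catalog or Okera API. The `Policy` and `Governor` classes, their fields, and the rule that untagged assets are unrestricted are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: allow `action` on assets tagged `asset_tag`
    for users whose attribute `user_attr` equals `user_value`."""
    asset_tag: str
    action: str
    user_attr: str
    user_value: str


@dataclass
class Governor:
    policies: list
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, asset, action):
        """Allow only if every tag on the asset is covered by a matching
        policy, and record the decision. Untagged assets are treated as
        unrestricted in this sketch."""
        def covered(tag):
            return any(
                p.asset_tag == tag
                and p.action == action
                and user["attrs"].get(p.user_attr) == p.user_value
                for p in self.policies
            )

        allowed = all(covered(tag) for tag in asset["tags"])
        self.audit_log.append(
            {"user": user["name"], "asset": asset["name"],
             "action": action, "allowed": allowed}
        )
        return allowed


governor = Governor([Policy("pii", "read", "department", "compliance")])

# The same check applies to a table and to an ML feature set.
customers_table = {"name": "customers", "tags": {"pii"}}
churn_features = {"name": "churn_features", "tags": {"pii"}}

alice = {"name": "alice", "attrs": {"department": "compliance"}}
bob = {"name": "bob", "attrs": {"department": "marketing"}}

alice_table = governor.is_allowed(alice, customers_table, "read")
alice_features = governor.is_allowed(alice, churn_features, "read")
bob_features = governor.is_allowed(bob, churn_features, "read")
```

Because every decision flows through one function, the audit log falls out for free: each entry records who asked for what and whether it was granted, regardless of whether the asset is a table, a model, or a feature set.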
The Okera Team
We couldn’t be more excited to welcome the Okera team, who are no strangers to Databricks. Nong Li, Okera’s co-founder and CEO, is widely known for creating Apache Parquet, the open source standard storage format that Databricks and the rest of the industry build on. Nong also played an instrumental role at Databricks early on: he led the vectorized Parquet effort and the codegen effort that resulted in Apache Spark 2.0’s 10x performance improvement.
Behind Okera’s amazing technologies is the stellar team Nong has assembled. The moment we started talking with them, we knew the two companies would join forces and integrate very well.
“We founded Okera to help modern, data-driven enterprises accelerate legitimate data access while minimizing data security risks and delivering regulatory compliance. As data continues to grow in volume, velocity, and variety across different applications, CIOs, CDOs, and CEOs across the board have to balance those two often conflicting initiatives – not to mention that historically, managing access policies across multiple clouds has been painful and time-consuming. Many organizations don’t have enough technical talent to manage access policies at scale, especially with the explosion of LLMs. What they need is a modern, AI-centric governance solution. We could not be more excited to join the Databricks team and to bring our expertise in building secure, scalable and simple governance solutions for some of the world’s most forward-thinking enterprises.”
— Nong Li, Co-Founder and CEO of Okera
We’re thrilled to welcome Nong and the incredibly talented Okera team to Databricks. We look forward to incorporating Okera’s core capabilities directly into the Databricks platform in the coming year, further enhancing the unified, AI-centric governance experience delivered by Unity Catalog.
Stay tuned for more at the Data and AI Summit this June.