Big Data News Hubb

Announcing General Availability of Data Lineage in Unity Catalog

by admin
December 12, 2022
in Big Data


Today, we are excited to announce the general availability of data lineage in Unity Catalog, available on AWS and Azure. With general availability, you can expect the highest level of stability, support, and enterprise readiness from Databricks for mission-critical workloads on the Databricks Lakehouse Platform. Refer to the data lineage guides (AWS | Azure) to get started.

In this blog, we explore how organizations use data lineage as a key lever of a pragmatic data governance strategy, some of the key features available in the GA release, and how to get started with data lineage in Unity Catalog.

Driving better data observability and compliance with data lineage

Unity Catalog provides a unified governance solution for data, analytics and AI, empowering data teams to catalog all their data and AI assets, define fine-grained access permissions using a familiar interface based on ANSI SQL, audit data access and share data across clouds, regions and data platforms.

With automated data lineage in Unity Catalog, data teams can now automatically track sensitive data for compliance requirements and audit reporting, ensure data quality across all workloads, perform impact analysis or change management of any data changes across the lakehouse and conduct root cause analysis of any errors in their data pipelines.

“Data Lineage has enabled us to get insights into how our datasets are used and by whom. This serves as both basic documentation as well as identifies who would be affected by dataset changes or deprecations to cut down on incidents”

— Sam Shuster, Staff Engineer, Edmunds

“Lineage is the last crucial piece for access control. It allows analysts to leverage data to do their jobs while adhering to all usage standards and access controls, even when recreating tables and data sets in another environment”

— Chris Locklin, Data Platform Manager, Grammarly

“Lineage helps Milliman professionals see where data is coming from, what transformations did it go through and how it is being used for the life of the project. This well-documented end-to-end process complements the standard actuarial process”

— Dan McCurley, Cloud Solutions Architect, Milliman

Key Features of data lineage available in the GA release

Automated real-time lineage: Unity Catalog automatically captures and displays data flow diagrams for queries executed in any language (Python, SQL, R, and Scala) and execution mode (batch and streaming). Real-time lineage reduces the operational overhead of manually creating data flow trails. Data lineage is automatically aggregated across all workspaces connected to a Unity Catalog metastore, so lineage captured in one workspace can be seen in any other workspace that shares the same metastore.

Unified column and table lineage graph: With Unity Catalog, users can now see both column and table lineage in a single lineage graph, giving users a better understanding of what a particular table or column is made up of and where the data is coming from. Users can navigate the lineage graph upstream or downstream with a few clicks to see the full data flow diagram.

Table and column lineage in Unity Catalog

Going beyond just tables and columns: Unity Catalog also tracks lineage for notebooks, workflows, and dashboards. This improves end-to-end visibility into how data is used in your organization and allows you to understand the impact of any data changes on downstream consumers.

Lineage for notebooks, workflows and dashboards
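
To make the impact-analysis idea concrete, here is a minimal sketch of a downstream traversal over a lineage graph. It is not how Unity Catalog implements lineage; the edge list, the node names, and the `downstream_of` helper are all hypothetical and exist only to illustrate the concept of walking from a changed table to its downstream tables, notebooks, and dashboards.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream); all names are made up.
EDGES = [
    ("table:raw.orders", "table:clean.orders"),
    ("table:clean.orders", "table:reports.daily_sales"),
    ("table:clean.orders", "notebook:churn_model"),
    ("table:reports.daily_sales", "dashboard:sales_overview"),
    ("table:raw.customers", "notebook:churn_model"),
]


def downstream_of(start, edges):
    """Return every node reachable downstream of `start`, breadth-first."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against revisiting shared consumers
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that could be affected by a change to raw.orders.
impacted = downstream_of("table:raw.orders", EDGES)
print(impacted)
```

Running the same traversal with the edges reversed would give the upstream view, which is the shape of question a root cause analysis asks.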

Built-in security: Lineage graphs are secure by default and use Unity Catalog's common permission model. Users must have the appropriate permissions to view the lineage data flow diagram, adding an extra layer of security and reducing the risk of unintentional data breaches. For example, if users do not have the SELECT privilege on a table, they will be unable to explore the table's lineage. Similarly, users can only see lineage information for notebooks, workflows, and dashboards that they have permission to view.

Built-in security for lineage graphs
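
The permission rule described above can be pictured with a small toy model. This is only a sketch: the `GRANTS` and `UPSTREAMS` dictionaries and the `upstream_lineage` function are hypothetical, and real privileges live in Unity Catalog rather than in application code. The post does not say how related tables without SELECT are displayed, so simply omitting them here is an assumption.

```python
# Hypothetical grants: user -> set of tables on which the user holds SELECT.
GRANTS = {
    "analyst": {"sales.orders", "sales.daily_summary"},
    "intern": {"sales.daily_summary"},
}

# Hypothetical table lineage: table -> list of direct upstream tables.
UPSTREAMS = {
    "sales.daily_summary": ["sales.orders"],
    "sales.orders": ["raw.order_events"],
}


def upstream_lineage(user, table):
    """Return the direct upstream tables of `table` that `user` may see."""
    allowed = GRANTS.get(user, set())
    if table not in allowed:
        # Without SELECT on the table itself, its lineage cannot be explored.
        raise PermissionError(f"{user} lacks SELECT on {table}")
    # Assumption: upstream tables the user cannot SELECT from are left out.
    return [t for t in UPSTREAMS.get(table, []) if t in allowed]


print(upstream_lineage("analyst", "sales.daily_summary"))
print(upstream_lineage("intern", "sales.daily_summary"))
```

In this model the analyst sees `sales.orders` as an upstream of the summary table, while the intern, who holds SELECT only on the summary table, gets an empty list.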

Partner integrations: Unity Catalog also offers rich integration with various data governance partners via Unity Catalog REST APIs, enabling easy export of lineage information.
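
As a rough illustration of what exporting lineage over REST might look like, the sketch below builds a request for one table's lineage and pulls table names out of a response. Treat the details as assumptions to check against the data lineage guides: the endpoint path, the `table_name` and `include_entity_lineage` parameters, passing them as a query string, and the `upstreams` / `downstreams` / `tableInfo` response shape are not stated in this post. The host, token, and table names are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint path; confirm against the data lineage guide for your cloud.
TABLE_LINEAGE_PATH = "/api/2.0/lineage-tracking/table-lineage"


def build_table_lineage_request(host, token, table_name):
    """Build (but do not send) a GET request for one table's lineage."""
    query = urllib.parse.urlencode(
        {"table_name": table_name, "include_entity_lineage": "true"}
    )
    url = f"https://{host}{TABLE_LINEAGE_PATH}?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_table_lineage(host, token, table_name):
    """Send the request and decode the JSON body (requires network access)."""
    request = build_table_lineage_request(host, token, table_name)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def full_table_names(lineage, direction):
    """List catalog.schema.table names under 'upstreams' or 'downstreams'."""
    names = []
    for entry in lineage.get(direction, []):
        info = entry.get("tableInfo")
        if info:  # assumed: some entries describe notebooks or jobs instead
            names.append(
                f"{info['catalog_name']}.{info['schema_name']}.{info['name']}"
            )
    return names


# Hypothetical response in the assumed shape, used here instead of a live call.
sample = {
    "upstreams": [
        {"tableInfo": {"catalog_name": "main", "schema_name": "raw", "name": "orders"}}
    ],
    "downstreams": [{"notebookInfos": [{"notebook_id": 1}]}],
}
print(full_table_names(sample, "upstreams"))
```

Keeping request construction separate from the network call makes the export logic easy to check offline before pointing it at a real workspace.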

Getting started with data lineage in Unity Catalog

Watch the demo below to see data lineage in action.

Data lineage is included at no extra cost with Databricks Premium and Enterprise tiers. All workloads referencing the Unity Catalog metastore now have data lineage enabled by default, and all workloads reading or writing to Unity Catalog will automatically capture lineage. To take advantage of automatically captured data lineage, please restart any clusters or SQL Warehouses that were started prior to December 7, 2022. If you already have a Databricks account, you can get started by following the data lineage guides (AWS | Azure). If you are not an existing Databricks customer, sign up for a free trial with a Premium or Enterprise workspace.
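
If you keep an inventory of when your compute was started, the restart rule above reduces to a date comparison. The sketch below assumes the cutoff means midnight UTC on December 7, 2022 (the post does not give a time zone), and the `compute` dictionary is a hypothetical inventory rather than output from any Databricks API.

```python
from datetime import datetime, timezone

# Cutoff from the post; treating it as midnight UTC is an assumption.
LINEAGE_CUTOFF = datetime(2022, 12, 7, tzinfo=timezone.utc)


def needs_restart(started_at):
    """True if compute started before the cutoff and so should be restarted."""
    return started_at < LINEAGE_CUTOFF


# Hypothetical inventory of cluster / SQL warehouse start times.
compute = {
    "etl-cluster": datetime(2022, 11, 30, 8, 0, tzinfo=timezone.utc),
    "bi-warehouse": datetime(2022, 12, 9, 14, 30, tzinfo=timezone.utc),
}

to_restart = sorted(name for name, start in compute.items() if needs_restart(start))
print(to_restart)
```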



© 2022 Big Data News Hubb All rights reserved.