Big Data News Hubb

Choose Both: Data Fabric and Data Lakehouse

by admin | January 16, 2023 | in Big Data


Posted in Business | September 12, 2022 | 5 min read

A key part of business is the drive for continual improvement, to always do better. “Better” can mean different things to different organizations: better products, better services, the same product or service at a better price, or any number of other things. Fundamentally, being “better” requires ongoing analysis of the current state and comparison with the previous or next one. It sounds straightforward: you just need data and the means to analyze it. Right?

Yes and no. The data is there, in spades. Data volumes have been growing for years and are predicted to reach 175 ZB by 2025. Yet two things block success. First, organizations have a tough time getting their arms around their data. More data is generated in ever wider varieties and in ever more locations. What was previously neatly defined, structured data in a few fully owned and controlled places, like a data center, is now a churning torrent of data of all shapes and sizes spread across edge and cloud environments. Organizations no longer know what they have and so can’t fully capitalize on it: the majority of data generated goes unused in decision making. Second, of the data that is used, 80% is semi-structured or unstructured. Combining and analyzing both structured and unstructured data is a whole new challenge to come to grips with, let alone doing so across different infrastructures. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse. Each is powerful in its own right, but used together they create synergies that open up more options to be “better.”

Unified data fabric

For many organizations, a data fabric is a first step to becoming more data driven. A data fabric answers perhaps the biggest question of all: what data do we have to work with? Managing individual data sources through traditional enterprise data integration, and making them available only when end users request them, simply does not scale, especially given the growing number and volume of sources. The tremendous overhead placed on IT limits how quickly organizations can bring together more data to deploy new use cases. What’s more, data users are forever plagued by the feeling that more data, perhaps better data, is out there somewhere. That feeling leads teams to second-guess results or resort to unsanctioned sources, which creates compliance risks.

A data fabric flips the traditional “as needed” approach to enterprise data integration: data fabric teams integrate all data sources in a fully controlled way, understand them, and make them available via self-service.

With solid data management across the whole process, a data fabric ingests any and all data sources regardless of variety or velocity. It then processes, stores, integrates, and cleans the data to uncover what each source represents, and makes the sources available to users, where needed, in a safe and compliant manner.

It won’t surprise you that all of Cloudera Data Platform’s (CDP) capabilities come to bear when companies deploy a data fabric architecture; our customers were creating data fabrics before the architecture even had a name. Where CDP really shines, and what makes for a truly unified data fabric, is the Shared Data Experience (SDX). SDX provides a comprehensive approach to data security and governance, with powerful fine-grained access control triggered by data classifications uncovered through automated data discovery. This makes it possible to open up data access to more users, even for previously unknown data sources. And here’s the kicker: it does so not just in one infrastructure but across all infrastructures, hybrid and multi-cloud. The result is consistent data security and governance across all of them. Through a single pane of glass, SDX’s Data Catalog provides self-service data access to end users, letting them find the data they need, appreciate its context, and feel confident they’ve found all the data they need.
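To make the idea of access control “triggered by data classifications” more concrete, here is a minimal Python sketch. It is not SDX’s API or implementation. The classifier patterns, the tag names, the roles, and the functions `classify_columns` and `read_as` are all hypothetical and chosen only for illustration: sampled values are matched against patterns to tag columns, and a tag-to-role policy decides which columns a user sees unmasked.

```python
import re

# Hypothetical classification rules: tag name -> pattern a sampled value must match.
CLASSIFIERS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical policy: tag name -> roles allowed to see raw values.
TAG_POLICY = {
    "EMAIL": {"steward", "support"},
    "SSN": {"steward"},
}


def classify_columns(rows):
    """Tag a column when every non-empty sampled value matches a classifier."""
    tags = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r[col] not in (None, "")]
        tags[col] = {
            tag
            for tag, pattern in CLASSIFIERS.items()
            if values and all(pattern.match(v) for v in values)
        }
    return tags


def read_as(role, rows, tags):
    """Return copies of the rows, masking columns whose tags the role may not see."""

    def visible(col):
        # Untagged columns are visible to everyone in this sketch.
        return all(role in TAG_POLICY[tag] for tag in tags[col])

    return [
        {col: (val if visible(col) else "***") for col, val in row.items()}
        for row in rows
    ]
```

In this sketch the policy is attached to the tag rather than to a specific table or column, so a newly discovered source is covered as soon as its columns are classified. That is the property the paragraph above describes, shown here in toy form.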

Open data lakehouse

Once you have access to all the data you need at the right time, the next step is to use that data efficiently, opening the door to new analytic use cases. This is where the data lakehouse comes in. More and more organizations are realizing that it is the most efficient and performant architecture for running multi-function analytics because it makes all their data more usable and effective. Companies need answers to more complex business questions, which requires integrating unstructured and real-time data and using modern, best-of-breed engines for analytics, stream processing, and AI and ML for predictive analytics. These answers must be reliable and delivered quickly. If data has to be transformed into proprietary formats and moved around for each compute engine you want to use, the result is data silos, stale data, and delayed insights. A data lakehouse that enables multiple engines to run on the same data improves speed to market and user productivity.

Cloudera has supported data lakehouses for over five years. With our data service engines and the Hive metastore, we have delivered the performance and reliability of the data warehouse together with the flexibility and scale of a data lake. With the integration of Apache Iceberg (an open-standard, open source table format) in SDX, Cloudera is taking the data lakehouse to the next level by creating an open data lakehouse. Applying the Iceberg table format to all of the organization’s data in the data lake makes that data more performant and usable at scale. An open data lakehouse, powered by Iceberg, makes the organization’s data agnostic to processing engines, providing greater flexibility and choice. It simplifies data management at scale and adds superpowers like time travel, snapshot isolation, and partition evolution to the traditional data lakehouse.
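For readers new to the terms, time travel and snapshot isolation both follow from one idea: every commit produces a new immutable snapshot of the table, and readers work against a specific snapshot. The Python sketch below is a toy in-memory model of that idea only. It is not Iceberg’s API or file layout, and `Snapshot`, `ToyTable`, `append`, and `scan` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the data files visible at one commit."""
    snapshot_id: int
    files: tuple


class ToyTable:
    """Toy table that keeps every snapshot so that old versions stay readable."""

    def __init__(self):
        self._snapshots = {0: Snapshot(0, ())}
        self._current_id = 0

    def current(self):
        return self._snapshots[self._current_id]

    def append(self, *new_files):
        # A commit never mutates an existing snapshot; it adds a new one.
        base = self.current()
        snap = Snapshot(base.snapshot_id + 1, base.files + tuple(new_files))
        self._snapshots[snap.snapshot_id] = snap
        self._current_id = snap.snapshot_id
        return snap

    def scan(self, snapshot_id=None):
        # Time travel: read any retained snapshot by id (KeyError if unknown).
        if snapshot_id is None:
            snap = self.current()
        else:
            snap = self._snapshots[snapshot_id]
        return list(snap.files)
```

A reader that pins a snapshot before a later `append` keeps seeing the same files (snapshot isolation), and calling `scan` with an older id reads the table as it was at that commit (time travel). Real table formats add metadata files, manifests, and snapshot expiry, none of which are modeled here.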

Better together

Organizations need the two data architectures working together in harmony to drive value and insight from ever more data, faster. A data fabric combined with a data lakehouse is the ideal foundation for most organizations. This combination lets companies orchestrate their data and get value and insight from it more efficiently. However, both architectures must be deployed on the same platform, and that platform must support hybrid cloud, for organizations to achieve maximum value from their investment.

For example, a multinational health information technology and clinical research organization realized that the challenges it experienced were shared by its customers. It not only combined and deployed both architectures for its own use, but also made them an integral part of the products it provides. Both the organization and its customers can now unlock data sources in a safe and compliant manner, and drive insight faster from both structured and unstructured data. Its healthcare PaaS effectively combines data fabric and data lakehouse capabilities, leading to higher productivity for research and development teams while also ensuring HIPAA and PII compliance. What’s more, both the organization and its customers benefit from lower TCO for service delivery.

This is the value companies get with CDP’s unified data fabric powered by SDX and an open data lakehouse made possible by integration with Apache Iceberg. Cloudera Data Platform is a single hybrid platform for modern data architectures with data anywhere.

To find out more about how CDP unleashes the potential of your data with modern data architectures, check out Cloudera Now.




© Big Data News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.
