In addition to the explosion of data volumes, many organizations are struggling with an explosion in the number of data sources and data silos. Managing data in this fluid, ever-changing environment is a major challenge for would-be data-driven organizations, but one pattern that offers potential salvation for the stressed data architect is the data fabric.
Data fabrics aren’t new. We’ve been writing about them for several years here at Datanami. In the early days, the definition of a data fabric was a bit loose. But lately, it’s begun to harden and the core elements of a data fabric have coalesced into a configuration that’s finding traction in the real world.
Forrester analyst Noel Yuhanna was one of the early proponents of the data fabric. In the latest Forrester Wave: Enterprise Data Fabric, Q2 2022, Yuhanna dived into the benefits of the data fabric and dissected the offerings of 15 data fabric vendors.
“Today, delayed insights can have a devastating effect on a firm’s ability to win, serve, and retain customers,” Yuhanna wrote in the Wave report. “Organizations want real-time, consistent, connected, and trusted data to support their critical business operations and insights. However, new data sources, slow data movement between platforms, rigid data transformation workflows and governance rules, expanding data volume, and distributed data across clouds and on-premises, can cause organizations to fail when executing their data strategy.”
Centralizing all data in a data lake such as Hadoop or Amazon S3 was supposed to solve many of these problems, but it hasn’t worked out that way. Not every piece of data belongs in lakes, thanks to bandwidth and storage costs as well as sheer practicality. Technological progress also continues to churn out new digital innovations, and people are more than happy to try them out, which typically results in yet another data silo.
Data silos appear to be permanent houseguests. Just as Edwin Hubble’s raisin pudding analogy held that the expansion of the universe makes matter grow farther apart, the big data boom seems to be causing data repositories to drift further apart even as the overall volume of data continues expanding at a geometric rate. The data fabric is a way to layer some connective tissue among those sweet, sweet nuggets of data.
“Data fabric delivers a unified, integrated, and intelligent end-to-end data platform to support new and emerging use cases,” Yuhanna continued. “It automates all data management functions–including ingestion, transformation, orchestration, governance, security, preparation, quality, and curation–enabling insights and analytics to accelerate use cases quickly.”
Data fabrics are essentially pre-integrated super-suites of data management tools. Instead of cobbling together separate products for handling the data functions that Yuhanna mentioned above (not to mention data catalogs), data fabrics deliver these functions through a single product. That brings consistency and repeatability to big data management processes, which helps breed trust in data and the analytics that come from it.
Yuhanna sees a lot of data fabrics being deployed in cloud and hybrid cloud environments at the moment, particularly in support of applications like customer 360, business 360, fraud detection, IoT analytics, and real-time insights. Data fabrics are being deployed across multiple industries, including financial services, retail, healthcare, manufacturing, oil and gas, and energy, he wrote.
Data fabrics are also being deployed in the life sciences industry, where they can help knit disparate data silos into a seamless whole. One life sciences company that’s betting big on data fabrics is eClinical Solutions, a Massachusetts-based provider of software for running clinical trials.
“But now with research we end up for every trial, you might be having 15+ different sources, different streams of data, different structures, different formats, different systems,” Indupuri said. “So the problem in terms of data chaos–we refer to this as data chaos–has only exploded or increased.”
In Indupuri’s view, the data fabric is a natural evolution of the data lake, or the lakehouse. These flexible data repositories are able to ingest and store just about any type of data, giving customers or stakeholders the ability to transform, prepare, and analyze the data when they need to. But when data spans multiple data lakes (or warehouses or lakehouses), that is where data fabrics play an important role.
“One big difference would be, instead of having everything in one centralized location, with the data fabric, that is how do you actually combine different stores,” he told Datanami in a recent interview. “They could be distributed. But on top we have a fabric so that with governance and with other capabilities, we’re able to deliver analytics to end stakeholders efficiently, to deliver it to downstream to different stakeholders in different systems.”
eClinical Solutions has already built some components of a data fabric solution into its offering. It has built an end-to-end data pipeline in AWS that automatically extracts metadata and catalogs it when a new piece of data lands in the system, according to Indupuri. The company’s solution also includes a data management workbench where data managers can review and clean data.
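To make the "catalog on arrival" idea concrete, here is a minimal sketch in Python. It is not eClinical Solutions' pipeline, which the article does not describe in detail. The `Catalog` and `CatalogEntry` classes, the `on_data_landed` function, and the sample lab data are all hypothetical, and an in-memory dictionary stands in for a real metadata catalog.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata recorded for one dataset (hypothetical schema)."""
    name: str
    source: str
    columns: list
    row_count: int


@dataclass
class Catalog:
    """In-memory stand-in for a metadata catalog (assumption for illustration)."""
    entries: dict = field(default_factory=dict)

    def register(self, entry):
        self.entries[entry.name] = entry


def on_data_landed(catalog, name, source, csv_text):
    """Extract basic metadata from newly landed CSV data and catalog it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])          # first row is treated as the column names
    row_count = sum(1 for _ in reader)  # remaining rows are data rows
    entry = CatalogEntry(name=name, source=source, columns=header, row_count=row_count)
    catalog.register(entry)
    return entry


# Simulate a new file arriving from a made-up lab data source.
catalog = Catalog()
entry = on_data_landed(
    catalog,
    name="lab_results",
    source="hypothetical-lab-vendor",
    csv_text="subject_id,test,value\n001,ALT,32\n002,ALT,41\n",
)
print(entry.columns, entry.row_count)
```

In a cloud deployment, a function like this would typically be triggered by a storage event rather than called directly, and the catalog would be a managed service rather than a dictionary.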
“We evolved significantly over a decade or so,” he said. “When we first started, it was kind of a report. Then we evolved into a data lake kind of an architecture, where you can stage any data, regardless of the source. Then we have embedded capabilities where it’s metadata driven, you can actually transform and publish data marts within our data cloud.”
Where it gets tricky is dealing with the data repositories of eClinical Solutions’ own customers, who are drug companies or companies doing drug exploration. These customers often have separate data lakes for clinical research, for operational data, for safety data, and for regulatory data, and are loath to move or copy data between them.
“You can actually enable them to access data across these data stores, or these distributed data clouds or data lakes or data warehouse,” Indupuri said. “So that’s where data fabric can help.”
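The pattern Indupuri describes, a governed access layer over stores that stay where they are, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's product. The `DataFabric` class, the role names, and the sample clinical and safety records are all hypothetical, and plain Python lists stand in for separate data lakes.

```python
class DataFabric:
    """Toy access layer over distributed stores; data is read in place, never copied in."""

    def __init__(self):
        self._stores = {}   # store name -> zero-argument function returning records
        self._allowed = {}  # store name -> set of roles permitted to read it

    def register_store(self, name, fetch, allowed_roles):
        self._stores[name] = fetch
        self._allowed[name] = set(allowed_roles)

    def query(self, role, predicate):
        """Return matching records from every store the role may read."""
        results = []
        for name, fetch in self._stores.items():
            if role not in self._allowed[name]:
                continue  # governance rule: skip stores this role cannot access
            for record in fetch():
                if predicate(record):
                    results.append({"store": name, **record})
        return results


# Two separate "lakes" that remain in their own locations (made-up data).
clinical_lake = [{"subject_id": "001", "visit": 1}, {"subject_id": "002", "visit": 1}]
safety_lake = [{"subject_id": "001", "event": "headache"}]

fabric = DataFabric()
fabric.register_store("clinical", lambda: clinical_lake, {"safety_reviewer", "site_monitor"})
fabric.register_store("safety", lambda: safety_lake, {"safety_reviewer"})

# A safety reviewer sees subject 001 across both stores; a site monitor sees only clinical data.
reviewer_view = fabric.query("safety_reviewer", lambda r: r["subject_id"] == "001")
monitor_view = fabric.query("site_monitor", lambda r: r["subject_id"] == "001")
print(len(reviewer_view), len(monitor_view))
```

The design choice worth noting is that each store is registered as a function to call, not as a copy of its data, so the layer adds governance and a single query path while the underlying lakes stay separate.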