I recently caught up with Alexey Utkin, Senior VP Capital Markets at DataArt, to discuss data mesh architecture, which is rising in importance with enterprises today, and whether the technology will replace data warehouses, lakes, and other architectures.
insideBIGDATA: Will it replace data warehouses, lakes, and other architectures?
Alexey Utkin: First of all, data mesh, despite being cool and the relatively new kid on the block, is not for everyone. It is quite possible that, based on your scale and ambition, a cloud data warehouse, data lake, lakehouse or other architecture is the appropriate choice for your organization today or tomorrow. The data mesh paradigm aims to address several shortcomings of these centralized architectures and their associated implementation and operational approaches, which lead to a lack of scalability and agility when implementing a growing number of analytical use cases on a growing variety of data sources. For some organizations these limitations are theoretical or may only appear in the distant future.
Yet it is still worth considering some of the data mesh principles and their underlying drivers for your organization, especially business domain orientation, domain ownership of data, and data-as-a-product, even if you are not going for a full data mesh platform today. You may find your organization getting far more value, sooner rather than later, from higher-quality, more easily discoverable and accessible data products. While you choose to adopt the data mesh principles that bring value today, such as the way you organize data product teams and data ownership, your data infrastructure may still take the form of a cloud data warehouse or a lakehouse, as long as it does not become a limitation.
insideBIGDATA: If not, how will it complement those existing architectures?
Alexey Utkin: In my view there are two ways in which existing data architectures can be combined with the data mesh concept.
First, most organizations do not start with a clean sheet. They already have an existing data platform, or several, mostly in the shape of data warehouses, data lakes or lakehouses. Data mesh is not a particular technology or infrastructure product, so it requires underlying infrastructure and platform capabilities. Companies embarking on a data mesh journey often choose to keep their existing data infrastructure initially and, over time, extend and evolve it toward data mesh capabilities.
Second, the data mesh concept advocates for data product and pipeline ownership within teams organized around business domains. These domain data products require infrastructure to store, process and serve data. So existing data architectures, like data warehouses and lakes, may become the infrastructure for specific domain data products, where appropriate. In other words, they may become nodes on the data mesh.
insideBIGDATA: What are some benefits of implementing a data mesh architecture?
Alexey Utkin: Some of the key benefits of the data mesh concept are decentralization and domain orientation. Data mesh aligns with the fundamentally decentralized and ubiquitous nature of data and removes the common friction associated with centralized data teams, which sooner or later become a bottleneck on the way towards a data-driven organization. Data mesh aligns the data ecosystem with the way organizations are structured, i.e. around business domains. This ensures that the people who take care of domain data and build domain data products actually understand the business domain: where the data comes from, what it means, and who consumes it and how. These domain data people are better able to connect data from source operational systems with the analytical needs of users, which leads to more valuable data products and uses of data for the organization. Further, the data-as-a-product concept takes often-struggling organizational efforts to govern data, ensure quality, and make data easily discoverable, understandable and consumable to a new level by shifting ownership of these matters to the business domains.
Other key benefits of data mesh architecture are related to self-service platform capabilities. Data mesh advocates for turning the organizational data landscape into an ecosystem of domain-oriented data products built autonomously by domain data teams, but this vision needs to be supported by foundational infrastructure capabilities that remove the friction of building and evolving domain data products. These platform capabilities also need to ensure that data is discoverable, controllable, interoperable and consumable at a global level.
The list of such capabilities is not unique to data mesh and is part of any data architecture I would call modern, including data warehouses, data lakes and lakehouses. For example, it includes polyglot data storage, data pipeline implementation and orchestration, data product discovery, access control, data cataloguing and lineage, monitoring and alerting, and data quality management. In data mesh, however, the emphasis is on providing these capabilities to domain data product teams through a self-service platform, rather than having each team spend its own time and effort building them.
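To make the self-service capabilities above a little more concrete, here is a minimal sketch, not taken from the interview, of how a platform might let domain teams register data products while consumers discover them and trace lineage. The `DataProduct` and `Catalog` names, their fields, and the example products are hypothetical; a real platform would back this with a proper data catalog, access control and quality checks rather than an in-memory dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str                     # owning business domain
    owner: str                      # accountable domain data team
    storage: str                    # node backing the product, e.g. a warehouse table or lake path
    schema: dict[str, str]          # column name -> type, published for consumers
    upstream: tuple[str, ...] = ()  # products this one is derived from (lineage)


class Catalog:
    """Hypothetical in-memory self-service catalog: registration, discovery, lineage."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Refuse products whose declared upstream dependencies are not registered yet.
        missing = [u for u in product.upstream if u not in self._products]
        if missing:
            raise ValueError(f"unknown upstream products: {missing}")
        self._products[product.name] = product

    def discover(self, domain: str | None = None, column: str | None = None) -> list[str]:
        # Return names of products matching the optional domain and column filters, sorted.
        return sorted(
            p.name
            for p in self._products.values()
            if (domain is None or p.domain == domain)
            and (column is None or column in p.schema)
        )

    def lineage(self, name: str) -> list[str]:
        # Depth-first walk over upstream dependencies; returns every ancestor once.
        seen: list[str] = []
        stack = list(self._products[name].upstream)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._products[current].upstream)
        return seen


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(DataProduct(
        "trades", "trading", "trading-data-team",
        "warehouse.trading.trades", {"trade_id": "string", "notional": "decimal"}))
    catalog.register(DataProduct(
        "daily_pnl", "finance", "finance-data-team",
        "lake/finance/daily_pnl", {"date": "date", "pnl": "decimal"},
        upstream=("trades",)))
    print(catalog.discover(domain="finance"))  # ['daily_pnl']
    print(catalog.lineage("daily_pnl"))        # ['trades']
```

The `storage` field is deliberately free-form: as noted earlier in the interview, an existing warehouse or lake can serve as the node behind a domain data product, so the catalog only needs to record where each product lives and which domain team owns it.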
insideBIGDATA: What are the challenges these firms need to keep in mind when approaching or building the data mesh?
Alexey Utkin: There are a few categories of challenges organizations face on their journeys towards data mesh.
First, it is still a relatively new concept. Leading companies have started experimenting with data mesh over the last couple of years. There is still a great deal of discussion about how some of the principles are best applied in practice, and not a lot of solid experience because of the novelty. Organizations often use data mesh concepts as a direction in which their technology should evolve, but it is a journey, and it takes time.
Another category of challenges is related to changes in organization, roles, team setup and skills. Moving away from a centralized team and centralized architecture, re-orienting data teams around domain data products and the data platform, embracing product thinking and a data-as-a-product approach, breaking historical technology specializations, and learning new skills and tools all take time and organizational will.
The third category is technology. Data technologies have been evolving at an accelerated pace ever since we started hearing the words ‘big data’ in the early 2000s. Advancements in open source and cloud data technologies have driven a shift towards modern data platforms over the last several years. Yet the data technology landscape is very diverse. Most of the data mesh platform capabilities do exist, but as separate technologies, separate building blocks, and some of those are not very mature. There are industry initiatives to standardize certain data capabilities, such as standard metadata formats and exchange interfaces, but those are still not commonplace. Today it takes skill, effort and quite a bit of thinking to integrate these building blocks into a complete data mesh platform, and your teams need to commit to learning and re-skilling. I believe that over time, as more and more organizations embrace data mesh, these technologies, standards and approaches will mature, and we will see data mesh move from the early adoption stage to the mainstream.
insideBIGDATA: In the end, what does it help firms do?
Alexey Utkin: Data mesh fundamentally helps an organization scale its ability to work with a rapidly growing number of data sources, to quickly implement new and diverse data and analytics use cases, and to support an ever-growing number of consumers. With the data-as-a-product approach, data mesh helps to increase the quality, accessibility, interoperability and usability, and thus the value, of data for internal and external consumers, enabling a data-driven organization.