Data is central to most businesses’ operations, but data architectures have not kept up with the demands of today’s non-stop world. Innovators are therefore considering new ways to structure their data storage, processing, security, and delivery mechanisms to support growth, scalability, and effectiveness throughout their organizations. In this post, we’ll answer the four most common questions about data mesh, an emerging data architecture that promises to solve some of these challenges.

What Is Data Mesh?

Data mesh is a fairly new concept. Zhamak Dehghani, Director of Emerging Technology at ThoughtWorks and author of the book Data Mesh, describes it as a response to the very common problem of centralized, monolithic, slow-moving data platforms. A data mesh is a technology-agnostic distributed data architecture in which data resides with the team that produces (or is responsible for) it; relevant data is then shared throughout the company. It has also been described as a set of principles for designing data architecture, comparable to the way microservices are used in software architecture.

There are four key principles of data mesh:

  1. Data is owned by a specific domain, but access to that data is decentralized. This entails each business area identifying the data that would be useful to other areas in the organization (e.g. customer service teams sharing common problem information with product development teams) and making that data accessible.
  2. Data is treated as a product and is managed and shared by the domain team. Just as someone developing a product would consider how it would be used, teams need to think about how other departments and users might access and consume their data.
  3. Data is available in a self-service model throughout the company (with, of course, due regard to security and governance). Although data may reside within a domain, department, or team, others can use it to create reports, etc. as their role requires.
  4. Data is governed where it resides. Dehghani refers to this as ‘federated computational governance’, a new way of governing data that enables interoperability across domains despite the differences between them. Applying governance at the level of each decentralized domain allows users to trust and quickly navigate data in the mesh.
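
To make the four principles more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `DataProduct`, `governance_violations`, and `Catalog`, and the two governance rules it checks, are hypothetical and do not come from Dehghani’s work or from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned dataset published for other teams (hypothetical model)."""
    name: str
    owner_domain: str                    # principle 1: a specific domain owns the data
    description: str                     # principle 2: documented like a product
    schema: dict                         # column name -> type name
    pii_columns: frozenset = frozenset()


def governance_violations(product):
    """Principle 4: shared rules, checked by code, that every domain must pass.

    The two rules here are assumptions chosen for illustration.
    """
    problems = []
    if not product.description.strip():
        problems.append("missing description")
    unknown = product.pii_columns - set(product.schema)
    if unknown:
        problems.append(f"PII columns not in schema: {sorted(unknown)}")
    return problems


class Catalog:
    """Principle 3: a self-service index that any team can search."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Refuse products that fail the shared governance checks.
        problems = governance_violations(product)
        if problems:
            raise ValueError("; ".join(problems))
        self._products[product.name] = product

    def search(self, keyword):
        # Case-insensitive match on product name or description.
        keyword = keyword.lower()
        return [
            p.name
            for p in self._products.values()
            if keyword in p.name.lower() or keyword in p.description.lower()
        ]


catalog = Catalog()
catalog.register(
    DataProduct(
        name="support_tickets",
        owner_domain="customer_service",
        description="Common customer problems by product area",
        schema={"ticket_id": "str", "email": "str", "product_area": "str"},
        pii_columns=frozenset({"email"}),
    )
)
print(catalog.search("problems"))
```

In this sketch, ownership stays with the customer service domain, while any other team (product development, for example) can discover the product through the catalog without filing a request.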

How Is a Data Mesh Different?

Unlike centralized data architectures, a data mesh allows the teams that capture or generate data to create usable data products for other teams. Meanwhile, the data platform team can focus on data engineering and leave domain-specific data problems to the data professionals embedded in each business team. These embedded data professionals can get help from the platform team with technical issues, but they remain in charge of the quality and reliability of their team’s data. Finally, a data mesh is designed to be more accessible to business users, so they need less input from the platform team. (Contrast this with many centralized data teams, which are responsible for data frameworks and access as well as for helping business teams with their data requests.)

In short, the decentralized architecture and data team structure allow each party to do what it does best: the platform team focuses on technology, engineering, and data pipelines; the embedded business data professionals manage data quality for their teams; and the end users can perform their data-driven tasks without waiting for the results of a custom request.

What Problems Does a Data Mesh Solve?

  • Incorporating a data-driven mindset across teams and in leadership. This is not a problem, but it can be a challenge!
  • Problems accessing or agreeing on critical data.
  • A lack of ownership (or quality) in your organization’s data.
  • Difficulty accessing data across teams (i.e. data bottlenecks).
  • Data pipelines are slow, inefficient, or broken.
  • Your current data ecosystem is messy or hard to scale.
  • More data sources are becoming available.
  • Teams are growing in size and/or function and more products are relying on data-driven features.
  • Cross-functional teams struggle to communicate and support each side’s functions.
  • Data teams are hampered by a lack of business or domain knowledge.
  • Your organization has a lot of data specific to various regions, business areas, etc.

How Do You Implement a Data Mesh?

The process starts with a shift in thinking: from centralized data ownership to team-based data ownership; from data as a secondary effect to data as a valuable product within the organization; from a single data team managing a huge, centralized data repository to domain teams that include data professionals.

To build the data mesh itself, start by ensuring your foundation includes a central data streaming platform that stores data, allows teams to publish autonomously, and delivers data to users on demand. Next, make sure every data source has a clear owner (i.e. the team that originates or curates the data); that team will be responsible for publishing the data on the mesh.
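
The foundation described above can be sketched as a toy, in-memory model in Python. This is not the API of any real streaming platform; `MeshPlatform` and its methods are hypothetical names, and a production system would rely on a dedicated streaming product rather than Python lists.

```python
from collections import defaultdict


class MeshPlatform:
    """Toy stand-in for a central streaming platform (hypothetical API)."""

    def __init__(self):
        self._owners = {}                  # stream name -> owning team
        self._events = defaultdict(list)   # stream name -> stored events

    def register_stream(self, stream, owner_team):
        # Every data source gets exactly one clear owner.
        if stream in self._owners:
            raise ValueError(f"{stream} is already owned by {self._owners[stream]}")
        self._owners[stream] = owner_team

    def publish(self, stream, team, event):
        # Autonomous publishing: the owning team needs no central gatekeeper,
        # but no other team may write to its stream.
        if self._owners.get(stream) != team:
            raise PermissionError(f"{team} does not own {stream}")
        self._events[stream].append(event)
        return len(self._events[stream]) - 1   # offset of the stored event

    def read(self, stream, from_offset=0):
        # On-demand delivery: consumers read stored events when they need them.
        if stream not in self._owners:
            raise KeyError(stream)
        return list(self._events[stream][from_offset:])


mesh = MeshPlatform()
mesh.register_stream("orders", owner_team="sales")
mesh.publish("orders", "sales", {"order_id": 1, "total": 40.0})
mesh.publish("orders", "sales", {"order_id": 2, "total": 15.5})
print(mesh.read("orders", from_offset=1))
```

The design choice worth noting is that ownership is recorded on the platform itself, so the rule "only the owning team publishes" is enforced by code rather than by convention.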

On the security side, it’s important to secure event streams and provide for data governance at the team level. In terms of access and performance, your platform should support dynamic schema changes (as teams work with their data), connectors to both owner and user databases, data stream previews, data lineage views, and data stream access requests. Finally, an intuitive, easy-to-use interface for business users will create a unified experience and solve the last-mile problem of getting data into users’ hands.
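
Two of the platform capabilities listed above, schema changes and data stream access requests, can be illustrated with a short Python sketch. Both pieces rest on assumptions: the compatibility rule (adding columns is safe; removing or retyping them is not) is one common convention rather than a requirement of data mesh, and `breaking_changes` and `AccessRequests` are hypothetical names.

```python
def breaking_changes(old_schema, new_schema):
    """List the changes that would break consumers of the old schema.

    Assumed rule: new columns are allowed; removed or retyped columns are not.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


class AccessRequests:
    """Team-level governance: the owning team approves access to its stream."""

    def __init__(self, owner_team):
        self.owner_team = owner_team
        self._pending = set()
        self._granted = set()

    def request(self, team):
        # Record a request unless access was already granted.
        if team not in self._granted:
            self._pending.add(team)

    def approve(self, approver, team):
        # Only the owning team may approve, and only for a pending request.
        if approver != self.owner_team:
            raise PermissionError(f"{approver} cannot approve for {self.owner_team}")
        if team not in self._pending:
            raise KeyError(team)
        self._pending.discard(team)
        self._granted.add(team)

    def can_read(self, team):
        return team == self.owner_team or team in self._granted


old = {"order_id": "int", "total": "float"}
new = {"order_id": "str", "currency": "str"}
print(breaking_changes(old, new))

orders_access = AccessRequests(owner_team="sales")
orders_access.request("marketing")
orders_access.approve("sales", "marketing")
print(orders_access.can_read("marketing"))
```

A platform could run a check like `breaking_changes` before accepting a new schema version, so that domain teams can evolve their data without silently breaking downstream users.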

The Case for Data Mesh

By allowing organizations to decentralize their data, data meshes make it easier to scale and adapt to new uses. Each business team, as well as the data platform team, can devote more time to its area of expertise. Because each team manages its data independently, there are opportunities for more detailed governance, faster data access and delivery, and better compliance with local or domain-specific regulations. In short, a data mesh lets businesses use their data flexibly and efficiently, which can lead to more data-driven decisions and a more agile outlook.

Authored by: Dr. Anil Kaul, CEO of Absolutdata and Chief AI Officer at Infogain, and Harshit Parikh, Vice President, Global Practice lead at Infogain
