Data Mesh: Advantages and Challenges
by Jan Hendrik Fleury on Dec 22, 2021 1:08:02 PM
When do you need a data mesh architecture framework, what does it bring, and what does it take to adopt it?
Data mesh is hot in the world of data platforms. That is understandable, because data mesh offers a solution to a long-standing problem: scaling data platforms both technically and organizationally. Data mesh may therefore also provide a breakthrough within your organization. In this blog, I will share when you might need a data mesh framework and what its main advantages and challenges are.
Data mesh is an architecture in which distributed data products are developed and managed by data engineers and data product owners in domain teams.
Data mesh uses a shared infrastructure to host, prepare, and offer data. Because many central data teams recognize that they are running into their limits, data mesh has become an important trend in the world of data platforms. How did this situation arise?
First, what are the main features of a data platform, whether it is called a data lakehouse or a data fabric?
- A scalable cloud service in which storage and compute are separated;
- The ability to work directly and interactively with data;
- Native data types supported by all architectural components and their interactions;
- A toolset for both analytics and AI;
- Unified data management.
Don’t get me wrong: unified data platforms offer a great deal of value and are major enablers of data and digital transformation. At Crystalloids, we specialize in designing, developing, and, not to forget, monetizing these platforms for all types of use cases.
In some cases, however, such as complex or international companies, functional scalability challenges arise.
The functional challenge
One of our clients uses a metaphor that makes this easy to understand:
Imagine you go to the library because you want to write a report on a subject. You need a couple of books on different topics to combine for your analysis. You run into some issues:
- Not all books are available right away;
- There is no catalog, so you don’t know where to look;
- It’s not always clear who wrote the book, if it’s still up to date, and what definitions were used in it;
- If you want a new book, you need to ask a librarian, and it can take weeks or months;
- Some books contain private information, so you cannot access them at all, even when you only need part of the book.
In this situation, you would want the following functional and technical elements:
- New books are added to the library as soon as they become available;
- There is a catalog that specifies where to look for each kind of book;
- You have access to all the books, so it’s easy to combine books on different subjects;
- Each book has a clear author and the definitions align across all books;
- There are librarians around who can help you ask the right question, find the right books, and even help you write your analyses;
- There’s a separate section that contains books with private information. A sign tells you how to ask for access to this section.
Translated to data ownership, which is an integral part of the data mesh framework, the metaphor yields benefits in efficiency, clarity, and transparency:
- Easier to find the owner of a data set when changes or issues arise;
- Ability to involve only the relevant stakeholders;
- Ability to find things faster and to look back;
- Clarity on where the data comes from;
- Transparency about decisions made on the datasets themselves;
- Less time spent on documentation in the future.
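To make these ownership benefits more concrete, here is a minimal Python sketch of a catalog in which every data product records its owner, its upstream sources, and its consumers. The `DataProduct` and `Catalog` classes and all product and team names are hypothetical illustrations, not part of any specific data catalog product; real platforms would keep this metadata in a managed catalog service.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical catalog entry describing one data product."""
    name: str
    owner: str                                          # who to contact about this data set
    upstream: list[str] = field(default_factory=list)   # products it is built from
    consumers: list[str] = field(default_factory=list)  # teams that use it


class Catalog:
    def __init__(self, products: list[DataProduct]):
        self._products = {p.name: p for p in products}

    def owner_of(self, name: str) -> str:
        """Find the owner of a data set when changes or issues arise."""
        return self._products[name].owner

    def lineage(self, name: str) -> list[str]:
        """All direct and indirect upstream products: where the data comes from.

        Assumes every upstream product is itself registered in the catalog.
        """
        seen: set[str] = set()
        stack = list(self._products[name].upstream)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._products[current].upstream)
        return sorted(seen)

    def stakeholders(self, name: str) -> set[str]:
        """Only the relevant stakeholders for a change: the owner and direct consumers."""
        product = self._products[name]
        return {product.owner, *product.consumers}


catalog = Catalog([
    DataProduct("crm.customers", owner="crm-team", consumers=["marketing-team"]),
    DataProduct("shop.orders", owner="ecommerce-team", consumers=["finance-team"]),
    DataProduct("marketing.customer_value", owner="marketing-team",
                upstream=["crm.customers", "shop.orders"],
                consumers=["board-reporting"]),
])

print(catalog.owner_of("marketing.customer_value"))   # marketing-team
print(catalog.lineage("marketing.customer_value"))    # ['crm.customers', 'shop.orders']
print(sorted(catalog.stakeholders("crm.customers")))  # ['crm-team', 'marketing-team']
```

Keeping lineage and consumers next to the owner is what allows a change to be routed to exactly the people it affects, rather than to a central team.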
I have described the main differences between a data lakehouse and a data mesh in more detail in an earlier blog.
A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change.
Organizational problem not solved yet
A data lakehouse does not yet solve the organizational problem of knowledge and staffing. As the data platform grows, a larger central team with centrally held data engineering knowledge is still needed; in other words, the team has to scale up. This is why enterprise IT environments often create vertical splits, with data engineers and data analysts working in different teams. The disadvantage of this split is that several teams are needed for each data product.
The scalability problem fully solved
The major advantage of data mesh is that it provides a full-scale solution: splitting up the central data team and its surrounding knowledge into domain teams, each with its own expertise. This enables domain teams to deliver optimal business value within their own areas of expertise. Domain teams know the definitions of entities such as products or customers better than anyone else, and they can also shape these entities. With the right standards, tools, and knowledge, domain teams are able to supply data products themselves and offer them centrally. In summary, this comes down to the following:
- The domain team manages the data quality and can monitor and improve it well;
- The domain team knows the right definitions and can apply and share them well;
- The domain team knows the data users and can serve them well and unburden them.
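As an illustration of the first two points, the sketch below assumes that a domain team expresses its own definitions as quality rules and publishes the pass rates alongside its data product. `QualityRule`, `quality_report`, and the list of known country codes are hypothetical; in practice a team would more likely use a data quality tool available on its platform.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]  # one row of a data product


@dataclass
class QualityRule:
    """A named check defined by the domain team that owns the data product."""
    name: str
    check: Callable[[Record], bool]


def quality_report(records: list[Record], rules: list[QualityRule]) -> dict[str, float]:
    """Share of records passing each rule (1.0 means all pass), for monitoring over time."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }


# Hypothetical rules that encode the customer domain's own definitions
customer_rules = [
    QualityRule("has_customer_id", lambda r: bool(r.get("customer_id"))),
    QualityRule("known_country", lambda r: r.get("country") in {"NL", "BE", "DE"}),
]

customers = [
    {"customer_id": "C1", "country": "NL"},
    {"customer_id": "C2", "country": "XX"},
    {"customer_id": "", "country": "BE"},
    {"customer_id": "C4", "country": "FR"},
]

print(quality_report(customers, customer_rules))
# {'has_customer_id': 0.75, 'known_country': 0.5}
```

Because the rules live with the domain team, the people who know what a valid customer is are also the ones who see the pass rate drop and can fix the cause.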
In many organizations, establishing a “single source of truth” or “authoritative data source” is challenging, because data is repeatedly extracted and transformed across the organization without clear ownership of the newly created data. In a data mesh, the authoritative data source is the Data Product published by the source domain, with a clearly assigned Data Owner and Data Steward who are responsible for that data.
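A minimal sketch of that idea, assuming a registry that knows which domain is the source of each business entity: it accepts exactly one authoritative data product per entity, only from that source domain, and records its Data Owner and Data Steward. All class names, domains, and people below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedProduct:
    entity: str        # business entity, e.g. "customer"
    domain: str        # domain team that publishes the product
    product: str       # name of the published data product
    data_owner: str    # accountable for the data
    data_steward: str  # responsible for its day-to-day quality and definitions


class AuthoritativeRegistry:
    """Allows one authoritative data product per entity, published by its source domain."""

    def __init__(self, source_domains: dict[str, str]):
        # Hypothetical mapping: entity -> the domain where that entity originates
        self._source_domains = source_domains
        self._authoritative: dict[str, PublishedProduct] = {}

    def register(self, published: PublishedProduct) -> None:
        source = self._source_domains.get(published.entity)
        if published.domain != source:
            raise ValueError(
                f"{published.domain!r} is not the source domain for {published.entity!r}")
        if published.entity in self._authoritative:
            raise ValueError(f"{published.entity!r} already has an authoritative data product")
        self._authoritative[published.entity] = published

    def source_of_truth(self, entity: str) -> PublishedProduct:
        return self._authoritative[entity]


registry = AuthoritativeRegistry({"customer": "crm", "order": "ecommerce"})
registry.register(PublishedProduct("customer", "crm", "crm.customers",
                                   data_owner="Head of CRM", data_steward="CRM data steward"))

try:
    # A copy re-extracted by another domain cannot become a second "truth"
    registry.register(PublishedProduct("customer", "marketing", "marketing.customers_copy",
                                       data_owner="CMO", data_steward="Marketing analyst"))
except ValueError as err:
    print(err)  # 'marketing' is not the source domain for 'customer'

print(registry.source_of_truth("customer").data_owner)  # Head of CRM
```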
On the other hand, data mesh also poses challenges. Essential questions that every organization should answer before implementing a data mesh include:
- How (de)centralized is my organization set up?
- What is the size of my organization?
Implementing a data mesh only makes sense if the benefits of decentralization outweigh the investment in setting up the platform and standards. That is why data mesh is especially suitable for organizations with multiple divisions and/or an international character.
The new role of IT teams
Data mesh also requires a new role for the IT teams, both supporting and controlling. The IT teams must support the domain teams with the platform and the right tools. In addition, they must audit the domain teams by overseeing the application of uniform standards.
With multiple domain teams each delivering their own data products, good support is necessary in the following areas: standards for describing data products accessibly, support for modern tooling, and understandable data transformation standards.
You have probably already asked yourself: how do you keep control in an environment with multiple independent teams? The answer: standardization and policy. Drawing up standards prevents a proliferation of inconsistent code and descriptions. When managing domain teams, a well-defined policy is needed: it should not be possible to release code or documentation that does not meet the standards for naming, structure, and tagging.
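Such a policy can be enforced automatically, for example as a check in the release pipeline. The sketch below assumes that a data product is described by a small dictionary and that the organization's standards are a `<domain>.<product>` snake_case name, a fixed set of required fields, and mandatory `domain` and `classification` tags. These conventions and the `policy_violations` function are hypothetical examples, not a prescribed standard.

```python
import re

# Hypothetical organization-wide standards; real conventions differ per organization
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)*\.[a-z]+(_[a-z0-9]+)*$")  # <domain>.<product>
REQUIRED_FIELDS = {"name", "owner", "description", "schema", "tags"}
REQUIRED_TAGS = {"domain", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}


def policy_violations(descriptor: dict) -> list[str]:
    """Return every standard the descriptor breaks; an empty list means it may be released."""
    violations = []

    missing_fields = REQUIRED_FIELDS - set(descriptor)
    if missing_fields:
        violations.append(f"missing fields: {sorted(missing_fields)}")

    name = descriptor.get("name", "")
    if not NAME_PATTERN.match(name):
        violations.append(f"name {name!r} is not '<domain>.<product>' in snake_case")

    tags = descriptor.get("tags", {})
    missing_tags = REQUIRED_TAGS - set(tags)
    if missing_tags:
        violations.append(f"missing tags: {sorted(missing_tags)}")

    classification = tags.get("classification")
    if classification is not None and classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification {classification!r}")

    domain = tags.get("domain")
    if domain and not name.startswith(domain + "."):
        violations.append(f"name {name!r} must start with its domain tag {domain!r}")

    return violations


good = {
    "name": "sales.customer_orders",
    "owner": "sales-domain-team",
    "description": "One row per customer order",
    "schema": {"order_id": "STRING", "amount": "NUMERIC"},
    "tags": {"domain": "sales", "classification": "internal"},
}
bad = {"name": "Sales.CustomerOrders", "owner": "sales-domain-team", "tags": {"domain": "sales"}}

print(policy_violations(good))  # []
for violation in policy_violations(bad):
    print(violation)  # four violations: the release is blocked until all are fixed
```

Returning every violation at once, instead of failing on the first, lets a domain team fix its descriptor in a single pass while the central IT team keeps control through the shared rules.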
Enhance data literacy in the business domains
To take full advantage of the data platform, we need an organization with data-fluent people.
Gartner defines data literacy as “the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application and resulting value.”
Employees who are data-fluent can:
- think critically and perform analyses using data;
- make data-supported decisions rather than experience- or intuition-based ones;
- use data to communicate ideas and to help create new products, business models, workflows, and strategies;
- understand data visualizations.
Without data fluency, data in an organization doesn’t add much value. The business domains need to be supported, trained, and coached. In a future blog, I will dive into the organizational part of the data mesh framework.
Data mesh is not a cloud service that you just switch on or off. It is a combination of a good approach with the right tools.
You can use a data mesh and a data fabric simultaneously, and even a data hub. First, they are concepts, not things: a data hub as an architectural concept is different from a data hub as a database. Second, they are components, not alternatives. It is practical for an architecture to include both a data fabric and a data mesh; they are not mutually exclusive. Finally, they are architectural frameworks, not architectures. You don’t have an architecture until the frameworks are adapted and customized to your needs, your data, your processes, and your terminology.
Both data meshes and data fabrics have a seat at the data table. In the search for architectural concepts and architectures to support data projects, it all comes down to finding what works best for your own specific needs. Crystalloids is ready to guide you.
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients, freeing their time to focus more on decision making and less on programming.