Data Mesh: When To Adopt, What It Offers, And How To Implement It
by Jan Hendrik Fleury & Veronika Schipper on Dec 22, 2021 1:08:02 PM
Data mesh is hot in the world of data platforms. It's a big deal because it helps solve an old problem: making big data systems grow smoothly with your organization. This means data mesh could really help your team do better.
In this post, I'll talk about when your organization might need a data mesh architecture. You'll learn about the benefits of data mesh and what it brings to the table. I'll also touch on what it takes to start using it. It's all about understanding the good stuff and the challenges, so you can make smart choices about your data architecture. Stick around to see if principles of data mesh are right for you!
What Is Data Mesh?
Data mesh is an approach to data architecture where distributed data products are created and managed by skilled data engineers and dedicated data product owners within domain-specific teams.
This system relies on a shared data infrastructure to host, prepare, and provide access to existing data. As centralized data teams often find themselves hitting limits, data mesh has emerged as a significant trend in the data platform world. But how did we get here?
To understand this, let's look at the main features of a data platform, or a data lake:
- It's a scalable cloud service with separate storage and computing power.
- It allows direct and interactive work with data.
- All architectural components support native data and their interactions.
- It provides tools for both Analytics and AI.
- It includes unified data management for effective data governance.
Unified central data platforms, like a data lake, are incredibly valuable. They drive data and digital transformation significantly. Rituals, for example, uses a data platform to strengthen its data access and analysis capabilities and, with them, its business performance.
However, in complex or international companies, functional scalability challenges can arise. This is where the benefits of data mesh become particularly relevant, offering a more flexible and decentralized approach to managing and governing data.
Why Use Data Mesh?
In certain scenarios, particularly within complex or international organizations, data platforms, including data warehouses, confront scalability challenges. Picture this as visiting a library to collect books for research, but facing issues like unavailable books, no catalog, uncertain authorship, and difficulties in accessing specific information.
Data mesh provides solutions to these functional and technical challenges in data management:
- Newly acquired books (new data) are promptly added to the system.
- A well-organized catalog helps guide you to the right books (data pipeline), ensuring efficient retrieval.
- Access to all books (data) simplifies the combination and analysis of information.
- Clear authorship and standardized definitions enhance the quality and understanding of data.
- Expert librarians (data management professionals) are available to assist in navigating through complex data landscapes.
- A separate section for private information (sensitive data) with clear access instructions, ensuring data governance and security.
This approach not only streamlines the process of accessing and using data but also transforms the data warehouse into an efficient, analytical data platform, making it easier for users to derive valuable insights.
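To make the library analogy a little more concrete, here is a minimal sketch of a catalog with a separate "private section" for sensitive data. Everything in it is an assumption made for illustration: the `CatalogEntry` and `DataCatalog` names, the fields, and the example products are hypothetical, and a real catalog would be a shared service rather than an in-memory dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One 'book' in the library: a data product listed in the catalog."""
    name: str
    domain: str          # the team that publishes the product
    owner: str           # the 'author' accountable for the product
    description: str
    sensitive: bool = False


@dataclass
class DataCatalog:
    """Hypothetical in-memory catalog, used here only as an illustration."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)
    # Maps a user to the sensitive products that user may read.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Return products whose name or description mentions the keyword."""
        kw = keyword.lower()
        return [
            e for e in self.entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        ]

    def can_read(self, user: str, product: str) -> bool:
        """Open products are readable by anyone; sensitive ones need a grant."""
        entry = self.entries[product]
        if not entry.sensitive:
            return True
        return product in self.grants.get(user, set())


# Example data (made up for this sketch).
catalog = DataCatalog()
catalog.register(CatalogEntry(
    "orders", "sales", "sales-data-owner", "Confirmed customer orders"))
catalog.register(CatalogEntry(
    "customer_profiles", "crm", "crm-data-owner",
    "Customer contact details", sensitive=True))
catalog.grants["analyst_a"] = {"customer_profiles"}

for hit in catalog.search("customer"):
    print(hit.name, "owned by", hit.owner)
```

The point of the sketch is that findability (`search`), authorship (`owner`), and the private section (`can_read`) sit in one place, so a user does not need to know which team's systems hold the data.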
Differences Between Data Lakehouse and Data Mesh
A data lakehouse does not solve the organizational problem of knowledge and staffing. A larger data platform still needs a larger central data team that holds all the data engineering knowledge. In other words, the central team has to scale up along with the platform.
This is why IT environments at enterprises often create vertical splits, with data engineers and data analysts working in different teams. The disadvantage of this split is that different teams are needed for each data product.
Data ownership is a crucial aspect of implementing data mesh, which greatly affects how you manage data within an organization:
- It becomes easier to identify the owner of a domain data set if changes or issues arise, enhancing the speed and accuracy of managing data.
- Engaging only the relevant stakeholders for a particular data domain minimizes confusion and streamlines decision-making.
- Users can find things faster and have a clear historical trail to follow, which is particularly beneficial for data consumers who rely on accuracy and speed.
- Clear visibility into the origins of domain data ensures that all data consumers understand its source and context.
- Transparency in the decision-making process regarding datasets fosters trust and collaboration among different domains.
- With well-defined data ownership, less time is spent on documentation in the future, as the responsible parties are clear from the outset.
In contrast to a data lakehouse, which I've previously discussed in more detail, both a data fabric and a data mesh provide architectures to access data across various technologies and platforms. However, while a data fabric is technology-centric, a data mesh emphasizes organizational change, focusing on domain data and how to manage, maintain, and make it accessible for all data consumers effectively.
The Benefits of a Data Mesh
Implementing a data mesh can significantly transform how organizations handle their data, offering an array of advantages while also solving complex organizational problems:
1. Controlled by Many Teams
When you implement a data mesh, the teams within the company each manage their own data products. This decentralization speeds up processes and makes problem-solving more efficient. Each domain team operates within its area of expertise, leading to more tailored and effective solutions.
Domain teams know better than anyone else the definitions of products or customers and they can also shape these entities. With the right standards, tools, and knowledge, domain teams are able to supply data products themselves and offer them centrally.
- The domain team manages the data quality and can monitor and improve it well;
- The domain team knows the right definitions and can apply and share them well;
- The domain team knows the data users and can serve them well and unburden them.
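As an illustration of what "monitoring data quality" could look like inside a domain team, here is a minimal sketch of a completeness check. It is only one possible quality measure, and the function names, the example rows, and the 0.9 threshold are assumptions chosen for this example, not part of any data mesh standard.

```python
from __future__ import annotations

from typing import Any


def completeness(rows: list[dict[str, Any]], required: list[str]) -> dict[str, float]:
    """Share of rows in which each required field is present and non-empty."""
    if not rows:
        return {col: 0.0 for col in required}
    return {
        col: sum(1 for r in rows if r.get(col) not in (None, "")) / len(rows)
        for col in required
    }


def failing_fields(scores: dict[str, float], threshold: float) -> list[str]:
    """Fields whose completeness falls below the agreed threshold."""
    return sorted(col for col, score in scores.items() if score < threshold)


# Made-up sample of a domain team's customer data.
rows = [
    {"customer_id": "c1", "country": "NL"},
    {"customer_id": "c2", "country": ""},
    {"customer_id": "c3", "country": "DE"},
    {"customer_id": "c4", "country": None},
]

scores = completeness(rows, ["customer_id", "country"])
print(scores)
print(failing_fields(scores, threshold=0.9))
```

Because the domain team knows which fields matter and what "complete" means for them, it is the natural place to set thresholds like this one and to act when a check fails.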
2. Data Treated as Important Product
In a data mesh approach, data is viewed as a valuable product needing regular maintenance and updates. In many organizations, establishing a “single source of truth” or “authoritative data source” is challenging due to the repeated extraction and transformation of data across the organization without clear ownership responsibilities over the newly created data.
In the data mesh, the authoritative data source is the Data Product published by the source domain, with a clearly assigned Data Owner and Data Steward who are responsible for that data.
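The idea of one authoritative source per business entity can be sketched as a small registry. This is an illustrative sketch under stated assumptions: `DataProduct`, `AuthoritativeRegistry`, and the rule that only the source domain may publish an entity are hypothetical names and choices made for this example, not a prescribed implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    entity: str          # business entity described, e.g. "customer"
    source_domain: str   # the domain that publishes the product
    owner: str           # Data Owner accountable for the data
    steward: str         # Data Steward who looks after it day to day


class AuthoritativeRegistry:
    """Keeps exactly one authoritative data product per business entity."""

    def __init__(self) -> None:
        self._by_entity: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        existing = self._by_entity.get(product.entity)
        # A second domain may not claim an entity that already has a source.
        if existing is not None and existing.source_domain != product.source_domain:
            raise ValueError(
                f"'{product.entity}' is already published by "
                f"domain '{existing.source_domain}'"
            )
        self._by_entity[product.entity] = product

    def authoritative_source(self, entity: str) -> DataProduct:
        return self._by_entity[entity]


# Made-up example: the CRM domain publishes the customer data product.
registry = AuthoritativeRegistry()
registry.publish(DataProduct("customer", "crm", "crm-data-owner", "crm-data-steward"))

source = registry.authoritative_source("customer")
print(source.source_domain, source.owner, source.steward)
```

Other domains can still build their own derived data sets, but anyone asking "where does customer data come from, and who is responsible?" gets a single answer.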
3. Grows with Your Company
As your company expands, so does the data mesh, adapting to increased demands without the common slowdowns of a centralized data platform. This scalability is a significant advantage, allowing organizations to grow their data architecture in line with their overall growth.
4. Easy to Find and Use Data
A data mesh makes it easier to find and use data by organizing and explaining it well. Domain teams manage the quality of their data products, ensuring they're easy to monitor, improve, and utilize. This improved accessibility is crucial for data consumers who rely on accurate and timely information.
5. Better Teamwork and New Ideas
Easy access to and understanding of data across domains lead to enhanced collaboration and innovation. Data scientists and domain experts, familiar with their data users, can effectively meet their needs and foster an environment where new ideas are encouraged and developed.
6. Quick to Change
Domain teams can swiftly make changes and updates relevant to their specific areas. This agility helps the entire company adapt quickly to new opportunities and challenges. They know the right definitions, can apply and share them effectively, and are well-equipped to manage and adjust their data products, including real-time data.
7. Rules and Safety
Implementing a data mesh ensures data is used safely and in compliance with regulations by clearly defining responsibilities. It addresses the challenge of establishing a "single source of truth" by designating authoritative data sources with assigned Data Owners and Stewards who are accountable for that data. This clarity helps prevent the issues that often arise with a centralized data platform.
8. Less Work for the Main IT Team
By empowering domain teams to manage their data, the central IT team's workload is reduced. This allows them to concentrate on enhancing the overall data infrastructure and capabilities, thus enabling data to be more effectively used across the organization.
The Challenges of a Data Mesh
The challenges associated with adopting a data mesh are worth considering. It's essential to address critical questions before implementing a data mesh approach:
- How (de)centralized is my organization set up?
- What is the size of my organization?
Implementing a data mesh only makes sense if the benefits of decentralization outweigh the investment in setting up the platform and standards. That is why data mesh is especially suitable for organizations with multiple divisions and/or an international character.
Data Need Assessment: Understand your organization's data needs thoroughly. Determine which domains require a more self-serve data platform and assess the extent to which decentralized data management is necessary.
Organizational Structure: Examine how (de)centralized your organization is structured. A successful data mesh implementation relies on alignment with your organization's existing structure and culture.
Organization Size: Consider the size of your organization. Data mesh is particularly suitable for larger organizations with multiple divisions and international operations. Smaller organizations may not benefit as significantly from this approach.
Operational and Analytical Data: Distinguish between operational and analytical data needs within your organization. Assess how data mesh can effectively cater to both types of data requirements.
The new role of IT teams
Enhance data literacy in the business domains
To fully harness the potential of a data platform, an organization requires individuals who are data-fluent. According to Gartner's definition, data literacy encompasses "the ability to read, write, and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application, and resulting value."
Data-fluent employees possess the following capabilities:
- Data-Driven Thinking: They can engage in critical thinking and analysis using data as a foundation, enabling them to draw informed conclusions.
- Informed Decision-Making: Data-fluent individuals rely on data to make decisions, prioritizing data-backed insights over experiences or intuition.
- Data-Enabled Innovation: They leverage data to communicate ideas effectively and contribute to the creation of new products, business models, workflows, and strategies.
- Understanding Data Visualizations: Data-fluent employees are proficient in understanding and interpreting data visualizations, ensuring that insights are effectively communicated.
Without data fluency, data assets within an organization may not yield their full potential value. It becomes essential to support, train, and coach business domains to develop data fluency. In an upcoming blog post, we will delve deeper into the organizational aspects of the data mesh framework, exploring how to foster a data-fluent culture that maximizes the benefits of data assets.
A data mesh is not a cloud service that you simply switch on or off. Building one takes a combination of a good approach and the right tools.
You can simultaneously use a data mesh and a data fabric, and even a data hub. First, they are concepts, not things. A data hub as an architectural concept is different from a data hub as a database. Second, they are components, not alternatives. It is practical for architecture to include both data fabric and data mesh. They are not mutually exclusive.
Finally, they are architectural frameworks, not architectures. You don’t have architecture until the frameworks are adapted and customized to your needs, your data, your processes, and your terminology.
Both data meshes and data fabrics have a seat at the data table. In the search for architectural concepts and architectures to support data projects, it all comes down to finding what works best for your own specific needs. Crystalloids is ready to guide you.