Organizations face a wide array of data management challenges because of the increasing volume, variety, and complexity of data, along with the many apps and users that need to access it. Chief among those challenges is selecting the right data architecture and supporting technologies to meet evolving business needs and data requirements while maintaining data quality and security.
Data mesh and data fabric are two distinct approaches to managing data and making it, along with the insights drawn from it, accessible to the business teams and individual users who need to work with it. Which option you choose in the data mesh vs. data fabric debate depends largely on your data strategy and on whether you’re free to democratize data or need to keep stricter access controls around it.
So, if you want to decentralize data ownership and management and put data in the hands of specific teams, instituting data mesh is likely the route you want to take. But if you need to keep data management more centralized, data fabric, which provides a unified layer for data access and integration across diverse sources, may be the best approach.
Here’s another way to contrast data mesh vs. data fabric:
- A data mesh architecture is designed to reduce friction in data access and promote collaboration. It takes a more user-centric approach to data management.
- A data fabric architecture is a more automated approach to bringing data from various sources and systems together to derive insights from that data.
Ultimately, you might decide your organization should use both approaches, as many businesses do. The two are complementary ways of making data more accessible so it can be used to create business value.
To help you decide what works best for your business, here’s a closer look at the differences between a data fabric and a data mesh.
What Is Data Mesh?
A data mesh is a data management architecture that decentralizes data analytics, making it readily available to multiple departments. It accomplishes this through four key principles:
- Domain-driven decentralization: Data is owned by the end users who are most familiar with it (i.e., domain experts).
- Data as a product: Data is treated as a product, and other teams and departments in your organization are treated as customers. A data product works like a microservice that encapsulates all the elements needed to deliver its data outcome (data, code, and infrastructure).
- Self-service data platform: A centralized, automated platform allows the decentralized data domains to communicate with one another.
- Federated governance: Shared data governance standards allow the decentralized data products and domains across a data mesh to work together.
A data mesh focuses on treating data as a first-class product, ensuring that data is well stewarded, protected, and valued. It organizes data by the relevant business domain and gives access to the business users closest to that data.
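To make the "data as a product" and federated governance principles more concrete, here is a minimal Python sketch. The `DataProduct` class, its fields, and the `REQUIRED_METADATA` rule are hypothetical names chosen for illustration; they are not part of any data mesh standard or tool.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide rule (federated governance): every data
# product must publish these metadata keys, whichever domain owns it.
REQUIRED_METADATA = {"owner", "description", "classification"}


@dataclass
class DataProduct:
    """A domain-owned data product: data plus the metadata needed to use it."""
    name: str
    domain: str
    metadata: dict = field(default_factory=dict)
    rows: list = field(default_factory=list)

    def missing_metadata(self) -> set:
        """Return the required metadata keys this product has not supplied."""
        return REQUIRED_METADATA - set(self.metadata)

    def is_publishable(self) -> bool:
        """A product may be shared only when it meets the shared standard."""
        return not self.missing_metadata()


# Example product owned by a (hypothetical) sales domain team.
sales_forecast = DataProduct(
    name="quarterly_sales_forecast",
    domain="sales",
    metadata={"owner": "sales-analytics", "description": "Forecast by region"},
    rows=[{"region": "EMEA", "forecast": 1200}],
)
```

In this sketch the domain team owns the product and its contents, while the required metadata keys stand in for the standards every domain agrees to follow. Calling `sales_forecast.missing_metadata()` reports which keys still need to be supplied before the product can be shared.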
Is Data Mesh a Technology or a Methodology?
Data mesh is a methodology in that it offers a way for individual teams or specific business areas, like marketing, sales, or customer care functions, to own and manage their data. The mesh is a network of interconnected domains, and the infrastructure supporting the data mesh includes data lakes and data warehouses, which individual teams or functions are responsible for managing.
These teams and functions can create a self-service data platform in the data mesh to help them build, deploy, and maintain data products simply and securely. (A “data product” in a data mesh is essentially the outcome of using data—it can be anything from a consumer’s credit score to a business unit’s sales forecast.)
Users can locate and understand data across a data mesh using metadata and discovery tools. Data can also be exchanged between teams and domains using application programming interfaces (APIs) and data pipelines (i.e., digital processes for collecting, modifying, and delivering data).
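As a rough illustration of metadata-based discovery, the sketch below searches a small in-memory catalog. The `CATALOG` entries and the `find_products` function are hypothetical; a real data mesh would rely on a data catalog tool rather than a Python list.

```python
# Hypothetical catalog entries; in practice a data catalog tool would hold these.
CATALOG = [
    {"name": "customer_credit_score", "domain": "risk", "tags": {"customer", "credit"}},
    {"name": "quarterly_sales_forecast", "domain": "sales", "tags": {"sales", "forecast"}},
    {"name": "customer_churn_features", "domain": "marketing", "tags": {"customer", "ml"}},
]


def find_products(catalog, tag=None, domain=None):
    """Return names of catalog entries matching the given tag and/or domain."""
    matches = []
    for entry in catalog:
        if tag is not None and tag not in entry["tags"]:
            continue  # entry lacks the requested tag
        if domain is not None and entry["domain"] != domain:
            continue  # entry belongs to a different domain
        matches.append(entry["name"])
    return matches
```

For example, `find_products(CATALOG, tag="customer")` would list the customer-related products published by both the risk and marketing domains, which is the kind of cross-domain lookup that metadata makes possible.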
What Is a Business Domain?
A business domain refers to a specific area of expertise, responsibility, or focus within an organization. It could be an entire business unit or a specific department, like sales, or a team, such as a team of data scientists working on artificial intelligence (AI) and machine learning (ML) projects for the business.
In an e-commerce company, for example, a business domain might be a group handling all product-related data, including descriptions, prices, and availability, for a product catalog. In a healthcare organization, a business domain could be the billing and insurance function, where all patient billing, insurance claims, and related financial data are managed.
What Is Domain-driven Data?
Domain-driven data refers to the practice of organizing and managing data in alignment with the specific domains or areas of expertise within an organization. Business units or teams own specific data collections and have responsibility for the quality, accessibility, and security of that data.
The concept of domain-driven data is closely related to the principles of domain-driven design (DDD). The DDD approach to software development emphasizes the importance of modeling the problem domain to create software that reflects real-world business processes and rules. The approach encourages collaboration between domain experts, software developers, and stakeholders.
What Technologies Make Data Mesh Possible?
Here’s a brief overview of some of the technologies typically used to build and maintain a data mesh architecture:
- Data mesh and data governance tools: Data mesh tools include data catalogs, data observability tools, and data mesh query engines. Examples of data governance tools include solutions for data classification and tagging and data security and access control, as well as data governance frameworks, like the Data Management Body of Knowledge (DMBoK).
- API management in data mesh: API management plays a crucial role in enabling data consumers to interact with and retrieve data from various data products while adhering to data governance, security, and accessibility standards.
Organizations that adopt a data mesh architecture are likely to use a range of tools to manage the API lifecycle, including creation, deployment, and versioning; institute self-service access; ensure security and authentication; enable monitoring and analytics; create API documentation; and more.
- Microservices architecture for data mesh: This architecture combines the benefits of microservices, like modularity, scalability, and flexibility, with the decentralized and domain-centric data management philosophy of a data mesh. Kubernetes is commonly used to orchestrate microservices within a data mesh.
Elements you may find in a microservices architecture for data mesh include domain-oriented microservices that a team creates and maintains to handle data ingestion, transformation, and management. Data processing pipelines that focus on specific processing tasks, along with API management tools for promoting collaboration, are also typical components.
Why Does Data Mesh Need a Cloud-native Infrastructure?
A data mesh can benefit significantly from a cloud-native infrastructure because the tools and platforms in these environments can, among other things:
- Support microservices architecture, as described above
- Provide elastic scalability so organizations can easily scale their data infrastructure up or down based on demand
- Promote resource efficiency and reduce costs by allowing teams to allocate resources precisely where they’re needed
- Offer scalable, cost-effective data lakes and data warehouses for storing and managing the diverse data sets created within a data mesh
- Provide data integration and ETL (Extract, Transform, Load) tools that simplify the process of ingesting, transforming, and moving data within a data mesh
- Feature extensive monitoring, logging, and management tools for tracking the health and performance of data products and services
- Support containerization technologies, which offer portability and consistency in deploying data products and services
Note: Many cloud providers also offer managed Kubernetes services, which make it easier to deploy and manage containerized services and applications.
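The ETL step mentioned in the list above can be sketched in a few lines of Python using only the standard library. The sample data, the function names, and the in-memory `warehouse` list are assumptions made for illustration; managed cloud ETL services expose their own interfaces.

```python
import csv
import io

# Hypothetical raw export from a source system, including one malformed row.
RAW_CSV = "order_id,amount\n1,19.99\n2,5.00\n3,not_a_number\n"


def extract(text):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert fields to numeric types and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append({"order_id": int(row["order_id"]), "amount": float(row["amount"])})
        except ValueError:
            continue  # skip malformed rows rather than failing the whole batch
    return clean


def load(rows, target):
    """Load: append cleaned rows to the target store and return the count loaded."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for a data lake or data warehouse table
loaded = load(transform(extract(RAW_CSV)), warehouse)
```

Keeping the three stages as separate functions mirrors how a domain team might own and test each step of its pipeline independently.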
Examples of Data Mesh in Action
As you consider your options with data mesh vs. data fabric, it can be helpful to visualize how data mesh enables domains to work with data. Here are just three possible use cases:
- A financial institution implements a data mesh so it can handle diverse financial data sources like market data and customer transactions. Each financial product or service (e.g., credit cards, investment portfolios) is managed by a dedicated domain team. They ensure compliance with relevant regulatory requirements while delivering personalized financial services and advice to customers.
- A streaming media provider adopts a data mesh approach to improve content recommendations and personalization for its subscribers. Content categories, such as movies and documentaries, are treated as domains. Specialized teams curate data and content within their domains to provide more accurate content recommendations and engage users.
- An energy utility institutes a data mesh architecture to manage data from its energy generation and distribution operations. Different energy sources, including solar and wind, and grid management components are treated as separate domains in the data mesh. The utility can then use data products generated by those domains to optimize energy production and distribution to drive efficiency and sustainability.
What Is Data Fabric?
Now, let’s look at data fabric. Data fabric is a type of data architecture in which data is provisioned through a unified, integrated access layer that is available across an organization’s IT infrastructure. The fabric provides a unified, real-time view of data, enabling the business to integrate its data management processes with data from various sources, including hybrid cloud environments, web applications, and edge devices.
Is Data Fabric a Technology or an Approach?
Data fabric is a data management concept, and it’s often referred to as an approach. A data fabric architecture is meant to help organizations address the challenges of managing increasingly complex data environments such as on-premises data centers, cloud infrastructure, edge computing devices, and various data storage technologies.
A data fabric solution features services and technologies that enable processes such as data integration, governance, cataloging, discovery, orchestration, and more. The architectural elements of data fabric include, but are not limited to:
- A data transport layer for moving data across the fabric
- Advanced algorithms for data analysis
- APIs and software development kits (SDKs) for making data and insights available to front-end users through the tools they use to work with data—such as a business analytics or data visualization program
What Is a Centralized Data Integration Layer?
A centralized data integration layer consolidates data integration processes into one centralized infrastructure. In a data fabric approach, this layer creates a cohesive, integrated view of data across the organization. By consolidating data integration tasks, it makes it easier to connect, ingest, transform, and distribute data from various sources.
Organizations that have a strong need for data governance, compliance, and data consistency across their departments and business units often use a centralized data integration layer in their data architecture.
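The idea of a single integration layer over many sources can be sketched as follows. `IntegrationLayer`, its methods, and the two sample sources are hypothetical and greatly simplified; a real data fabric would add governance, cataloging, and orchestration on top of this kind of interface.

```python
class IntegrationLayer:
    """Hypothetical unified access layer over several registered sources."""

    def __init__(self):
        self._sources = {}

    def register(self, name, reader):
        """Register a source by name; `reader` is any callable returning rows."""
        self._sources[name] = reader

    def read(self, name):
        """Read one source through the same interface, whatever its backend."""
        return list(self._sources[name]())

    def read_all(self):
        """Combine every source into one view, tagging each row with its origin."""
        combined = []
        for name in sorted(self._sources):
            for row in self._sources[name]():
                combined.append({**row, "source": name})
        return combined


# Stand-ins for an on-premises database and a cloud data store.
layer = IntegrationLayer()
layer.register("on_prem_crm", lambda: [{"customer": "Acme"}])
layer.register("cloud_orders", lambda: [{"customer": "Acme", "order_total": 250}])
unified = layer.read_all()
```

Consumers call `read` or `read_all` without needing to know where each source lives, which is the consolidation the centralized layer is meant to provide.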
Examples of Data Fabric Solutions
Some organizations opt to build their own data fabric architecture so they can customize it to meet their specific data access needs and existing technology stack and IT infrastructure. And there are many open source projects and tools—like Apache Kafka, Apache Spark, and Apache Hadoop—that can be combined to create a customized data fabric.
However, if you don’t want to DIY your data fabric architecture, you can also look to the marketplace for commercial data fabric solutions.
Data Mesh vs. Data Fabric
So, to recap, data mesh empowers domain-specific teams to work with and collaborate on data, while data fabric can provide a more comprehensive strategy for data management:
- Data mesh is a decentralized approach: Data mesh is decentralized by design, and its core principles are centered around distributing data ownership, access, and responsibility across various domain-specific teams or units within an organization.
Because data mesh helps to break down data silos and increase access to high-quality data, it can create significant benefits for analytics and AI/ML teams. With direct access to the data they need, they can work more autonomously and efficiently, collaborate more effectively, and iterate and experiment with data faster.
- Data fabric is a unified architecture: Data fabric aims to provide a consistent, unified approach to accessing and interacting with data, regardless of where that data is stored or how it’s formatted. That includes structured and unstructured data and data in relational databases, NoSQL databases, data lakes, and the cloud.
Data fabric isn’t just for collecting and storing data, though. Its architecture includes AI/ML and analytics capabilities for transforming and processing data fast and at scale. A data fabric approach also helps to make data less siloed and available to more users in an organization. And it allows businesses to maintain appropriate data access and governance restrictions, enhancing data security and compliance.
When to Use Data Mesh vs. Data Fabric
Choosing when to use data mesh vs. data fabric depends on your overall data strategy, your data management and access needs, and your existing infrastructure. Other considerations, such as your organizational culture, team structures, and the maturity of your data capabilities, might also factor into your decision-making.
The following are just a few things you might want to consider when deciding whether to use data mesh or data fabric:
Data mesh may be the best option if you:
- Prefer to decentralize data ownership and management
- Have complex and diverse data ecosystems that include various data sources, formats, and storage locations
- Want to make data more accessible to various teams and functions in the business
- Need to adhere to strict data quality and governance requirements
Data fabric may be the best option if you:
- Want to break down data silos and consolidate data into a single, unified view
- Have significant data integration needs to support data workflows
- Need to manage data across hybrid or multicloud environments
- Plan to modernize and extend the capabilities of a legacy data warehouse
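The two checklists above can be turned into a rough tally, sketched below in Python. The signal names and the equal weighting of every criterion are assumptions made for illustration; a real decision would weigh these factors against your own strategy and constraints.

```python
# Hypothetical labels for the criteria listed above, one per bullet.
MESH_SIGNALS = {
    "decentralize_ownership",
    "diverse_data_ecosystem",
    "broad_team_access",
    "strict_quality_governance",
}
FABRIC_SIGNALS = {
    "single_unified_view",
    "heavy_integration_needs",
    "hybrid_or_multicloud",
    "modernize_legacy_warehouse",
}


def suggest_approach(needs):
    """Tally which checklist a set of needs matches more closely."""
    mesh_score = len(needs & MESH_SIGNALS)
    fabric_score = len(needs & FABRIC_SIGNALS)
    if mesh_score > fabric_score:
        return "data mesh"
    if fabric_score > mesh_score:
        return "data fabric"
    return "hybrid or undecided"  # a tie suggests weighing both approaches
```

A tie between the two scores is a hint that neither checklist dominates, not a recommendation in itself.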
And, as explained earlier, you may find that instead of choosing one side in the data mesh vs. data fabric debate, a hybrid approach that combines elements of both is the best option for your business and its data management needs.