
As we’ve gone from Data Mesh theory to practice, organizations have been shifting their focus towards the central tenet of Data Mesh: building and managing valuable data products. Data products have become a strategic factor in how organizations make informed, data-driven decisions to reduce costs, innovate, or cultivate new business opportunities.
After two years of working with organizations that have adopted Data Mesh and data products, I’ve distilled my thoughts and observations, most recently at the Big Data London event and now in a two-part blog series. Part one focuses on why data products are necessary, the three kinds of data products, and how to govern and manage them. Part two will focus on who will build data products, as well as the timeline for creating them.
Overall, I outline what we have learned about how organizations should approach skills and technology to iterate, fail fast, and create adaptable data products.
Why do we need data products? To close the gap between operational and analytical planes
In the original Data Mesh post by Zhamak Dehghani and in her book (you can get your free copy here), a key aspect of Data Mesh is to close the gap between the operational and analytical data planes. The operational data plane is the combination of the technology and people supporting the operational data platforms; the analytical plane is the combination of the technology and people supporting the analytical data platforms.
In her book, Zhamak goes so far as to state that an organization’s transition from a data warehouse approach to a Data Mesh approach will involve removing the data warehouse layer and making domains responsible for data across both the operational plane and the analytical plane.
Another way to think about the two planes: much to the chagrin of everyone I tell, I have long thought of the operational and analytical planes as a slice of Victoria Sponge cake:
The slice of cake comprises two layers: the analytical plane at the top and the operational plane at the bottom. A lovely layer of sticky strawberry jam in between the two represents the data pipeline responsible for getting data from the operational plane to the analytical plane.
The data warehouse, data lake, or data lakehouse sits in the analytical plane, so if we build data products based only on this layer, we are consuming only the top half of the cake. This inevitably leaves us with sticky, jammy fingers. In the data world, it means we cannot achieve the promised agility of decentralized data ownership.
The reason is that to be truly agile, domains need to be responsible for ingesting data from the operational systems, transforming it, and then serving it. When we introduce a data warehouse, we rely on a centralized data team to perform the ingestion and at least some of the transformation, which is a Data Mesh anti-pattern. This inevitably results in slow data product development and management.
What we have learned from successful Data Mesh adoption is that the domains need to build and manage data products whose data spans the operational and analytical data plane. They need to consume an entire slice of cake, from top to bottom.
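To make that concrete, here is a minimal, hypothetical sketch in Python of a domain-owned data product pipeline that spans both planes: it ingests from the domain’s operational system, transforms the data into the product’s model, and serves it to the analytical plane. The function names and in-memory “stores” are illustrative stand-ins only; a real domain would read from its operational database and publish to its analytical store.

```python
# A minimal, hypothetical sketch of a domain-owned data product pipeline.
# The in-memory list below stands in for the domain's operational system
# (e.g. an orders database); printing stands in for publishing to the
# analytical plane (a warehouse/lake table or a served view).

from datetime import date

# Stand-in for the operational plane: raw order records owned by the domain.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "customer": "acme",   "amount": 120.0, "day": date(2023, 5, 1)},
    {"order_id": 2, "customer": "acme",   "amount": 80.0,  "day": date(2023, 5, 1)},
    {"order_id": 3, "customer": "globex", "amount": 50.0,  "day": date(2023, 5, 2)},
]

def ingest():
    """Ingest: the domain pulls raw records from its own operational system."""
    return list(OPERATIONAL_ORDERS)

def transform(orders):
    """Transform: shape raw records into the data product's model
    (here, daily revenue per customer)."""
    revenue = {}
    for order in orders:
        key = (order["customer"], order["day"])
        revenue[key] = revenue.get(key, 0.0) + order["amount"]
    return [
        {"customer": customer, "day": day, "daily_revenue": total}
        for (customer, day), total in sorted(revenue.items())
    ]

def serve(rows):
    """Serve: publish the modelled data to the analytical plane for consumers."""
    for row in rows:
        print(row)

if __name__ == "__main__":
    serve(transform(ingest()))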
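```

The point of the sketch is ownership: all three steps live with the domain, so the team can change the product’s model without waiting on a central data team.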
To motivate domains to build data products and achieve agility, we have observed various approaches to skills, responsibilities, and incentivization. In all scenarios, we need to ensure that each domain has the technology and data skills required to build data products.
This can significantly increase spend at the enterprise level, and costly data engineering skills will likely be duplicated across the domains. An alternative is to provide simplified access that abstracts away much of the technology knowledge and skill needed to work with data in the operational and analytical planes.
This approach significantly reduces the need for technology skills, and thus the expense of specialized resources within each domain, while ensuring that data remains a first-class concern. This is one reason why organizations have adopted Starburst as a key component of their Data Mesh implementations.
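As a rough illustration of what that simplified access can look like, here is a sketch using the open-source Trino Python client (Starburst is built on Trino) to run a single federated query across an operational source and an analytical one. The hostname, user, catalogs, schemas, and table names are assumptions for illustration; substitute whatever your own cluster actually exposes.

```python
# A hedged sketch of federated access across the two planes using the
# open-source Trino Python client. All connection details and table names
# below are hypothetical placeholders.

import trino

conn = trino.dbapi.connect(
    host="starburst.example.com",  # hypothetical coordinator hostname
    port=8080,
    user="data-product-owner",
    catalog="postgresql",          # example catalog over an operational database
    schema="orders",
)

cur = conn.cursor()
# One SQL statement joins operational data (the "postgresql" catalog) with
# analytical data (an assumed "lake" catalog over object storage).
cur.execute("""
    SELECT o.customer_id,
           SUM(o.amount)        AS open_order_value,
           AVG(h.daily_revenue) AS avg_historic_revenue
    FROM postgresql.orders.open_orders AS o
    JOIN lake.analytics.customer_daily_revenue AS h
      ON o.customer_id = h.customer_id
    GROUP BY o.customer_id
""")

for row in cur.fetchall():
    print(row)
```

Because the query engine federates across catalogs, the domain team can work in SQL against both planes rather than maintaining bespoke ingestion code for every source.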