
As we’ve gone from Data Mesh theory to practice, organizations have been shifting their focus towards the central tenet of Data Mesh: building and managing valuable data products. Data products have become a strategic factor in how organizations make informed, data-driven decisions to reduce costs, innovate, or cultivate new business opportunities.
After two years of working with organizations that have adopted Data Mesh and data products, I’ve distilled my thoughts and observations, most recently at the Big Data London event and now in a two-part blog series. Part one focuses on why data products are necessary, the three kinds of data products, and how to govern and manage them. Part two will focus on who will build data products, as well as the timeline for creating them.
Overall, I outline what we have learned about how organizations should approach skills and technology to iterate, fail fast, and create adaptable data products.
Why do we need data products? To close the gap between operational and analytical planes
In the original Data Mesh post by Zhamak Dehghani and in her book (you can get your free copy here), a key aspect of Data Mesh is to close the gap between the operational and analytical data planes. The operational data plane is the combination of the technology and people supporting the operational data platforms; the analytical plane is the combination of the technology and people supporting the analytical data platforms.
In her book, Zhamak goes so far as to state that an organization’s transition from a data warehouse approach to a Data Mesh approach will involve removing the data warehouse layer and making domains responsible for data across both the operational plane and the analytical plane.
Another way to think about the two planes: much to the chagrin of everyone I tell, I have long thought of the operational and analytical planes as a slice of Victoria Sponge cake:
The slice of cake comprises two layers: the analytical plane at the top and the operational plane at the bottom. A lovely layer of sticky strawberry jam in between the two represents the data pipeline responsible for getting data from the operational plane to the analytical plane.
The data warehouse, data lake, or data lakehouse sits in the analytical plane, so if we build data products based only on this layer, we are consuming only the top half of the cake. This inevitably leaves us with sticky, jammy fingers. In the data world, it means we cannot achieve the promised agility of decentralized data ownership.
The reason is that to be truly agile, domains need to be responsible for ingesting data from the operational systems, transforming it, and then serving it. When we introduce a data warehouse, we rely on a centralized data team to perform the ingestion and at least some of the transformation, which is a Data Mesh anti-pattern. This inevitably results in slow data product development and management.
What we have learned from successful Data Mesh adoption is that the domains need to build and manage data products whose data spans the operational and analytical data plane. They need to consume an entire slice of cake, from top to bottom.
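To make that concrete, here is a minimal, hypothetical sketch in Python of a domain-owned data product pipeline that spans both planes: it ingests from the domain’s operational system, transforms the data into the product’s model, and serves it to the analytical plane. The function names and in-memory “stores” are illustrative stand-ins only; a real domain would read from its operational database and publish to its analytical store.

```python
# A minimal, hypothetical sketch of a domain-owned data product pipeline.
# The in-memory list below stands in for the domain's operational system
# (e.g. an orders database); printing stands in for publishing to the
# analytical plane (a warehouse/lake table or a served view).

from datetime import date

# Stand-in for the operational plane: raw order records owned by the domain.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "customer": "acme",   "amount": 120.0, "day": date(2023, 5, 1)},
    {"order_id": 2, "customer": "acme",   "amount": 80.0,  "day": date(2023, 5, 1)},
    {"order_id": 3, "customer": "globex", "amount": 50.0,  "day": date(2023, 5, 2)},
]

def ingest():
    """Ingest: the domain pulls raw records from its own operational system."""
    return list(OPERATIONAL_ORDERS)

def transform(orders):
    """Transform: shape raw records into the data product's model
    (here, daily revenue per customer)."""
    revenue = {}
    for order in orders:
        key = (order["customer"], order["day"])
        revenue[key] = revenue.get(key, 0.0) + order["amount"]
    return [
        {"customer": customer, "day": day, "daily_revenue": total}
        for (customer, day), total in sorted(revenue.items())
    ]

def serve(rows):
    """Serve: publish the modelled data to the analytical plane for consumers."""
    for row in rows:
        print(row)

if __name__ == "__main__":
    serve(transform(ingest()))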
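```

The point of the sketch is ownership: all three steps live with the domain, so the team can change the product’s model without waiting on a central data team.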
To motivate domains to build data products and achieve agility, we have observed various approaches to skills, responsibilities, and incentivization. In all scenarios, we need to ensure that each domain has the technology and data skills required to build data products.
This can significantly increase spend at the enterprise level, and costly data engineering skills will likely be duplicated across the domains. An alternative is to provide simplified access that abstracts away much of the technology knowledge and skill needed to work with data in the operational and analytical planes.
This approach significantly reduces the need for technology skills, and thus the expense of specialized resources within each domain, while ensuring that data remains a first-class concern. This is one reason why organizations have adopted Starburst as a key component of their Data Mesh implementations.
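As a rough illustration of what that simplified access can look like, here is a sketch using the open-source Trino Python client (Starburst is built on Trino) to run a single federated query across an operational source and an analytical one. The hostname, user, catalogs, schemas, and table names are assumptions for illustration; substitute whatever your own cluster actually exposes.

```python
# A hedged sketch of federated access across the two planes using the
# open-source Trino Python client. All connection details and table names
# below are hypothetical placeholders.

import trino

conn = trino.dbapi.connect(
    host="starburst.example.com",  # hypothetical coordinator hostname
    port=8080,
    user="data-product-owner",
    catalog="postgresql",          # example catalog over an operational database
    schema="orders",
)

cur = conn.cursor()
# One SQL statement joins operational data (the "postgresql" catalog) with
# analytical data (an assumed "lake" catalog over object storage).
cur.execute("""
    SELECT o.customer_id,
           SUM(o.amount)        AS open_order_value,
           AVG(h.daily_revenue) AS avg_historic_revenue
    FROM postgresql.orders.open_orders AS o
    JOIN lake.analytics.customer_daily_revenue AS h
      ON o.customer_id = h.customer_id
    GROUP BY o.customer_id
""")

for row in cur.fetchall():
    print(row)
```

Because the query engine federates across catalogs, the domain team can work in SQL against both planes rather than maintaining bespoke ingestion code for every source.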