The results of the recent poll conducted on LinkedIn motivated me to write this article and share my practical experience and observations.
The poll’s question was, “What has been your company’s first data management initiative?” 130 data management (DM) professionals from different countries shared their experiences. Figure 1 demonstrates the results.
You can see that almost 50% of companies identified data governance as the first capability to implement. Data quality and data analytics follow. Metadata management closes the list. Unfortunately, LinkedIn has a limited poll capability, so I could not include other capabilities like data and information architecture, security, etc.
In any case, these results help demonstrate a principle that not all companies recognize.
Regardless of the chosen title or priority, all companies should implement the same core DM capabilities because these capabilities depend strongly on each other.
In this article, I will:
- Explain this principle using the capabilities included in the poll as examples
- Demonstrate the key relationships between core data management (DM) capabilities
Metamodel of data management
Leading industry guidelines have different viewpoints on data management and relationships between its components.
I think the DAMA-DMBOK Wheel was the model that created the wrong impression that data management capabilities can be implemented separately. This model splits data management (DM) into separate pieces of a DM “cake.” However, I must admit that DAMA-DMBOK2 authors understood this weakness of their model. On page 39 of the DAMA-DMBOK2 book, they stated: “None of the pieces of the existing DAMA Management framework describe the relationship between the different Knowledge Areas.”
The reality is that many companies choose one DM capability and start implementing it without realizing its dependencies on other capabilities.
I use the “Orange” data management framework (DMF) in my practice. I present data management as a set of capabilities that play different roles in delivering the business value of data management: ensuring the lifecycle of data assets.
Figure 2 demonstrates this model. For this article, I linked the “Orange” model with the DAMA-DMBOK2 model to make it easier to understand.
The core capability of data management is data lifecycle management.
This capability transforms raw data into meaningful information and, by that, delivers value to data management stakeholders.
Directional capabilities define a direction and create a framework for data management. Data governance and business architecture are examples.
Supporting capabilities enable core and directional capabilities.
When you map the DAMA-DMBOK2 to this model, some questions arise regarding the Knowledge Areas of the DAMA-DMBOK2 model.
For example, according to the TOGAF® Standard, a framework for Enterprise Architecture, both data architecture and data modeling belong to data architecture. Information systems (IS) architecture consists of data architecture and application architecture. For digital data, it is hardly possible to separate data from applications. However, DAMA-DMBOK2 does not include application architecture in its scope. Instead, it introduces Data Integration and Interoperability and Data Warehousing and Business Intelligence (DWH&BI), which can be viewed as parts of the IS architecture. BI can also be mapped to the data analytics capability. I did not map the Reference & Master Data Knowledge Area; in my view, processing any data type requires the same set of capabilities.
So, even this high-level mapping confirms the key principle: data management capabilities have dependencies that cannot be avoided. Let’s take four capabilities from the poll and consider their dependencies with other DM capabilities.
Data governance
Data governance is one of the least aligned concepts in the data management community. If you carefully read the definition of data governance provided by DAMA-DMBOK2, you can easily conclude that “data governance” governs DATA MANAGEMENT, NOT DATA. The definition I refer to reads: “Data governance is the exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data.” Do you share my viewpoint?
According to the DAMA-DMBOK2 view on “data governance,” this capability should develop and control the implementation of the DM operating model, organizational structure, processes, roles, and policies for other data management capabilities. In other words, data governance establishes a data management function by implementing a data management framework.
So, when a company starts its data management with a data governance initiative, it means the following. The company must already have some data management capabilities in place. Data governance will help transform these capabilities into business functions. Data governance can also initiate the implementation of other capabilities required to meet business goals.
Information systems architecture, security, IT infrastructure, and metadata management are capabilities that must exist, whether formally or informally, to ensure data lifecycle management.
Data quality
Some companies start their data management journey with a data quality capability. Inefficient business decision-making due to poor data quality is a strong “stick” for this initiative. However, many companies fail in this initiative. One of the biggest reasons is that the successful implementation of a data quality capability requires several other capabilities.
Two core data quality activities are investigating and resolving data quality issues and building data quality checks to prevent these issues. Impact analysis and root-cause analysis are the two methods that support these activities. You need to perform these analyses by investigating data movements and transformations at the physical level. Capabilities such as information systems architecture, data modeling, and metadata management enable data lineage. In turn, data lineage at the physical level is necessary to enable data quality activities.
So, again, we come to the same conclusion: a data quality initiative is hardly possible without other data management capabilities.
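To make the role of lineage in these two analyses more concrete, here is a minimal sketch in Python. It assumes that physical-level lineage is available as a simple list of "source feeds target" pairs. All object names (crm.customers, dwh.sales_fact, and so on) and both function names are hypothetical and serve only as an illustration; they are not part of any framework or tool.

```python
from collections import deque

# Hypothetical physical-level lineage: each pair reads "source feeds target".
# All object names are invented for illustration.
LINEAGE = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "dwh.sales_fact"),
    ("staging.orders", "dwh.sales_fact"),
    ("dwh.sales_fact", "report.monthly_revenue"),
    ("dwh.sales_fact", "report.customer_churn"),
]


def _reachable(start, edges):
    """Breadth-first search: every node reachable from `start` along `edges`."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact_analysis(obj):
    """Downstream objects that may be affected by an issue in `obj`."""
    return _reachable(obj, LINEAGE)


def root_cause_candidates(obj):
    """Upstream objects where an issue observed in `obj` may originate."""
    reversed_edges = [(dst, src) for src, dst in LINEAGE]
    return _reachable(obj, reversed_edges)
```

For example, impact_analysis("staging.orders") walks the lineage downstream and returns the fact table and both reports, while root_cause_candidates("report.monthly_revenue") walks it upstream to the staging tables and source systems. Without documented lineage, neither traversal is possible, which is the dependency this section describes.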
Data analytics
Using ML and AI has become a mantra for many companies. However, without a solid data management foundation, attempts to get value from data by using ML and AI and establishing self-service analytics can quickly fail. Describing the data you use in the models requires data modeling. The “garbage in, garbage out” principle demonstrates the uselessness of an analytics initiative without a data quality capability in place. But we've just discussed that data quality needs multiple other capabilities.
So, to succeed with the data analytics initiative, a company must create a solid “data management foundation.”
Metadata management
The poll’s result for this capability is the hardest to explain. I was surprised that only 11% of respondents indicated it as the first priority initiative.
I assume the key reason is a misunderstanding of the concepts of metadata and metadata management. Every company has established data pipelines. Along these pipelines, data is transformed and integrated. This can’t be done without managing metadata. I think many companies have this capability in an ad hoc form. The poll question meant establishing metadata management as a business function. That may explain the low response.
Another reason for underestimating the role of metadata is narrowing its scope to technical and, perhaps, operational metadata. Many professionals don't realize that data models, data lineage, business terms, and definitions are metadata. The majority of data management artifacts are metadata. A company must have a clear strategy to store and map the data management deliverables.
When I developed the metamodel of data lineage, I realized that the data lineage metamodel, in a broad sense, represents the knowledge graph of data management outcomes. In this respect, metadata management unites all other data management capabilities.
The analysis summary
I hope that demonstrating how these four data management capabilities relate to each other and to other capabilities has convinced you of the principle stated earlier in this article. All data management capabilities are linked with each other. Implementing one capability requires implementing others.
The key challenge is to identify the links between these capabilities and implement them in the appropriate order. The order matters because the outcomes of one capability serve as inputs for others.
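The idea of deriving an implementation order from such links can be sketched as a topological sort. The example below is only an illustration: the DEPENDS_ON mapping is my simplified reading of the dependencies discussed in this article, not a complete or authoritative model, and the function name implementation_order is hypothetical. It uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Simplified, illustrative dependencies: each capability maps to the
# capabilities whose outcomes it needs as input.
DEPENDS_ON = {
    "information systems architecture": set(),
    "data modeling": set(),
    "metadata management": set(),
    "data lineage": {
        "information systems architecture",
        "data modeling",
        "metadata management",
    },
    "data quality": {"data lineage"},
    "data analytics": {"data modeling", "data quality"},
}


def implementation_order(depends_on):
    """Return the capabilities so that each one comes after its inputs.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = implementation_order(DEPENDS_ON)
for step, capability in enumerate(order, start=1):
    print(step, capability)
```

In any order this produces, data lineage precedes data quality, and data quality precedes data analytics. In practice, capabilities mature iteratively rather than strictly one after another, so the result is better read as a sequence of first deliverables than as a rigid project plan.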
An integrated approach to implementing core data management capabilities
This integrated approach is a distinguishing feature of the “Orange” DMF. In this short article, I can share only a high-level example. Figure 3 demonstrates one of the methods that constitute this framework.
This figure represents the high-level top-down approach to delivering artifacts related to 5 core data management capabilities: data modeling, information systems architecture, etc.
Usually, I recommend starting by analyzing information requirements expressed in various reports and dashboards. Report analysis and governance are complex, considering the hundreds and thousands of reports produced within a company. Developing business models leads to developing data models. Documenting business processes and application flows is the starting point in data lineage development.
Those interested in learning more about this approach can consult my books, “Data Management Toolkit,” “The ‘Orange’ Data Management Framework,” and the online course “Implementing a Data Management Framework.”