Are you interested in learning more about setting up a data management function?
The title of this article summarizes my 10 years of hands-on experience in data management. Implementing data management frameworks and documenting data lineage are the core of my professional experience. These topics might seem quite different, but practice has led me to conclude that they are closely related. In this article, I explain how they relate and share my knowledge and techniques. We take a deep look at setting up a data management function.
WHAT are data management and data lineage?
This question might look odd, as we data management professionals should know the answer. One of the key data management tasks is ensuring aligned and unambiguous definitions. Yet this is not the case for most data management terms.
Data management
Each of my workshops on data management-related topics starts with me asking the participants to formulate their understanding of data management in one or two sentences. So far, I have never heard the same answer from two people. When you try to analyze the definitions of data management from published sources, you face the same challenge. This shows that data management has a variety of meanings that strongly depend on the context.
I consider data management a “business capability that safeguards the company’s data and information resources and optimizes data and information value chains (‘the chain’) to ensure effective conduct of business.” Here, “business capability” means the ability of data management to reach goals and deliver outcomes. The data and information value chain supports the business in creating business value, as shown in Figure 1. The key data management sub-capabilities, such as data governance, data quality, data modeling, and information systems (data and application) architecture, are needed to design the chain.
Figure 1. Core data management sub-capabilities
An IT-related set of data management sub-capabilities enables the functioning of the chain.
Data lineage
In my opinion, data lineage is one of the most abstract and misaligned concepts in data management. As one of my colleagues once said: ‘Everybody needs data lineage, but no one can explain their understanding of it.’ Usually, metadata lineage documents the flows of data across the organization. The concept of data lineage intersects with five other concepts: data chain, data value chain, integration architecture, data flow, and information value chain. Some of these terms are even considered to be synonyms of ‘data lineage.’ There are different viewpoints on the constituent components of data lineage. Based on the analysis of the concept of data lineage and the requirements of several legislative documents, I came up with the following set of components that are required for proper documentation of data lineage:
- application landscape
- three levels of data models (conceptual, logical, and physical), linked vertically, with corresponding business rules/ETLs
- business processes and roles
- reports catalog
- data quality checks and controls catalogs.
(For a more detailed explanation, please check out this article.)
The scheme of the metadata lineage components can be seen in Figure 2. More about data lineage can be found in the set of articles Data Lineage 101, 102, 103, 104, and 105.
Figure 2. The scheme of the metadata lineage components
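To make the component list more tangible, here is a minimal Python sketch of how the five components could be held together and checked for completeness. The class name `LineageDocumentation`, its fields, and the sample values are all hypothetical and chosen only for illustration; they are not part of any tool or method described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class LineageDocumentation:
    """Hypothetical container for the lineage components listed above."""
    applications: list[str] = field(default_factory=list)
    # Maps a model level ("conceptual", "logical", "physical") to its elements.
    data_models: dict[str, list[str]] = field(default_factory=dict)
    business_processes: list[str] = field(default_factory=list)
    reports: list[str] = field(default_factory=list)
    quality_checks: list[str] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of components that are not documented yet."""
        missing = []
        if not self.applications:
            missing.append("application landscape")
        for level in ("conceptual", "logical", "physical"):
            if not self.data_models.get(level):
                missing.append(f"{level} data model")
        if not self.business_processes:
            missing.append("business processes and roles")
        if not self.reports:
            missing.append("reports catalog")
        if not self.quality_checks:
            missing.append("data quality checks and controls")
        return missing


# Illustrative, made-up content for a partially documented scope.
doc = LineageDocumentation(
    applications=["CRM", "Data warehouse"],
    data_models={"conceptual": ["Customer"], "logical": ["Customer.name"]},
    reports=["Monthly customer report"],
)
print(doc.missing_components())
```

In this made-up example, the physical data model, the business processes and roles, and the data quality checks and controls are reported as still missing.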
Now let us discuss how to implement the data management function and demonstrate its relation to data lineage.
HOW are the data management function setup and the documentation of data lineage linked?
The answer to this question lies in similarities between:
- the deliverables of data management sub-capabilities and key components of data lineage
- the logical steps of implementation of data management and documentation of data lineage.
I will demonstrate the process of data management implementation and data lineage documentation using ‘the data management star’ model by Data Crossroads, shown in Figure 3.
Figure 3. The data management star by Data Crossroads.
Step 1. Defining needs and requirements
Data management has different business stakeholders with specific needs concerning data and information. In Step 1, a company will focus on specifying a feasible scope for the data management initiative. The deliverables list includes business drivers, stakeholders, and their most urgent information needs. Information is delivered in the form of reports and dashboards. The report catalog is one of the data lineage components. The scope of the data management initiative will limit the scope of data lineage documentation.
The corresponding data management tasks and responsibilities should be defined when the scope is clear.
Step 2. Dividing tasks and responsibilities
The set of tasks and responsibilities belongs to the data management framework, which is a set of rules and roles. Rules include but are not limited to data management strategy, policies, standards, processes, procedures, and plans. Roles should be linked to data management processes, tasks, and deliverables.
Data lineage is one of the deliverables of data management. Therefore, a company needs to specify and document its understanding of data lineage, its constituent components, and the way it will be documented (descriptively or automatically). Accountabilities regarding data lineage documentation should be assigned to the relevant data management-related roles.
Step 3. Building the data management framework
The implementation of data management will be done in several steps.
Step 3.1. Specify data requirements
To meet the information requirements specified in Step 1, corresponding data should be found, delivered, and processed. Very often, the relationship between raw data and information is not (entirely) known. Data lineage is a means to fill in this gap. Usually, data lineage documentation starts with the specification of existing business processes and mapping them to the data sets.
Step 3.2. Document business processes
Business process documentation is not considered to be a part of any of the data management sub-capabilities. Still, this is a required component of data lineage. Most companies begin their data lineage documentation with the analysis of business processes. The performance of business processes is closely related to the systems and applications used in these processes.
Step 3.3. Document system and application landscape
Data transformation usually takes place in systems and applications. Documentation of applications and data flows is a part of information systems architecture. At the same time, these flows are mandatory components of data lineage.
Step 3.4. Develop conceptual, logical, and physical data models and link them
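As an illustration of why documented flows matter for lineage, the sketch below stores data flows between applications as simple (source, target) pairs and traces everything upstream of a given report. The application names and the `upstream` helper are hypothetical; a real landscape would come from an architecture repository or a lineage tool.

```python
from collections import defaultdict

# Hypothetical data flows between applications: (source, target).
flows = [
    ("CRM", "Staging"),
    ("ERP", "Staging"),
    ("Staging", "Data warehouse"),
    ("Data warehouse", "Sales dashboard"),
]


def upstream(target, flows):
    """Return every application whose data eventually reaches `target`."""
    # Index the flows by target so direct sources can be looked up quickly.
    sources = defaultdict(set)
    for src, dst in flows:
        sources[dst].add(src)

    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for src in sources[node]:
            if src not in seen:  # also guards against cycles in the flows
                seen.add(src)
                stack.append(src)
    return seen


print(sorted(upstream("Sales dashboard", flows)))
```

For the sample flows, the dashboard depends on all four other applications, while a source system such as the CRM has nothing upstream of it.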
Data flows can be documented on different levels of data models: conceptual, logical, and physical. A company might document data flow/lineage on any one of these levels or on a combination of them. These models and the links between them are data modeling and architecture deliverables. At the same time, they are mandatory components of data lineage. The links between different data model levels are called ‘vertical data lineage’ or ‘linkage.’
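A small sketch may help to picture vertical linkage. It assumes two hypothetical mapping tables, one from conceptual entities to logical attributes and one from logical attributes to physical columns, and resolves a business term down to the columns that implement it. All names are invented for illustration.

```python
# Hypothetical vertical links between the three model levels.
conceptual_to_logical = {
    "Customer": ["Customer.name", "Customer.birth_date"],
}
logical_to_physical = {
    "Customer.name": ["crm.cust.nm"],
    "Customer.birth_date": ["crm.cust.dob"],
}


def physical_columns(concept):
    """Follow the vertical links from a conceptual entity to physical columns."""
    columns = []
    for attribute in conceptual_to_logical.get(concept, []):
        columns.extend(logical_to_physical.get(attribute, []))
    return columns


def unlinked_attributes():
    """List logical attributes that have no physical implementation recorded."""
    return [
        attribute
        for attributes in conceptual_to_logical.values()
        for attribute in attributes
        if not logical_to_physical.get(attribute)
    ]


print(physical_columns("Customer"))
print(unlinked_attributes())
```

The second helper shows one practical use of linkage: spotting logical attributes for which the link to the physical level has not been documented yet.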
Step 3.5. Identify critical data elements
The definition of critical data elements is a state-of-the-art task. You can read more about practical techniques for the specification of critical data elements in this article. The key reason to apply the concept of critical data is the prioritization of data management initiatives, including building data quality checks and controls. The set of critical data elements is the deliverable of the data modeling sub-capability. Knowing the data lineage is a mandatory prerequisite for specifying critical data elements.
Specification of data quality requirements and corresponding checks and controls belongs to the deliverables of the data quality sub-capability. Data checks and controls are considered to be a component of data lineage.
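The sketch below illustrates why lineage comes first. It assumes, purely for illustration, that a data element becomes a candidate critical data element when it feeds at least one priority report; this criterion is a simplification of mine for the example and not the technique from the article referenced above. The element names, report names, and check names are hypothetical.

```python
# Hypothetical element-level lineage: data element -> reports it feeds.
element_to_reports = {
    "crm.cust.nm": {"Customer overview"},
    "crm.cust.dob": {"Customer overview", "Regulatory report"},
    "crm.cust.nickname": set(),
}
# Assumed prioritization input, e.g. taken from the scope defined in Step 1.
priority_reports = {"Regulatory report"}
# Hypothetical catalog of data quality checks per data element.
quality_checks = {"crm.cust.nm": ["not_null"]}


def candidate_critical_elements(element_to_reports, priority_reports):
    """Elements that feed at least one priority report (assumed criterion)."""
    return sorted(
        element
        for element, reports in element_to_reports.items()
        if reports & priority_reports
    )


def elements_without_checks(elements, quality_checks):
    """Elements for which no data quality check has been cataloged."""
    return [element for element in elements if not quality_checks.get(element)]


critical = candidate_critical_elements(element_to_reports, priority_reports)
print(critical)
print(elements_without_checks(critical, quality_checks))
```

Without the mapping from elements to reports, which is exactly what lineage provides, neither function could be evaluated.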
Step 3.6. Assemble data lineage
Data lineage can be assembled only when all the steps mentioned above are done. At this point, a company is de facto ready to implement a data management function.
Step 4. Intermediate assessment and gap analysis
This step is required to compare the desired results specified in Step 1 with the achieved ones. This is also a point where the maturity assessment of data management can be performed.
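In its simplest form, such a comparison can be sketched as a set difference between what was required and what has been delivered. The report names and the `gap_analysis` helper are hypothetical, and the coverage ratio is only one possible indicator, not a maturity score.

```python
# Hypothetical scope from Step 1 and the results achieved so far.
required_reports = {"Regulatory report", "Customer overview", "Sales dashboard"}
documented_reports = {"Regulatory report", "Customer overview"}


def gap_analysis(required, achieved):
    """Return the open gaps and the share of requirements already covered."""
    gaps = sorted(required - achieved)
    coverage = len(required & achieved) / len(required) if required else 1.0
    return gaps, coverage


gaps, coverage = gap_analysis(required_reports, documented_reports)
print(gaps)
print(f"{coverage:.0%}")
```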
Step 5. Planning further actions
As soon as the company has achieved the desired results, it might want to extend the scope of its data management initiative, including the scope of data lineage to be documented.
I hope that by now you understand how the initial statement ‘The setup of data management function follows the logic of the documentation of data lineage’ came to be.
If you want to learn more, please read about our method here or in our book, The Data Management Toolkit.
For more insights, visit the Data Crossroads Academy site: