Data lineage is on the radar of business and data management professionals now more than ever. Many companies understand how necessary it is. But this understanding does not often lead to immediate action. There are several possible explanations for this:
- Data lineage implementation is complex, and it is also time- and resource-consuming.
- The tangible benefit to a company and its various stakeholders might be unclear.
- Other business initiatives take higher priority.
In this article, we will:
- Discuss the need for and benefit of data lineage to various data lineage stakeholder groups
- Demonstrate that data lineage is a necessary foundation for many other business and data management initiatives
Data lineage in a business context
Data lineage is a complex concept. Five years ago, at the beginning of my professional data lineage journey, a colleague of mine said: “Everybody needs data lineage. Nobody can explain what they mean by that.” Not much has changed since then: my colleague’s statement is still incredibly accurate.
Let’s first align our understanding of data lineage.
Data lineage is the description of data movements and transformations at various abstraction levels along data chains and of the relationships between data at these levels. Figure 1 illustrates this definition of data lineage.
Digital data moves between various IT systems and applications. We can describe the movement of data sets between IT applications and link them to corresponding business processes. This is the highest abstraction level of documentation, called “business.”
Most IT systems and applications store their data in a database. The description of data movement at the level of database tables and columns characterizes the physical level of data lineage.
We can also describe data movements using various data models: conceptual (or semantic) and logical (or solution). The corresponding levels of data lineage documentation are named after these models.
When we describe data lineage at one of these levels, we are talking about horizontal data lineage. When data lineage is linked at two or more abstraction levels, it is defined as vertical data lineage.
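The distinction between horizontal and vertical lineage can be sketched as a small data structure. The Python below is only an illustration under stated assumptions, not a reference model: the class names `Node` and `Link` and all object identifiers are hypothetical, and the four level names simply follow the levels described above.

```python
from dataclasses import dataclass

# Abstraction levels described in the text, from most to least abstract.
LEVELS = ("business", "conceptual", "logical", "physical")


@dataclass(frozen=True)
class Node:
    level: str  # one of LEVELS
    name: str   # hypothetical identifier, e.g. "crm.customer.email"


@dataclass(frozen=True)
class Link:
    source: Node
    target: Node

    @property
    def kind(self) -> str:
        # Both ends on the same level: a data movement (horizontal lineage).
        # Ends on different levels: a mapping between levels (vertical lineage).
        if self.source.level == self.target.level:
            return "horizontal"
        return "vertical"


# Hypothetical example objects.
crm_col = Node("physical", "crm.customer.email")
dwh_col = Node("physical", "dwh.dim_customer.email")
email_attr = Node("logical", "Customer.email")

links = [
    Link(crm_col, dwh_col),     # movement between two database columns
    Link(email_attr, dwh_col),  # logical attribute mapped to a physical column
]
kinds = [link.kind for link in links]
```

Here the first link is horizontal because both columns sit at the physical level, while the second is vertical because it connects a logical attribute to a physical column.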
It is worth remembering that we are not describing the data itself. We are describing, by means of metadata, the way data is processed. Therefore, when we speak about data lineage, we literally mean “metadata lineage.” Different metadata repositories maintain metadata at different abstraction levels. Data lineage maps the metadata that is kept in these different repositories.
Metadata management and data lineage have an interesting relationship: data lineage by itself is a metadata construct that enables the integration of various metadata types.
This complex data lineage model leads to the following conclusion:
Data lineage can be documented differently depending on a company’s needs. It is important to realize that a company as a whole cannot accurately define the needs for and benefits of data lineage because these needs and benefits vary by business driver. Each business driver involves different data lineage stakeholders within the company, and each stakeholder group has its own business needs and expectations and looks for different benefits from data lineage.
Data lineage for various stakeholder groups
Let’s investigate the need for and benefit of data lineage to various data lineage stakeholders, such as:
- Management
- Business and support functions (finance and risk)
- Data management professionals
- Technical (IT) professionals
Management
Any data management initiative, including a data lineage one, is time- and resource-consuming. A company’s management must see significant business benefits and anticipate a worthwhile ROI before investing in such an initiative. From my experience, if we use the “carrot and stick” metaphor, the “stick” business driver is usually more effective in starting a data lineage initiative. Figure 2 demonstrates the need for and benefits of data lineage for a company’s management.
Compliance with various regulations is one of the key drivers. Financial institutions experience the most significant regulatory pressure regarding data lineage. BCBS 239 (Principles for effective risk data aggregation and risk reporting by the Basel Committee on Banking Supervision), GDPR (General Data Protection Regulation), and IFRS 17 are examples of such regulations. The need to comply with these regulations pushes companies to start compliance-related initiatives. Compliance reduces operational risk and helps avoid potential fines.
Many companies want to decrease high IT costs to improve their overall financial results. To do so, a company should replace legacy software, move to the cloud, and optimize data chains to avoid data redundancy. The corresponding initiatives will require significant initial investments. A company should expect an ROI on such initiatives in a medium-term timeframe due to ultimately decreased IT costs.
Many companies also need to improve decision-making in the current turbulent business environment. Artificial intelligence and machine learning methodologies assist in achieving this goal. The application of these methodologies leads to improvements in data processing and analysis.
Business optimization, i.e., digital transformation, is on the agenda of many companies. These initiatives require significant initial investments that pay off in the medium or long term. They should lead to optimized data chains and reduced IT costs.
A company should have documented metadata lineage to meet these needs and perform all related initiatives. Documentation of data lineage will require establishing a solid data management function.
Business and support functions (finance and risk)
If something goes wrong with data, finance and risk functions experience the most pain. Therefore, you should not underestimate the needs and influence of this stakeholder group regarding a data lineage initiative.
Figure 3 illustrates the need for and benefit of data lineage for this stakeholder group.
Compliance with various regulations is the stick that motivates this stakeholder group to initiate and participate in a data lineage initiative. Compliance reduces operational risk and helps avoid potential fines.
Data for finance and risk functions is a means to perform their duties and to create key daily deliverables. Many financial and risk professionals spend most of their time manually manipulating data: correcting errors and cleansing data, transforming data, preparing reports in Excel, investigating data origins, and so on.
Internal and external audit requirements regarding the explanation of data require a lot of knowledge about data origin and transformation. It is common practice for financial professionals to investigate this information themselves and store results on their local computers as MS Word documents.
This stakeholder group is interested in sponsoring and participating in digital transformation and data quality initiatives. Successful data lineage and data management initiatives will increase the productivity of this stakeholder group.
Business and support functions need the outcomes not only of metadata lineage but also of data value lineage. Data value lineage identifies changes in data values along data chains. Unfortunately, documenting data value lineage is technically difficult. However, metadata lineage assists in building reconciliation reports for data points along data chains. In any case, data lineage streamlines the reconciliation efforts of multiple finance and risk professionals.
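A reconciliation report of the kind mentioned above can be sketched in a few lines. The Python below is a simplified illustration: the data point, the names of the points along the chain, the values, and the `reconcile` function are all hypothetical. It assumes metadata lineage has already supplied the order of the points and that the value at each point can be queried.

```python
# Hypothetical values of one data point ("total exposure") captured at
# successive points of a data chain, in the order given by metadata lineage.
chain = [
    ("source_system.exposure", 1_000_000.00),
    ("staging.exposure", 1_000_000.00),
    ("dwh.exposure", 999_850.00),
    ("risk_report.exposure", 999_850.00),
]


def reconcile(chain, tolerance=0.01):
    """Compare each point with the previous one and list the differences."""
    report = []
    for (prev_name, prev_value), (name, value) in zip(chain, chain[1:]):
        difference = value - prev_value
        report.append({
            "from": prev_name,
            "to": name,
            "difference": difference,
            "matches": abs(difference) <= tolerance,
        })
    return report


# Keep only the steps where the value changed beyond the tolerance.
breaks = [row for row in reconcile(chain) if not row["matches"]]
```

In this made-up chain, the only break is between the staging area and the data warehouse, which tells a finance or risk professional where to start investigating.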
Data management professionals
Data lineage is a complex concept, even for data management professionals. The complexity derives from the fact that the data management community has no agreed definition of data lineage. Furthermore, the definition and components of data lineage intersect with those of several other data management concepts. Data lineage is complex to implement, and utilizing it is even more complicated. These challenges largely explain why, while many data management professionals realize the necessity of data lineage, only a few have real practical experience with it.
Figure 4 demonstrates the need for and benefit of data lineage for data management professionals.
Data management (DM) professionals are also interested in complying with regulations, because they are expected to ensure that the company complies. Therefore, the participation of DM professionals in any data-related compliance initiative is mandatory. Data lineage enables multiple data management initiatives, such as data quality, metadata management, and master and reference data management. Data lineage assists in performing impact and root-cause analysis. It is impossible to analyze data quality issues and build data quality checks without knowledge of data lineage at the physical level. Data lineage helps perform change initiatives like the substitution of IT applications. Data lineage also assists in mapping information and data requirements and identifying new data sources. It is important to understand that data lineage is a “must” condition for multiple data management-related initiatives.
Data management professionals need metadata lineage at every abstraction level. However, data lineage at the physical level is often the most important.
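Impact and root-cause analysis at the physical level can both be viewed as traversals of a lineage graph: impact analysis follows the data downstream from a changed column, and root-cause analysis follows it upstream from a faulty one. The Python below is a minimal sketch, assuming lineage is available as a list of source-to-target column pairs. The column names and the `reachable` function are hypothetical and not taken from any particular lineage tool.

```python
from collections import deque

# Hypothetical physical-level lineage: each pair reads "source feeds target".
EDGES = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dwh.dim_customer.email"),
    ("dwh.dim_customer.email", "report.churn.contact_email"),
    ("erp.invoice.amount", "dwh.fact_sales.amount"),
]


def reachable(start, edges, direction="downstream"):
    """Return every column reachable from `start` by breadth-first search."""
    adjacency = {}
    for source, target in edges:
        if direction == "upstream":
            # Reverse the edge so the search walks against the data flow.
            source, target = target, source
        adjacency.setdefault(source, []).append(target)

    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


# Impact analysis: what is affected if this source column changes?
impact = reachable("crm.customer.email", EDGES)

# Root-cause analysis: which columns feed this report field?
root_causes = reachable("report.churn.contact_email", EDGES, direction="upstream")
```

The same function serves both analyses; only the direction of traversal differs. The unrelated `erp.invoice.amount` chain appears in neither result.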
Technical (IT) professionals
The needs and benefits of technical (IT) professionals are similar to those of data management professionals. They focus on the utilization of data lineage in different data-related initiatives.
These initiatives can be classified into two types for technical professionals: DevOps and migrations. Figure 5 illustrates the data lineage needs and benefits for technical (IT) professionals.
The primary need of technical professionals is the ability to perform impact and root-cause analysis at the physical level. However, this stakeholder group also needs to know technical and operational metadata. Modern data lineage solutions are capable of capturing different types of metadata. The availability and utilization of metadata lineage help increase productivity and reduce the time required for issue resolution, both of which reduce IT-related costs. Technical professionals mainly need metadata lineage at the physical level. However, they will also use metadata lineage to realize data value lineage.
This concludes our review of the data lineage needs and benefits of various data lineage stakeholders. Please share your experience in dealing with different data lineage stakeholders with us.
For more insights, visit the Data Crossroads Academy site:
//academy.datacrossroads.nl/courses/data-lineage-what-why-how/lesson/data-lineage-what-why-how/