In our previous ‘Data Lineage 101’ articles, we have discussed:

‘Data Lineage 101’: Why do we need data lineage?

‘Data Lineage 102’: What is data lineage?

‘Data Lineage 103’: What are the key legislation requirements for data lineage?

‘Data Lineage 104’: How can you document data lineage?

If by now you have decided that you and your company are ready to continue this journey and start the implementation of data lineage, let’s draft the key steps you need to take.

Before we start, it is worth mentioning that an established data management framework and collaboration between data management professionals and stakeholders are prerequisites for a successful implementation of data lineage.

The implementation itself can be divided into seven steps.

  1. Identify your key business drivers for data lineage.

Your company should have serious reasons to start thinking about documenting data lineage. For example:

  • legislation requirements
  • business changes
  • data quality initiatives
  • supervisory and audit requirements.

If any of these becomes crucial for the business, then you are ready to start discussing data lineage documentation with the top management of your company.

  2. Secure the buy-in, support and involvement of top management.

Neither data management nor data lineage should be implemented just for fun. Both require a lot of resources, human as well as financial, and will consume a lot of time. Without the dedication of top management, such initiatives have no future. There are two key groups of benefits that might convince your management to support such an initiative. These are:

  • Improved work efficiency and increased revenue in the medium term. This can be achieved rather quickly, for example, simply by improving the quality of your data. In more concrete terms, improving data quality can lead to:
    • a revenue increase of 15-20% [1]
    • a reduction in operational costs of 40% [2]
    • a decrease in IT maintenance costs of 40-50% [3].

These monetary benefits result from reducing the cost of many manual data operations, an optimized application landscape, and so on.

  • Compliance with regulations, e.g. GDPR (the EU General Data Protection Regulation). If you live in the EU, you are probably familiar with the fines your company could receive due to data breaches.

When your top management gives you the ‘green light’ for the data lineage initiative, it is time to think about its scope.

  3. Scope your data lineage initiative.

For each business driver you have chosen, you can find corresponding data sets. This is the first filter that will help you narrow down your scope. For example, GDPR focuses on personal data. If you are just starting your data quality initiative, chances are high that the first thing you look at will be customer data.

The second filter is the identification of critical data elements (CDEs) within these data sets. CDEs are data elements that have the biggest impact on the performance of your company and on customer experience. Usually, these are the key performance indicators (KPIs) used to manage the company.

The techniques to identify these CDEs (KPIs) are rather simple. First, you need to choose the most critical management reports and the KPIs they contain. The difficulties start when you need to identify which source data elements are needed to calculate these CDEs. And this is where the story of data lineage documentation begins. Once you have agreed on the scope of your initiative, you can define the scope of data lineage.
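The two filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the driver-to-data-set mapping, the element names and the `is_cde` flag are all hypothetical and would, in practice, come from your own data inventory and from the KPIs in your management reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    data_set: str
    is_cde: bool  # hypothetical flag: marked as a critical data element


# Hypothetical mapping of business drivers to the data sets they concern.
DRIVER_DATA_SETS = {
    "GDPR": {"customer"},
    "data quality": {"customer", "finance"},
}

# Hypothetical inventory of data elements.
CATALOG = [
    DataElement("customer_email", "customer", True),
    DataElement("customer_nickname", "customer", False),
    DataElement("net_revenue", "finance", True),
    DataElement("warehouse_code", "logistics", False),
]


def scope_initiative(driver, catalog=CATALOG):
    """Apply both filters: data sets for the driver, then CDEs only."""
    data_sets = DRIVER_DATA_SETS.get(driver, set())
    in_data_sets = [e for e in catalog if e.data_set in data_sets]  # filter 1
    return [e.name for e in in_data_sets if e.is_cde]               # filter 2


print(scope_initiative("GDPR"))
```

Even a toy inventory like this shows how quickly the scope shrinks once both filters are applied.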

  4. Define the scope of data lineage.

You scope data lineage by using the concepts of ‘horizontal and vertical data lineage’: how far along the data chain you go, and on how many data model levels you document it.

The whole scope of data lineage starts with the original data sources and ends at the point of final usage. In large companies, especially with a lot of subsidiaries, such chains are rather long and complicated. That is why very often a company starts with a limited ‘length’ of data lineage, for example, at some point of data aggregation.

You can document data lineage on different levels of data models: conceptual, logical and physical. The choice of the number of levels on which you will document data lineage narrows the scope of data lineage as well.

  5. Prepare the business requirements for data lineage.

Different groups of stakeholders have different requirements and expectations for data lineage.

There are at least two key groups: business stakeholders, e.g. auditors, business and data analysts, and financial controllers, and technical stakeholders, e.g. IT engineers, database managers etc.

If your company has little experience with data lineage, the topic remains very abstract. As one of my colleagues put it: ‘Everyone wants data lineage, but no one can exactly explain what they mean by that and what their expectations are’. While conducting interviews with business stakeholders, I came to fully agree with this statement.

There are some specific features when it comes to the requirements of these two groups.

Business stakeholders are mostly interested in:

  • the ability to run root-cause analysis, starting from the end reports and going back to the ‘golden’ source
  • the value of data lineage rather than its design
    (I explained the differences between these two types of data lineage in ‘Data Lineage 104’.)
  • data lineage on conceptual or logical data model levels
    (You will also find an in-depth explanation of data lineage components at these levels in ‘Data Lineage 104’.)

On the other hand, the technical stakeholders focus on:

  • impact analysis, starting from the source of a data element and following its path to its final destination
  • metadata design lineage
  • data lineage on the physical level.
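The difference between the two directions of analysis can be sketched as a walk over a lineage graph: root-cause analysis walks upstream from a report element, while impact analysis walks downstream from a source element. The sketch below is a minimal illustration in Python. The element names and edges are hypothetical, and a real lineage repository would hold far more metadata than plain source-to-target pairs.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source element, target element).
EDGES = [
    ("crm.customer_id", "dwh.customer_key"),
    ("erp.invoice_amount", "dwh.revenue"),
    ("dwh.customer_key", "report.revenue_per_customer"),
    ("dwh.revenue", "report.revenue_per_customer"),
    ("dwh.revenue", "report.total_revenue"),
]


def build_index(edges):
    """Index the edges in both directions for fast lookups."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, neighbours):
    """Breadth-first walk returning every element reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


DOWNSTREAM, UPSTREAM = build_index(EDGES)


def root_cause_candidates(report_element):
    """Business view: walk back from a report element towards its sources."""
    return sorted(reachable(report_element, UPSTREAM))


def impacted_elements(source_element):
    """Technical view: walk forward from a source to everything it feeds."""
    return sorted(reachable(source_element, DOWNSTREAM))


print(root_cause_candidates("report.total_revenue"))
print(impacted_elements("erp.invoice_amount"))
```

Both questions are answered from the same documented edges; only the direction of the walk differs, which is why one well-maintained lineage record can serve both groups of stakeholders.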

I would advise you to spend some time and talk to different groups of business stakeholders to clarify their expectations, make them more realistic and align all the requirements in a unified document.

When this is done, you can finally move to deciding how you will document data lineage.

  6. Choose the method to document data lineage.

I already provided a comparative analysis of the two methods, descriptive and automated, in ‘Data Lineage 104’.

As I have already stressed several times, documenting data lineage is a very time- and resource-consuming task.

First of all, you should assess which of the existing methods is the most feasible given your company's resources.

The level at which you document data lineage will also affect your choice of method. You should also be aware that, regardless of the method you choose, a lot of manual work will still be required to document data lineage.

As soon as the decision is made, you should think about suitable software.

  7. Choose a suitable application to document data lineage.

Not surprisingly, even large companies document data lineage using MS applications such as Excel, Word, PowerPoint and Visio. If you decide to document data lineage on the conceptual or logical level and have a choice of applications, take a look at applications such as Axon, Collibra, or Erwin. Should you opt for an automated solution, market leaders such as SAS and Informatica will get your attention first. The key providers of automated metadata lineage are listed at metaintegration.com.

At this point, our adventure into the data lineage world comes to an end. During my personal journey into data lineage, I have realized that establishing a data management framework revolves around the optimization of data lineage or, in other words, of the data and information value chain.

If you are interested in learning more about the documentation of data lineage and the information value chain, you can consult my latest book, ‘The Data Management Toolkit’.

——————————————————————————————————–

References:

  1. BackOffice Associates. “How Data Quality Impacts Business Processes”: boaweb.com/rs/backoffice/images/Infographic-DQ_FINAL.pdf.
  2. BackOffice Associates. “How Data Quality Impacts Business Processes”: boaweb.com/rs/backoffice/images/Infographic-DQ_FINAL.pdf.
  3. BackOffice Associates. “How Data Quality Impacts Business Processes”: boaweb.com/rs/backoffice/images/Infographic-DQ_FINAL.pdf.