Data Lineage in DAMA-DMBOK2 and how it relates to BCBS 239 and GDPR requirements
What unites Data lineage, BCBS 239, and GDPR?
Data lineage has become a hot topic in the data management community, and the BCBS 239 and GDPR regulations have generated strong interest in the subject.
Data lineage often comes up in discussions of BCBS 239 and GDPR requirements, even when it is not explicitly requested.
Is there an agreed vision on the definition of Data lineage in the data management community? In one of my previous articles, you can find an answer.
What does DAMA DM-BOK 1 say about Data lineage?
DAMA 1 did not say much on the subject. It stipulated that Data lineage and Data flow are also names for the concept of Data integration architecture ([1], par.4.4.2.5). At the same time, the DAMA dictionary provided separate definitions for these two terms [3]. In practice, the terms have been considered interchangeable.
What has been added to DAMA DM-BOK 2?
Several changes and innovations have been introduced:
- Relation between Data lineage and Data flow;
- The new definition of Data flow;
- Requirements for Data lineage documentation format and tooling;
- Relation between Data lineage and Data lifecycle;
- Data lineage concept in different Data management knowledge areas.
Let’s take a deeper look at each of the subjects.
1. Relationship between Data lineage and Data flow.
Reading the following, you get the impression that the concepts of Data lineage and Data flow have been separated: “Data flows are a type of data lineage documentation that depicts how data moves through business processes and systems.” ([2], Ch.4, par.1.3.3.2).
But then you read the following: “Data lineage information is required when making changes to data flows.” ([2], Ch.8, par.6.2).
Conclusions:
- There is still no clear distinction between the terms;
- The terms are still used interchangeably.
2. A new definition of "Data flow."
I encountered remarkable additions to the definition of Data flow:
“Data flows map and document relationships between data and:
- Application within a business process;
- Datastores or databases in an environment;
- Network segments (useful for security mapping);
- Business roles, depicting which roles have responsibility for creating, updating, using, and deleting data;
- Location where local differences occur.” ([2], Ch.4, par.1.3.3.2)
As you can see:
- The Data flow concept has been extended with business process and role components;
- Data flow replaces the Information value chain concept presented in [1].
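To make the extended definition more concrete, here is a minimal sketch of how one documented data flow could be recorded so that it covers all five relationships from the quote. The class name, field names, and sample values are my own assumptions for illustration; DAMA-DMBOK2 does not prescribe any schema.

```python
from dataclasses import dataclass, field

# The four responsibilities named in the definition quoted above.
CRUD_ACTIONS = {"create", "update", "use", "delete"}


@dataclass
class DataFlowRecord:
    """One documented data flow for a single data element (hypothetical schema)."""

    data_element: str
    business_process: str   # business process the application belongs to
    application: str        # application within that business process
    datastore: str          # datastore or database in an environment
    network_segment: str    # useful for security mapping
    location: str           # location where local differences occur
    # Maps a business role to the set of actions it is responsible for.
    role_responsibilities: dict = field(default_factory=dict)

    def roles_responsible_for(self, action):
        """Return the roles responsible for the given action, sorted by name."""
        if action not in CRUD_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        return sorted(
            role
            for role, actions in self.role_responsibilities.items()
            if action in actions
        )


# Invented sample values, for illustration only.
example = DataFlowRecord(
    data_element="customer_address",
    business_process="Customer onboarding",
    application="CRM",
    datastore="crm_db",
    network_segment="internal",
    location="NL",
    role_responsibilities={
        "Account manager": {"create", "update", "use"},
        "Data steward": {"update", "delete"},
    },
)

print(example.roles_responsible_for("update"))
```

A record like this answers the question "which roles may update this data, in which system and location?", which is the kind of question both regulations tend to raise.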
3. Requirements for Data lineage documentation format and tooling.
The requirements and statements about Data lineage documentation format and tooling are spread across several chapters of the document.
The most notable statements are the following:
- Data flow/lineage can be documented at one of two levels: high-level and detailed ([2], Ch.8, par.2.1.3);
- “Data flows can be documented at different levels of detail: subject area, business entity, or even the attribute level” ([2], Ch.4, par.1.3.3.2);
- Various examples of (data) lineage are presented in different chapters and in different formats (at least 5);
- “Microsoft Excel is a frequently-used lineage tool. Lineage is also frequently captured in data modeling tools, Metadata repository, or data integration tools” ([2], Ch.5, 3.2), and in graphic design applications ([2], Ch.4, p.3.3).
All of the above only confirms the conclusions professionals usually reach:
- there are no tools available to cover all Data lineage;
- there is no ‘best’ solution to document Data lineage.
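The levels of detail mentioned above (subject area, business entity, attribute) can be related to each other mechanically. The sketch below assumes that every attribute is named as subject_area.entity.attribute and that lineage is captured as source-to-target pairs at attribute level; both the naming convention and the sample edges are hypothetical. It then rolls the detailed lineage up to a coarser level.

```python
# Number of name parts kept at each documentation level.
LEVEL_DEPTH = {"subject_area": 1, "business_entity": 2, "attribute": 3}


def roll_up(edges, level):
    """Collapse attribute-level lineage edges to the requested level.

    Edges that start and end in the same node after truncation are dropped,
    because they carry no information at that level.
    """
    depth = LEVEL_DEPTH[level]
    rolled = set()
    for source, target in edges:
        src = ".".join(source.split(".")[:depth])
        tgt = ".".join(target.split(".")[:depth])
        if src != tgt:
            rolled.add((src, tgt))
    return rolled


# Invented attribute-level lineage, for illustration only.
ATTRIBUTE_EDGES = [
    ("customer.account.balance", "risk.exposure.amount"),
    ("customer.account.currency", "risk.exposure.currency"),
    ("risk.exposure.amount", "risk.report.total_exposure"),
]

for level in ("attribute", "business_entity", "subject_area"):
    print(level, sorted(roll_up(ATTRIBUTE_EDGES, level)))
```

The design choice here is to store lineage once, at the most detailed level, and derive the high-level views from it. The reverse is not possible: a high-level picture drawn by hand cannot be expanded into attribute-level lineage.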
4. Data lineage and Data lifecycle concepts have been mapped.
One of the new developments in DAMA 2 is reflected in the statement: “Lifecycle and lineage intersect and can be understood in relation to each other. Data not only has a lifecycle, it also has lineage (i.e., a pathway along which it moves from its point of origin to its point of usage, sometimes called the data chain).” ([2], Ch.1, par.2.5.9).
This raises the following questions:
- If different documentation formats for Data lineage are possible, which one will you use to link it to the Data lifecycle?
- In what format should the Data lifecycle be presented?
5. Data lineage concept has been introduced in different Data management knowledge areas.
The Data lineage concept has been highlighted in different Data management knowledge areas (at least 7).
In each area, Data lineage is placed in a different context:
- Input: Data quality;
- Deliverable: Data modeling and design, Metadata;
- Requirement: Data architecture, Data modeling and design, Data integration and interoperability, Reference and Master data, DWH & BI;
- Activity: Data integration and interoperability;
- Technique: Metadata.
Conclusion:
- Such differentiation might require different definitions for Data lineage depending on the context.
How does DAMA 2 contribute to meeting BCBS 239 and GDPR requirements?
After a deep analysis of the BCBS 239 and ECB requirements, my conclusion was that to be compliant with BCBS 239 you have to deliver an Information value chain (not simply Data lineage) with the following attributes: the flow of data and of business and technical metadata between sources and end reports, mapped to business processes, data quality controls, and other types of controls.
The GDPR requirements are still under discussion, but most agree: you need to know your business processes and roles, systems, data, data lifecycle, and data lineage.
So the similarities between the BCBS 239 and GDPR requirements are clear.
Conclusions:
- both regulations require data lineage;
- in order to be compliant, you can choose different documentation formats and tooling;
- many more developments are required.
References
[1] DAMA DM-BOK Body of Knowledge, First edition
[2] DAMA DM-BOK Body of Knowledge, Second edition
[3] DAMA Dictionary 2nd Edition 2011
For more insights, visit the Data Crossroads Academy site: //academy.datacrossroads.nl/courses/data-lineage-what-why-how/lesson/data-lineage-what-why-how/