This article demonstrates how to map legislative requirements for data lineage onto a data lineage model.
In ‘Data Lineage 102: Definition and key components’, we aligned on a definition of data lineage and specified its key components from a data management viewpoint. Now it is time to look at the requirements from the legislative viewpoint.
BCBS239 (PERDARR) and GDPR as key legislative triggers for data lineage
As an example, we will take the requirements of the Basel Committee on Banking Supervision’s standard number 239, “Principles for effective risk data aggregation and risk reporting” (BCBS239 or PERDARR)1 and the EU General Data Protection Regulation (GDPR)2.
Strangely, the term ‘data lineage’ never appears literally in these regulatory documents.
Data management professionals therefore have to investigate the requirements and translate them into data management language. Let’s do the same with the two pieces of legislation mentioned above.
Key data lineage components from a data management viewpoint
As I have discussed in ‘Data Lineage 102’, data lineage consists of the following interlinked components, also shown in Figure 1:
- IT systems (application, database, network segment)
Data flows through the chain of systems or applications in which data is being transformed and integrated.
‘Golden sources’ and reports/dashboards are the two boundaries of this chain: the former mark the point where data is created, the latter its final destination.
- Business process
Business processes comprise the sets of activities related to data processing. They usually include references to the related applications.
- Data (elements) themselves form the key component of data lineage. Data (elements) can be specified at different levels of abstraction and details. Usually, you do it at one of the following data model levels:
  - Conceptual: data elements are presented in the form of terms and related constraints.
  - Logical, application-related: data entities & attributes of a specific database and related data transformation rules.
  - Logical, not application-related: data entities & attributes and related data transformation rules.
  - Physical: tables & columns & related ETLs (Extract, Transform, Load).
Usually, you would link data elements across the different data model levels. Such a link is sometimes called ‘vertical data lineage’, as opposed to ‘horizontal data lineage’, which represents the path of data from the point of origination to the point of usage. DAMA-DMBOK2 uses the term ‘linkage’3 for the relationship between different data model levels. In any case, physical data models are always linked to a specific application.
- Data checks and controls.
In the definition of data lineage specified by the Enterprise Data Management Council, ‘lineage may include a mapping of the data controls’4.
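As a rough illustration of how these components relate to each other, the sketch below models data elements, horizontal flows, and vertical links in Python. All class names, element names, and systems (`LoanSystem`, `Warehouse`, `RiskReporting`) are hypothetical and chosen only for the example; it is a minimal sketch, not the data model of any particular lineage tool.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    name: str          # a business term, attribute, or column (hypothetical names)
    level: str         # "conceptual", "logical", or "physical"
    system: str = ""   # owning application; expected at the physical level


@dataclass
class LineageGraph:
    # Horizontal lineage: source element -> elements derived from it.
    flows: dict = field(default_factory=dict)
    # Vertical lineage: element -> its counterparts on a lower model level.
    links: dict = field(default_factory=dict)

    def add_flow(self, source, target):
        self.flows.setdefault(source, []).append(target)

    def add_link(self, upper, lower):
        self.links.setdefault(upper, []).append(lower)

    def downstream(self, start):
        """Breadth-first walk along horizontal lineage, excluding `start`."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for nxt in self.flows.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# A conceptual term linked to its physical 'golden source' column.
term = DataElement("Exposure amount", "conceptual")
src = DataElement("loans.exposure_amt", "physical", system="LoanSystem")
dwh = DataElement("dwh.fact_exposure.amount", "physical", system="Warehouse")
rpt = DataElement("risk_report.total_exposure", "physical", system="RiskReporting")

graph = LineageGraph()
graph.add_link(term, src)   # vertical lineage
graph.add_flow(src, dwh)    # horizontal lineage
graph.add_flow(dwh, rpt)

path = graph.downstream(src)
systems = [element.system for element in path]
```

Here `systems` lists the applications that the golden-source column reaches on its way to the report, which is the horizontal path through the IT systems described above.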
Now let’s map the BCBS239 (PERDARR) and GDPR requirements onto the data lineage scheme (Figure 1):
Key data lineage components from the legislative viewpoint
There are specific requirements in the legislation that you can interpret as components of data lineage (see Figure 2):
- Information/reports
BCBS239 stresses that ‘the right information needs to be presented to the right people at the right time’5 and follows this with the requirement to ‘distribute risk reports to the relevant parties’6. Figure 2 shows this component as ‘Dashboards/Reports’.
- Business process
BCBS239 also specifies that it is necessary ‘to document and explain all of their risk data aggregation processes whether automated or manual’7.
- Business dictionary
BCBS239 draws organizations’ attention to a business dictionary of the concepts used in reports, so that data is defined consistently across the organization8.
From a data management perspective, a business dictionary, i.e., a set of business terms, corresponds to the conceptual level of data models.
- Data elements and business rules at the logical level
BCBS239 identifies the requirement to maintain ‘inventory and classification of risk data items’9, which you could translate as data elements at the logical level of data models. In addition, ‘automated and manual edit and reasonableness checks, including an inventory of the validation rules applied to quantitative information’10, are also required. ‘The inventory should include explanations of the conventions used to describe any mathematical or logical relationships that should be verified through these validations or checks’11. In the language of data management, it is interpreted as a repository of business rules.
- Application Landscape
One of the BCBS239 principles states that ‘a bank should design, build and maintain data architecture and IT infrastructure which fully supports its risk data aggregation capabilities and risk reporting practices’12.
GDPR requires that a company should ‘implement appropriate technical and organizational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation’13. Several GDPR articles, e.g., 24, 25, and 32, focus on the necessity of appropriate technical and organizational measures to ensure proper personal data processing.
Even if there is no direct requirement to document data flow through applications, every data management professional still ‘translates’ these requirements as such.
- Business and technical metadata
Metadata is one of the crucial components of data lineage. Metadata describes all other data types, including all other components of the data lineage mentioned above.
BCBS239 stresses the necessity to record business metadata, e.g., in the form of ‘ownership of risk data and information for both the Business and IT function’14. It also recommends documenting ‘integrated data taxonomies and architecture […], which includes information on the characteristics of the data (metadata) and the use of single identifiers and/or unified naming conventions for data including legal entities, counterparties, customers, and accounts’15. This last requirement relates to both business and technical metadata.
GDPR has extended requirements for recording personal information, such as the requirement that ‘each controller […] shall maintain a record of processing activities under its responsibility. That record shall contain all of the following information: (b) the purposes of the processing; (c) a description of the categories of data subjects and the categories of personal data; (d) the categories of recipients to whom the personal data have been or will be disclosed including recipients in third countries or international organizations; (e) where applicable, transfers of personal data to a third country or an international organization, including the identification of that third country or international organization; (f) where possible, the envisaged time limits for erasure of the different categories of data; (g) where possible, a general description of the technical and organizational security measures.’16 Everything mentioned in Article 30 of GDPR can be recognized as business metadata.
Furthermore, a company needs metadata and data lineage capabilities to enable data subjects to exercise some of their rights. Think, for example, about the ‘right to obtain from the controller the erasure of personal data concerning him or her’17, the ‘right to obtain from the controller restriction of processing’18, and ‘the right to receive the personal data […] in a structured, commonly used and machine-readable format and […] the right to transmit those data to another controller’19. Knowing how data flows through applications at the physical level seems to be a necessary condition for honoring these rights.
- Data (quality) controls
BCBS239 (PERDARR) is rather direct about the necessity to ‘measure and monitor the accuracy of data’20. It stresses that ‘Banks must produce aggregated risk data that is complete and measure and monitor the completeness of their risk data’21 and ‘controls surrounding risk data should be as robust as those applicable to accounting data’22. ‘Integrated procedures for identifying, reporting, and explaining data errors or weaknesses in data integrity via exceptions reports’23 are to be in place.
GDPR instead focuses on ‘technical and organizational measures to ensure a level of security appropriate to the risk’24 related to personal data processing.
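To make the idea of measuring completeness and producing an exceptions report more tangible, here is a minimal Python sketch. The rule structure, field names, sample records, and the 0.95 threshold are all assumptions made for illustration; BCBS239 does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class ValidationRule:
    rule_id: str
    description: str   # plain-language explanation kept in the rule inventory
    field_name: str    # data element the rule applies to


def completeness(records, field_name):
    """Share of records in which `field_name` is populated (0.0 to 1.0)."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return filled / len(records)


def exception_report(records, rules, threshold=0.95):
    """List every rule whose measured completeness falls below `threshold`."""
    report = []
    for rule in rules:
        score = completeness(records, rule.field_name)
        if score < threshold:
            report.append(
                {"rule": rule.rule_id, "field": rule.field_name, "completeness": score}
            )
    return report


# Hypothetical risk data items; one record lacks a counterparty identifier.
records = [
    {"counterparty_id": "C001", "exposure": 100.0},
    {"counterparty_id": "C002", "exposure": 250.0},
    {"counterparty_id": "", "exposure": 75.0},
    {"counterparty_id": "C004", "exposure": 0.0},
]
rules = [
    ValidationRule("DQ-01", "Every exposure has a counterparty", "counterparty_id"),
    ValidationRule("DQ-02", "Every record has an exposure amount", "exposure"),
]

exceptions = exception_report(records, rules)
```

In this sample, only the `counterparty_id` rule ends up in `exceptions`, because three of the four records carry a value for it. A zero exposure still counts as populated, since the check looks for missing values rather than for plausible ones.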
Having considered the requirements of these two pieces of legislation, I have arrived at the following list of data flow/lineage components that your company should document and maintain:
- Report (catalog)
- Application flow
- Conceptual level of the data model: terms and business dictionary
- Logical level of the data model: data entities and repository of related business/validation rules
- Physical level of the data model: database schemas and ETL repository
- Business processes
- Data (quality) checks and controls.
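The list above can double as a simple documentation checklist. The following sketch, with hypothetical artefact names and a deliberately naive structure, shows one way to track which components still lack documentation.

```python
# Components from the list above; the keys are labels chosen for this sketch.
REQUIRED_COMPONENTS = [
    "report catalog",
    "application flow",
    "conceptual data model",
    "logical data model",
    "physical data model",
    "business processes",
    "data quality checks and controls",
]


def documentation_gaps(documented):
    """Return the required components with no documentation artefact recorded."""
    return [c for c in REQUIRED_COMPONENTS if not documented.get(c)]


# Hypothetical inventory: component -> list of artefacts that describe it.
documented = {
    "report catalog": ["risk_reports.xlsx"],
    "application flow": ["application_landscape.pdf"],
    "business processes": [],  # listed, but nothing documented yet
}

gaps = documentation_gaps(documented)
```

A component counts as a gap both when it is absent from the inventory and when its artefact list is empty, so `business processes` is reported here alongside the three data model levels and the data quality controls.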
By now, you already know what components of data lineage you should document. Your next question is how you will do it. This is a topic for Data Lineage 104.
————————————————————————————————————-
References
1. Basel Committee on Banking Supervision. Principles for effective risk data aggregation and risk reporting (BCBS239).
2. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation, GDPR).
3. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p.105.
4. Enterprise Data Management Council. The Standard Glossary of Data Management Concepts, version 0.2.1, 2017, p.9.
5. BCBS239, par.51.
6. BCBS239, Principle 11.
7. BCBS239, Principle 3, par.39.
8. BCBS239, Principle 6, par.37.
9. BCBS239, Principle 8, par.67.
10. BCBS239, Principle 7, par.53b.
11. BCBS239, Principle 7, par.53b.
12. BCBS239, Principle 2.
13. GDPR, Art.24.
14. BCBS239, Principle 2, par.34.
15. BCBS239, Principle 2, par.33.
16. GDPR, Art.30.
17. GDPR, Art.17.
18. GDPR, Art.18.
19. GDPR, Art.20.
20. BCBS239, Principle 3, par.40.
21. BCBS239, Principle 4, par.43.
22. BCBS239, Principle 3, par.36a.
23. BCBS239, Principle 7, par.53c.
24. GDPR, Art.32.