In ‘Data Lineage 102: Definition and key components’ we have aligned the definition of data lineage and specified its key components from the viewpoint of data management. Now it is time to take a look at the requirements from the viewpoint of legislation.

BCBS239 (PERDARR) and GDPR as key legislative triggers for data lineage

As an example, we will take the requirements of the Basel Committee on Banking Supervision’s standard number 239, “Principles for effective risk data aggregation and risk reporting” (BCBS239 or PERDARR)1, and the EU General Data Protection Regulation (GDPR)2.

The strangest thing is that you will never find the term ‘data lineage’ literally mentioned in these regulatory documents.

So data management professionals have to investigate the requirements and translate them into the language of data management. Let’s do the same with the two regulations mentioned above.

Key data lineage components from data management viewpoint

As I have discussed in ‘Data Lineage 102’, data lineage consists of the following interlinked components, also shown in Figure 1:

  • IT systems (application, database, network segment)

Data flows through the chain of systems or applications in which data is being transformed and integrated.

‘Golden sources’ and reports / dashboards are the two boundaries that denote, respectively, the point where data is created and its final destination.

  • Business process

Business processes comprise a set of activities related to data processing. Business processes usually include references to related applications.

  • Data (elements) themselves form the key component of data lineage. Data elements can be specified at different levels of abstraction and detail. Usually, you do it at one of the following data model levels:
    • Conceptual: data elements are presented in the form of terms and related constraints.
    • Logical, application related: data entities & attributes of a specific database and related data transformation rules.
    • Logical, not application related: data entities & attributes and related data transformation rules.
    • Physical: tables & columns & related ETLs (Extract, Transform, Load).

Usually, you would link data elements on different levels of data models. Such a link is sometimes called ‘vertical data lineage’, as opposed to ‘horizontal data lineage’, which represents the path of data from the point of origination to the point of usage. DAMA-DMBOK2 mentions the term ‘linkage’3 between different data model levels. In any case, physical data models are always linked to a specific application.
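The idea of vertical data lineage can be illustrated with a minimal Python sketch. It is not a reference implementation: the class names `DataElement` and `VerticalLineage`, as well as the example term, attribute, column and application names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataElement:
    """A data element described at one data model level."""
    name: str
    level: str  # "conceptual", "logical" or "physical"
    application: Optional[str] = None  # physical models are tied to an application


@dataclass
class VerticalLineage:
    """Stores links from an element to elements one level below it."""
    links: Dict[DataElement, List[DataElement]] = field(default_factory=dict)

    def link(self, upper: DataElement, lower: DataElement) -> None:
        self.links.setdefault(upper, []).append(lower)

    def trace_down(self, element: DataElement) -> List[DataElement]:
        """Return every element reachable below `element`, depth first."""
        found: List[DataElement] = []
        for child in self.links.get(element, []):
            found.append(child)
            found.extend(self.trace_down(child))
        return found


# Hypothetical example: one business term traced down to a physical column.
term = DataElement("Customer exposure", "conceptual")
attribute = DataElement("Exposure.amount", "logical")
column = DataElement("RISK_DB.EXPOSURE.AMT", "physical", application="RiskWarehouse")

lineage = VerticalLineage()
lineage.link(term, attribute)
lineage.link(attribute, column)

physical = [e.name for e in lineage.trace_down(term) if e.level == "physical"]
print(physical)  # ['RISK_DB.EXPOSURE.AMT']
```

Storing only the links between adjacent levels keeps the structure small; the full path from a term to a column is then derived by traversal rather than maintained by hand.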

  • Data checks and controls.

In the definition of data lineage specified by Enterprise Data Management, ‘lineage may include a mapping of the data controls’4.

Now let’s plot the BCBS239 (PERDARR) and GDPR requirements onto the scheme of data lineage (Figure 1):

Figure 1. Key components of data lineage from the perspective of data management.

Key data lineage components from legislative viewpoint

There are certain requirements in the legislation that you can interpret as components of data lineage (see Figure 2):

Figure 2. Legislative requirements in relation to data lineage.

  1. Information / reports

BCBS239 stresses the necessity that ‘the right information needs to be presented to the right people at the right time’5, followed by requirements to ‘distribute risk reports to the relevant parties’6.  In Figure 2, this component is mentioned as ‘Dashboards/ Reports’.

  2. Business process

BCBS239 also specifies that it is necessary ‘to document and explain all of their risk data aggregation processes whether automated or manual’7.

  3. Business dictionary

BCBS239 draws organizations’ attention to a business dictionary, which is ‘the concepts used in a report such that data is defined consistently across the organization’8.

From the data management perspective on data lineage, a business dictionary, which is a set of business terms, corresponds to the conceptual level of data models.

  4. Data elements and business rules at logical level

BCBS239 points out the requirement to maintain an ‘inventory and classification of risk data items’9, which you could translate as data elements at the logical level of data models. In addition to that, ‘automated and manual edit and reasonableness checks, including an inventory of the validation rules that are applied to quantitative information’10 are also required. ‘The inventory should include explanations of the conventions used to describe any mathematical or logical relationships that should be verified through these validations or checks’11. In the language of data management, this is interpreted as a repository of business rules.
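To make the notion of a rule repository more tangible, here is a small Python sketch of an inventory of validation rules, each with an explanation and an executable check. The rule identifiers, field names and thresholds are assumptions made up for this example; BCBS239 does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ValidationRule:
    """One entry in a (hypothetical) inventory of validation rules."""
    rule_id: str
    data_item: str        # the risk data item the rule applies to
    explanation: str      # plain-language description of the relationship checked
    check: Callable[[Dict[str, float]], bool]


# Illustrative rules only; identifiers and thresholds are invented.
RULES: List[ValidationRule] = [
    ValidationRule(
        "R001",
        "exposure_amount",
        "Exposure amount must not be negative.",
        lambda r: r["exposure_amount"] >= 0,
    ),
    ValidationRule(
        "R002",
        "collateral_value",
        "Reasonableness check: collateral must not exceed ten times the exposure.",
        lambda r: r["collateral_value"] <= 10 * r["exposure_amount"],
    ),
]


def failed_rules(record: Dict[str, float], rules: List[ValidationRule]) -> List[str]:
    """Return the identifiers of all rules the record violates."""
    return [rule.rule_id for rule in rules if not rule.check(record)]


record = {"exposure_amount": 100.0, "collateral_value": 5000.0}
print(failed_rules(record, RULES))  # ['R002']
```

Keeping the explanation next to the executable check means the same inventory can serve both as documentation for the regulator and as input for automated controls.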

  5. Application landscape

One of the BCBS239 principles states that ‘a bank should design, build and maintain data architecture and IT infrastructure which fully supports its risk data aggregation capabilities and risk reporting practices’12.

GDPR requires that a company should ‘implement appropriate technical and organizational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation’13. Several GDPR articles, namely 24, 25 and 32, focus on the necessity of appropriate technical and organizational measures to ensure proper processing of personal data.

Even if there is no direct requirement to document data flow through applications, every data management professional still ‘translates’ these requirements as such.

  6. Business and technical metadata

Metadata is one of the crucial components of data lineage. Metadata describes all other types of data, including all other components of data lineage mentioned above.

BCBS239 stresses the necessity to record business metadata, e.g. in the form of ‘ownership of risk data and information for both the Business and IT function’14. It also recommends documenting ‘integrated data taxonomies and architecture […], which includes information on the characteristics of the data (metadata) as well as use of single identifiers and / or unified naming conventions for data including legal entities, counterparties, customers and accounts’15. This last requirement clearly relates to both business and technical metadata.

GDPR has extensive requirements for recording information about the processing of personal data, such as the requirement that ‘each controller […] shall maintain a record of processing activities under its responsibility. That record shall contain all of the following information: (b) the purposes of the processing; (c) a description of the categories of data subjects and of the categories of personal data; (d) the categories of recipients to whom the personal data have been or will be disclosed including recipients in third countries or international organizations; (e) where applicable, transfers of personal data to a third country or an international organization, including the identification of that third country or international organization; (f) where possible, the envisaged time limits for erasure of the different categories of data; (g) where possible, a general description of the technical and organizational security measures’16. Everything mentioned in Article 30 of GDPR can be recognized as business metadata.
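The items quoted from Article 30 map naturally onto a metadata record. The following Python sketch shows one possible shape; the class name, the field names and the split into mandatory and optional items (based on the ‘where applicable’ and ‘where possible’ wording) are my own assumptions, not a structure defined by GDPR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessingRecord:
    """Business metadata for one processing activity (hypothetical structure)."""
    activity: str                        # illustrative label, not an Article 30 item
    purposes: List[str]                  # item (b)
    data_subject_categories: List[str]   # item (c)
    personal_data_categories: List[str]  # item (c)
    recipient_categories: List[str]      # item (d)
    third_country_transfers: List[str] = field(default_factory=list)  # (e), where applicable
    erasure_time_limit: Optional[str] = None   # (f), where possible
    security_measures: Optional[str] = None    # (g), where possible


# Assumption: items without a "where applicable/possible" qualifier must be filled in.
MANDATORY = (
    "purposes",
    "data_subject_categories",
    "personal_data_categories",
    "recipient_categories",
)


def missing_items(record: ProcessingRecord) -> List[str]:
    """Return the names of mandatory items that are still empty."""
    return [name for name in MANDATORY if not getattr(record, name)]


record = ProcessingRecord(
    activity="Newsletter mailing",
    purposes=["direct marketing"],
    data_subject_categories=["customers"],
    personal_data_categories=["name", "e-mail address"],
    recipient_categories=[],  # not documented yet
    erasure_time_limit="2 years after unsubscribe",
)
print(missing_items(record))  # ['recipient_categories']
```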

Furthermore, to enable data subjects to exercise some of their rights, a company definitely needs knowledge of its metadata and data lineage capabilities in place. Think, for example, about such rights of the data subject as the ‘right to obtain from the controller the erasure of personal data concerning him or her’17, the ‘right to obtain from the controller restriction of processing’18, and ‘the right to receive the personal data […] in a structured, commonly used and machine-readable format and […] the right to transmit those data to another controller’19. Knowing how data flows through applications at the physical level seems to be an unavoidable condition.
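As a rough illustration of why documented data flows matter here, the Python sketch below walks a documented application flow to find every application that may hold a copy of data originating in a given source. The application names and the `FLOWS` mapping are invented for the example; in practice this information would come from your lineage documentation.

```python
from collections import deque
from typing import Dict, List

# Hypothetical documented flows: application -> applications it feeds.
FLOWS: Dict[str, List[str]] = {
    "CRM": ["DataWarehouse", "Billing"],
    "DataWarehouse": ["MarketingDashboard"],
    "Billing": [],
    "MarketingDashboard": [],
}


def downstream_applications(source: str, flows: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk over the flows, returning all applications fed by `source`."""
    visited = {source}
    result: List[str] = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in flows.get(current, []):
            if target not in visited:  # guards against cycles in the flow
                visited.add(target)
                result.append(target)
                queue.append(target)
    return result


# Applications to check when a customer record created in the CRM must be erased.
print(downstream_applications("CRM", FLOWS))
# ['DataWarehouse', 'Billing', 'MarketingDashboard']
```

The same traversal, run on the reversed mapping, would answer the opposite question: where did the data in a given report come from?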

  7. Data (quality) controls

BCBS239 (PERDARR) is rather direct about the necessity to ‘measure and monitor accuracy of data’20. It stresses that ‘Banks must produce aggregated risk data that is complete and measure and monitor the completeness of their risk data’21 and ‘controls surrounding risk data should be as robust as those applicable to accounting data’22. ‘Integrated procedures for identifying, reporting and explaining data errors or weaknesses in data integrity via exceptions reports’23 are to be in place.
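A simple way to picture such a control is a completeness measurement combined with an exceptions report. The sketch below is only one possible interpretation: the definition of completeness (share of records with all required fields filled), the field names and the sample records are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def exceptions_report(records: List[Record], required: List[str]) -> List[Tuple[int, List[str]]]:
    """List (record index, missing fields) for every incomplete record."""
    report = []
    for index, record in enumerate(records):
        missing = [name for name in required if record.get(name) is None]
        if missing:
            report.append((index, missing))
    return report


def completeness(records: List[Record], required: List[str]) -> float:
    """Share of records in which all required fields are filled."""
    if not records:
        raise ValueError("cannot measure completeness of an empty data set")
    incomplete = len(exceptions_report(records, required))
    return (len(records) - incomplete) / len(records)


# Invented sample of risk data records.
records: List[Record] = [
    {"counterparty_id": "C1", "exposure_amount": 100.0},
    {"counterparty_id": "C2", "exposure_amount": None},
    {"counterparty_id": None, "exposure_amount": 50.0},
    {"counterparty_id": "C4", "exposure_amount": 75.0},
]
required = ["counterparty_id", "exposure_amount"]

print(completeness(records, required))       # 0.5
print(exceptions_report(records, required))  # [(1, ['exposure_amount']), (2, ['counterparty_id'])]
```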

GDPR focuses rather on ‘technical and organizational measures to ensure a level of security appropriate to the risk’24 related to the processing of personal data.

After considering the requirements of these two pieces of legislation, I have come to the following definition of the components of data flow / lineage that your company should document and maintain:

  • Report (catalogue)
  • Application flow
  • Conceptual level of data model: terms and business dictionary
  • Logical level of data model: data entities and repository of related business / validation rules
  • Physical level of data model: database schemas and ETL repository
  • Business processes
  • Data (quality) checks and controls.

By now, you know which components of data lineage you should document. Your next question is how to do it. This is the topic of ‘Data Lineage 104’.

————————————————————————————————————-

References

  1. BCBS239
  2. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
  3. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p.105.
  4. Enterprise Data Management Council. The Standard Glossary of Data Management Concepts, version 0.2.1, 2017, p.9.
  5. BCBS239, par.51.
  6. BCBS239, Principle 11.
  7. BCBS239, Principle 3, par.39.
  8. BCBS239, Principle 6, par.37.
  9. BCBS239, Principle 8, par.67.
  10. BCBS239, Principle 7, par.53b.
  11. BCBS239, Principle 7, par.53b.
  12. BCBS239, Principle 2.
  13. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.24.
  14. BCBS239, Principle 2, par.34.
  15. BCBS239, Principle 2, par.33.
  16. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.30.
  17. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.17.
  18. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.18.
  19. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.20.
  20. BCBS239, Principle 3, par.40.
  21. BCBS239, Principle 4, par.43.
  22. BCBS239, Principle 3, par.36a.
  23. BCBS239, Principle 7, par.53c.
  24. Regulation (EU) 2016/679 of the European parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Art.32.