Recently, the concept of critical data elements has caught the attention of many data management professionals. I was no exception, so I decided to dive deeper into this subject and do some research. In this article, I would like to share some results of my research and my experience with:
- …definitions of critical data and critical data elements (CDEs)
- …reasons to use CDEs
- …key challenges with CDEs in practical implementation.
The definition of critical data
As a starting point of my research, I decided to consult the leading data management guides and legislation documentation to see what they had to say about critical data elements.
The concept of critical data appears in the second edition of the DAMA-DMBOK (DAMA-DMBOK2) by DAMA International, in topics related to the Data Quality Knowledge Area [1]. DAMA-DMBOK2 provides only general characteristics of critical data. Critical data is specified by its usage: ‘regulatory reporting, financial reporting, business policy, ongoing operations, business strategy’ [2]. DAMA-DMBOK2 also stresses that ‘specific drivers for criticality will differ by industry’ [3]. That is all it says, so if you want to go beyond these characteristics, you have to develop the concept of critical data yourself.
The critical data concept has also been introduced in the Basel Committee on Banking Supervision’s standard number 239, “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR). BCBS 239 speaks about critical data in the following contexts:
- ‘data that is critical to enabling the bank to manage the risks it faces’ [4]
- ‘data critical to risk data aggregation and IT infrastructure initiative’ [5]
- ‘aggregated information to make critical decisions about risk’ [6].
To summarise, here, data and information are considered critical with respect to managing business risks.
After consulting these guidelines and regulations, my conclusion was that the concept of critical data is not yet consistently defined across sources. For the purpose of this article, we might keep in mind the following:
- critical data influences a company’s management decisions and performance, both financial and non-financial
- the criteria of criticality should be developed by each company separately.
Now let’s discuss the business value of implementing the critical data elements concept.
Reasons to use CDEs
The key reason to use the concept of CDEs in your practice is to limit the scope of your data management initiatives to the feasible minimum.
Assume that your key data management driver is compliance with a regulation. If you focus on compliance with the EU General Data Protection Regulation, you will deal only with personal data. If you deal with BCBS 239, you will limit your data to that which is related to risk reporting. Still, the scope of ‘risk’ data is vast. Therefore, you should focus only on those risk metrics or KPIs your company uses to manage business risks. The same applies to financial data, which constitutes almost 80% of data circulating within a company.
Even though the definition of CDE is not very well aligned, in theory, everything seems to be relatively clear… Until you start implementing the concept in practice. Then it can become somewhat challenging. Let’s take a closer look at the main challenges and how to deal with them.
Key challenges in the practical implementation of CDEs
- How to define CDEs.
To resolve this challenge, let’s look back at our first conclusion: critical data elements influence company performance, both financial and non-financial. The easiest way to define your company’s CDEs can be described in several steps:
- Identify your key driver for the current data management initiative.
Think about compliance with regulations, improvement of customer experience, optimization of decision-making, and so on. Each business driver will require a specific set of data and/or information. For example, for the improvement of customer experience, you will mainly focus on customer data.
- Specify your key (critical) reports and information.
Despite a lot of talk about digitalization, major companies still rely on different reports when it comes to decision-making. Reports are simply containers of information. What you could do is list your reports and choose the most critical ones. Such an analysis will also help optimize information delivery in your company.
- Define your critical data elements.
Once the critical reports are specified, you should start analyzing the critical data elements. They usually reside within reports in the form of KPIs or metrics. You might count 50-100 such critical data elements.
- Minimize the number of critical data elements.
You can minimize the number of critical data elements by involving subject matter experts. You might ask me now: why must we minimize the number of CDEs? This is your second challenge: what to do with the CDEs?
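The four steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the approach itself: the `Report` structure, the 1-5 criticality score, the threshold of 4, and all report and metric names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    name: str
    driver: str        # business driver the report supports (step 1)
    criticality: int   # hypothetical 1-5 score agreed with stakeholders (step 2)
    metrics: list[str] = field(default_factory=list)  # KPIs shown in the report (step 3)


def shortlist_cdes(reports, driver, min_criticality=4, expert_approved=None):
    """Return candidate CDEs for one driver, optionally narrowed by experts (step 4)."""
    critical_reports = [
        r for r in reports
        if r.driver == driver and r.criticality >= min_criticality
    ]
    # Collect metrics in report order and drop duplicates.
    candidates = list(dict.fromkeys(m for r in critical_reports for m in r.metrics))
    if expert_approved is not None:
        candidates = [m for m in candidates if m in expert_approved]
    return candidates


# Hypothetical inventory of reports.
reports = [
    Report("Liquidity risk report", "BCBS 239", 5,
           ["liquidity coverage ratio", "net stable funding ratio"]),
    Report("Credit risk dashboard", "BCBS 239", 4,
           ["probability of default", "liquidity coverage ratio"]),
    Report("Office supplies overview", "cost control", 1, ["stationery spend"]),
]

candidates = shortlist_cdes(reports, "BCBS 239")
final_cdes = shortlist_cdes(
    reports, "BCBS 239",
    expert_approved={"liquidity coverage ratio", "probability of default"},
)
print(candidates)
print(final_cdes)
```

In practice the scoring and the expert review are workshops rather than function calls; the sketch only shows how each step narrows the scope.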
- What are you going to do with CDEs?
We have specified that CDEs are data elements that have the most significant influence on decision-making and company performance. It means that the value of the CDEs depends on their reliability. Therefore, you need to ensure that these CDEs are based on correct data and are calculated correctly. What we face here are basic data quality challenges. Your key goal is to check and prove the reliability of the KPIs or metrics that you specify as critical.
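As a minimal sketch of what such a reliability check might look like, the example below tests one KPI in two ways: are all inputs present, and does an independent recalculation match the reported value? It assumes, purely for illustration, that the KPI is a simple sum of its inputs; the function name `check_cde` and the sample figures are hypothetical.

```python
import math


def check_cde(reported_value, inputs, tolerance=1e-9):
    """Run two basic checks on a KPI that is assumed to be the sum of its inputs."""
    # Completeness: every input must have a value.
    missing = [key for key, value in inputs.items() if value is None]
    issues = [f"missing input: {key}" for key in missing]
    # Calculation: recompute the KPI and compare it with the reported value.
    if not missing:
        recomputed = sum(inputs.values())
        if not math.isclose(recomputed, reported_value, abs_tol=tolerance):
            issues.append(f"reported {reported_value} but recomputed {recomputed}")
    return issues


ok = check_cde(150.0, {"loans": 120.0, "bonds": 30.0})
mismatch = check_cde(155.0, {"loans": 120.0, "bonds": 30.0})
incomplete = check_cde(150.0, {"loans": 120.0, "bonds": None})
print(ok, mismatch, incomplete)
```

An empty list means both checks passed. Real KPIs need their own calculation rule and usually more data quality dimensions than these two.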
- Recognize ultimate and transitional CDEs and different criteria of their criticality
So far, we have talked about how CDEs usually reside in reports in the form of KPIs or metrics. I call them the ‘ultimate’ CDEs because they are located at the final point of the data/information processing path. Such a path is called ‘data lineage’ or the ‘data/information value chain’. I consider the information value chain a set of business capabilities that enable the transformation of raw data into meaningful information to enhance decision-making at different organizational levels in the company. In this respect, data lineage is the way to document or record the information value chain. If we need to ensure that critical data elements are of the required quality, we need to be able to perform root-cause analysis and investigate the whole chain of data being processed and used to derive the specified ultimate CDEs. All data elements that are involved in the calculation of the ultimate CDEs I call the ‘transitional’ ones. The criterion of criticality for the transitional CDEs is their impact on the calculation results of the ultimate ones. The concept of ‘ultimate’ and ‘transitional’ CDEs is illustrated in Figure 1.
If you take a look at the illustrated relationships between the ‘ultimate’ and the ‘transitional’ CDEs, you will understand that this is a visualization of data lineage. And this brings you to the next challenge: to ensure ultimate CDEs are trustworthy and auditable, you need to have the whole data lineage in place.
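One way to picture this relationship is to treat data lineage as a graph and walk upstream from an ultimate CDE. The sketch below assumes the lineage is already available as a simple mapping from each element to the elements it is derived from; the element names and the function `transitional_cdes` are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each element maps to the elements it is derived from.
lineage = {
    "liquidity coverage ratio": ["high-quality liquid assets", "net cash outflows"],
    "net cash outflows": ["expected outflows", "expected inflows"],
    "high-quality liquid assets": [],
    "expected outflows": [],
    "expected inflows": [],
    "marketing spend": [],
}


def transitional_cdes(ultimate, lineage):
    """Return every upstream element that feeds the given ultimate CDE."""
    seen = set()
    queue = deque(lineage.get(ultimate, []))
    while queue:
        element = queue.popleft()
        if element in seen:
            continue  # already visited via another path
        seen.add(element)
        queue.extend(lineage.get(element, []))
    return seen


upstream = transitional_cdes("liquidity coverage ratio", lineage)
print(sorted(upstream))
```

Elements that never appear upstream of an ultimate CDE, such as the unrelated `marketing spend` here, fall outside the scope.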
- Data lineage and CDEs: the ‘chicken or the egg’ dilemma
We have just arrived at the most critical challenge: data lineage is a prerequisite to managing CDEs! I hope that you are familiar with the data lineage concept. If you need to refresh your knowledge, I can refer you to the set of articles I recently published on the subject (Data Lineage 101, 102, 103, 104 & 105).
The reality is that few companies currently have data lineage in place. The situation is reminiscent of the well-known ‘chicken or the egg’ dilemma: to manage CDEs, you need data lineage, but to document data lineage, you need CDEs to limit the scope.
What should you do in such a situation? One practical tip is this: if you know your source data elements, you can ask experts to specify the most critical ones and hope that the specified CDEs really have the biggest influence on the calculation results. Such an approach does not remove the need to attempt to document data lineage. The last challenge, which relates to both data lineage and the CDE concept, is: at which level of data models should you specify CDEs?
- Specify the level of data models to document CDEs and data lineage
Dealing with this challenge, you will probably arrive at the conclusion that the level to document the ‘ultimate’ and the ‘transitional’ data elements will differ.
‘Ultimate’ data elements, which are very often KPIs or metrics in reports, will need to be specified at the conceptual or logical level of data models. To be able to explain how the ‘transitional’ data elements are processed to derive the ‘ultimate’ ones, you will need to document them at the application logical or physical level. The approach is illustrated in Figure 2.
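If you keep a register of CDEs, the rule above could be recorded along the following lines. This is a sketch under assumptions: the class names, the `ALLOWED_LEVELS` mapping, and the sample elements are hypothetical, and the allowed levels simply restate the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class ModelLevel(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


class CdeKind(Enum):
    ULTIMATE = "ultimate"
    TRANSITIONAL = "transitional"


# Assumed rule: ultimate CDEs at conceptual or logical level,
# transitional CDEs at logical or physical level.
ALLOWED_LEVELS = {
    CdeKind.ULTIMATE: {ModelLevel.CONCEPTUAL, ModelLevel.LOGICAL},
    CdeKind.TRANSITIONAL: {ModelLevel.LOGICAL, ModelLevel.PHYSICAL},
}


@dataclass(frozen=True)
class CdeRecord:
    name: str
    kind: CdeKind
    level: ModelLevel

    def is_documented_at_expected_level(self) -> bool:
        return self.level in ALLOWED_LEVELS[self.kind]


kpi = CdeRecord("liquidity coverage ratio", CdeKind.ULTIMATE, ModelLevel.CONCEPTUAL)
column = CdeRecord("expected outflows", CdeKind.TRANSITIONAL, ModelLevel.PHYSICAL)
misplaced = CdeRecord("net cash outflows", CdeKind.ULTIMATE, ModelLevel.PHYSICAL)
print(kpi.is_documented_at_expected_level(), misplaced.is_documented_at_expected_level())
```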
There are some other questions still to be answered. For example, who is responsible for documenting ultimate and transitional CDEs? I will come back to the topic of data management-related roles in my future articles.
I hope that by now, you are reasonably equipped to continue with the practical implementation of the critical data elements concept.
Those who are interested in learning more about applying the CDE concept and the information value chain can consult my new book, The Data Management Toolkit. You can download the first chapter for free or purchase the book on Amazon.
For more insights, visit the Data Crossroads Academy site: //academy.datacrossroads.nl
————————————————————————————————————————-
References
1. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p. 454.
2. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p. 454.
3. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p. 454.
4. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 16.
5. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 30.
6. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 52.
7. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 23.