Recently, the concept of critical data has caught the attention of many data management professionals. I was no exception, so I decided to dive deeper into the subject and do some research. In this article, I would like to share some results of that research and my experience with:

  • …definitions of critical data and critical data elements (CDEs)
  • …reasons to use CDEs
  • …key challenges with CDEs in practical implementation.

The definition of critical data

As a starting point of my research, I decided to consult the leading data management guides and legislation documentation to see what they had to say about critical data (elements).

The concept of critical data appears in the second edition of DAMA-DMBOK (DAMA-DMBOK2) by DAMA International, in the topics related to the Data Quality Knowledge Area [1]. DAMA-DMBOK2 provides only general characteristics of critical data. Critical data is specified by its usage, which is ‘regulatory reporting, financial reporting, business policy, ongoing operations, business strategy’ [2]. DAMA-DMBOK2 also stresses that ‘specific drivers for criticality will differ by industry’ [3]. That is all it says, so if you want to go beyond these definitions, you have to develop the concept of critical data yourself.

The critical data concept has also been introduced in the Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR). BCBS 239 speaks about critical data in the following contexts:

  1. ‘data that is critical to enabling the bank to manage the risks it faces’ [4]
  2. ‘data critical to risk data aggregation and IT infrastructure initiative’ [5]
  3. ‘aggregated information to make critical decisions about risk’ [6].

To summarise, BCBS 239 considers data and information to be critical with respect to managing business risks.

After consulting these guidelines and regulations, my conclusion was that the concept of critical data is not yet consistently defined or aligned across sources. For the purpose of this article, we might keep in mind the following:

  • critical data influences a company’s management decisions and performance, both financial and non-financial
  • the criteria of criticality should be developed by each company separately.

Now let’s talk about the business value of implementing the critical data elements concept.

Reasons to use CDEs

The key reason to use the concept of CDEs in your practice is to limit the scope of your data management initiatives to a feasible minimum.

Assume that your key data management driver is compliance with a regulation. If you focus on compliance with the EU General Data Protection Regulation, you will deal only with personal data. If you deal with BCBS 239, you will limit your data to that which is related to risk reporting. Still, the definition of ‘risk’ data is very wide. Therefore, you should focus only on those risk metrics or KPIs that your company uses to manage business risks. The same applies to financial data, which constitutes almost 80% of the data circulating within a company.

Even though the definition of CDE is not very well aligned, in theory, everything seems to be rather clear… Until you start implementing the concept in practice. Then it can become rather challenging. Let’s take a closer look at what the main challenges are and how to deal with them.

Key challenges in practical implementation of CDEs

  1. How to define CDEs.

To resolve this challenge, let’s look back at our first conclusion: critical data elements are those that influence company performance, both financial and non-financial. The easiest way to define your company’s CDEs can be described in several steps:

  1. Identify your key driver for the current data management initiative.

Think about compliance with regulations, improvement of customer experience, optimization of decision making and so on. Each business driver will require a specific set of data and/or information. For example, to improve customer experience, you will mainly focus on customer data.

  2. Specify your key (critical) reports and information.

Despite a lot of talk about digitalization, when it comes to decision making, major companies still rely on reports. Reports are simply containers of information. What you could do is list your reports and choose the most critical ones. Such an analysis will also help optimize information delivery in your company.

  3. Define your critical data elements

Once the critical reports are specified, you should start analysing the critical data elements. They usually reside within reports in the form of KPIs or metrics. You might count 50-100 such critical data elements.

  4. Minimize the number of critical data elements

You can minimize the number of critical data elements by involving subject matter experts. You might ask me now: why do we need to minimize the number of CDEs? This brings us to the second challenge: what to do with the CDEs?
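The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report inventory, the metric names and the helper functions `candidate_cdes` and `shortlist` are hypothetical, and in practice the criticality flags and the expert approvals come from workshops, not from code.

```python
from collections import Counter

# Hypothetical inventory: report name -> (is it critical?, KPIs/metrics it contains).
reports = {
    "Liquidity risk report": (True, ["LCR", "NSFR", "Cash outflow"]),
    "Credit risk report": (True, ["PD", "LGD", "Exposure at default", "LCR"]),
    "Office supplies report": (False, ["Paper spend"]),
}


def candidate_cdes(reports):
    """Steps 2 and 3: count how many critical reports use each KPI or metric."""
    counts = Counter()
    for is_critical, metrics in reports.values():
        if is_critical:
            counts.update(set(metrics))
    return counts


def shortlist(counts, approved_by_experts):
    """Step 4: keep only the candidates confirmed by subject matter experts."""
    return sorted(metric for metric in counts if metric in approved_by_experts)


candidates = candidate_cdes(reports)
cdes = shortlist(candidates, approved_by_experts={"LCR", "PD"})
```

Counting how often a metric appears across critical reports is just one possible way to rank candidates before the experts make the final call.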

 

  2. What are you going to do with CDEs?

We have specified that CDEs are data elements that have the biggest influence on decision making and company performance. It means that the value of the CDEs depends on their reliability. Therefore, you need to ensure that these CDEs are calculated correctly and that the calculation is based on correct data. What we face here are basic data quality challenges. Your key goal is to check and prove the reliability of the KPIs or metrics that you specify as critical.
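A minimal sketch of such a reliability check is shown below. It assumes a hypothetical KPI that is a simple sum of an `exposure` field over loan records; the records and the function names are illustrative only. The idea is to check the input data first and then recalculate the KPI independently and compare it with the reported figure.

```python
import math

# Hypothetical source records feeding a KPI; None marks a missing value.
loans = [
    {"id": 1, "exposure": 100.0},
    {"id": 2, "exposure": 250.0},
    {"id": 3, "exposure": None},
]


def check_inputs(records, field):
    """Is the data correct? Return ids of records with a missing or negative value."""
    return [r["id"] for r in records if r[field] is None or r[field] < 0]


def recompute_total(records, field):
    """Is the calculation correct? Recalculate the KPI from the usable records."""
    return sum(r[field] for r in records if r[field] is not None)


def matches_report(recomputed, reported, rel_tol=1e-9):
    """Compare the independent recalculation with the figure shown in the report."""
    return math.isclose(recomputed, reported, rel_tol=rel_tol)


bad_records = check_inputs(loans, "exposure")
total = recompute_total(loans, "exposure")
```

Here `bad_records` flags the loan with the missing exposure, and `matches_report(total, reported_value)` tells you whether the report agrees with the recalculation.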

  3. Recognise ultimate and transitional CDEs and different criteria of their criticality

So far, we have talked about how CDEs usually reside in reports in the form of KPIs or metrics. I call them the ‘ultimate’ CDEs because they are located at the final point of the data/information processing path. Such a path is called ‘data lineage’ or the ‘data/information value chain’. I consider the information value chain to be a set of business capabilities that enable the transformation of raw data into meaningful information to enhance decision making at different organizational levels in the company. In this respect, data lineage is the way to document or record the information value chain.

If we need to ensure that critical data elements are of the required quality, we need to be able to perform root-cause analysis and investigate the whole chain of data that is processed and used to derive the specified ultimate CDEs. All data elements that are involved in the calculation of the ultimate CDEs I call the ‘transitional’ ones. The criterion of criticality for the transitional CDEs is their impact on the calculation results of the ultimate ones. The concept of ‘ultimate’ and ‘transitional’ CDEs is illustrated in Figure 1:

Figure 1. The concept of the ‘ultimate’ and ‘transitional’ CDEs.

If you take a look at the illustrated relationships between the ‘ultimate’ and the ‘transitional’ CDEs, you will understand that this is a visualization of data lineage. And this brings you to the next challenge: to ensure that ultimate CDEs are trustworthy and auditable, you need to have the whole data lineage in place.
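The relationship between ‘ultimate’ and ‘transitional’ CDEs can be sketched as a walk over documented lineage. The lineage dictionary, the element names and the function `transitional_cdes` below are hypothetical; the sketch assumes that lineage is already recorded as ‘element is derived from these elements’.

```python
# Hypothetical lineage: each data element maps to the elements it is derived from.
lineage = {
    "Net profit margin": ["Net profit", "Revenue"],
    "Net profit": ["Revenue", "Total costs"],
    "Revenue": ["Invoice amount"],
    "Total costs": ["Purchase amount", "Salary amount"],
}


def transitional_cdes(lineage, ultimate):
    """Collect every data element upstream of the given ultimate CDE."""
    found = set()
    stack = list(lineage.get(ultimate, []))
    while stack:
        element = stack.pop()
        if element not in found:  # also protects against circular lineage
            found.add(element)
            stack.extend(lineage.get(element, []))
    return found


upstream = transitional_cdes(lineage, "Net profit margin")
```

For the ultimate CDE ‘Net profit margin’, `upstream` contains all six elements that feed it, directly or indirectly. These are the candidates to investigate in a root-cause analysis.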

  4. Data lineage and CDEs: the ‘chicken or the egg’ dilemma

We have just faced the most critical challenge: data lineage is a prerequisite for managing CDEs! I hope that you are familiar with the data lineage concept. If you need to refresh your knowledge, I can refer you to the set of articles I just published on the subject (Data Lineage 101, 102, 103, 104 & 105).

The reality is that not many companies currently have data lineage in place. So the situation resembles the well-known ‘chicken or the egg’ dilemma. To manage CDEs, you need to have data lineage, but to be able to document data lineage, you need to have CDEs to limit the scope.

What should you do in such a situation? One practical tip is as follows. If you know your source data elements, you can use experts to specify the most critical ones and hope that the specified CDEs really do have the biggest influence on the calculation results. Such an approach does not remove the need to attempt to document data lineage. The last challenge, which relates to both the data lineage and the CDE concepts, is: at which level of data models should you specify CDEs?

  5. Specify the level of data models to document CDEs and data lineage

Dealing with this challenge, you will probably arrive at the conclusion that the level at which to document the ‘ultimate’ and the ‘transitional’ data elements will differ.

‘Ultimate’ data elements, which are very often KPIs or metrics in reports, will need to be specified at the conceptual or logical level of data models. To be able to explain how the ‘transitional’ data elements are processed to derive the ‘ultimate’ ones, you will need to document them at the application logical or physical level. The approach is illustrated in Figure 2:

Figure 2: Data model levels to document ultimate and transitional CDEs.

There are some other questions still to be answered, for example, who is responsible for documenting ultimate and transitional CDEs. I will come back to the topic of data management related roles in my future articles.

I hope that by now you are reasonably well equipped to continue with the practical implementation of the critical data elements concept.

Those who are interested in knowing more about the application of the concept of CDEs and the information value chain can consult my new book, The Data Management Toolkit. You can download the first chapter for free HERE, or purchase it on Amazon HERE.

 

————————————————————————————————————————-

References

  1. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p.454.
  2. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p.454.
  3. DAMA International. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition. Bradley Beach, N.J.: Technics Publications, 2017, p.454.
  4. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 16.
  5. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 30.
  6. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 52.
  7. The Basel Committee on Banking Supervision’s standard number 239: “Principles for effective risk data aggregation and risk reporting” (BCBS 239 or PERDARR), par. 23.