In the previous articles of this series, we have discussed how to build a company-specific data management maturity assessment and the way to benchmark the results for Data Quality. Now, it is time to take a look at the maturity of a data chain sub-capability.
In this article, I will share an in-depth approach for measuring and benchmarking the maturity level of the data chain sub-capability. The benchmark results used in this article are based on ‘Data Management Maturity Review 2019.’
We will cover the following four topics:
- Definition of the ‘data chain’ sub-capability and its dimensions
- Specification of indicators (KPIs) for measuring the performance
- Benchmarking results based on a set of indicators
- Development tips
Data and information value chain (data chain) sub-capability and its dimensions
The data and information value chain (data chain) is one of the five sub-capabilities of the ‘Orange’ model of data management. Therefore, the maturity of the data chain sub-capability is important for the overall maturity. You can find an explanation of the model in ‘Data Management Maturity 101: What is a data management maturity assessment and why does a company need it?’ An overview of the model is shown in Figure 1.
‘Data and information value chain is a set of actions that transform raw data into meaningful Information’ as stated in ‘Data Management Maturity Review 2019’.
The following dimensions enable a (sub-)capability: role, process, data (input and output), and tools.
In Figure 2, each dimension of the data chain sub-capability is described in detail.
Figure 2. A detailed description of the data chain sub-capability dimensions.
In our context, ‘data’ stands for formal deliverables/artifacts of the data management sub-capability. The key deliverables of this sub-capability are related to the rules and roles that ensure the operation of the data management function.
There are several key deliverables of this sub-capability.
As I discussed in my series of articles about data lineage, the concept of a data chain is not clearly defined, and its meaning is not aligned within the data management community. There are at least six concepts that intersect each other: data chain, data value chain, data flow, data lineage, integration architecture, and information value chain. Each company should specify its understanding of what the data chain is, the key components it is composed of, the level(s) of the data model used to document it, and the way it is documented. Therefore, the first deliverable is a metamodel of the data chain that is applicable and feasible for the company.
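To make the idea of a metamodel more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class names (`DataElement`, `Transformation`, `DataChain`) and their fields are hypothetical, and each company will choose its own components and data model levels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataElement:
    """A data element documented at a chosen data-model level (assumed field names)."""
    name: str
    model_level: str  # e.g. "conceptual", "logical", or "physical"
    is_critical: bool = False


@dataclass(frozen=True)
class Transformation:
    """A documented step that derives one element from one or more others."""
    sources: Tuple[str, ...]
    target: str
    rule: str


@dataclass
class DataChain:
    """A named chain: its elements plus the transformations linking them."""
    name: str
    elements: Dict[str, DataElement] = field(default_factory=dict)
    transformations: List[Transformation] = field(default_factory=list)

    def add_element(self, element: DataElement) -> None:
        self.elements[element.name] = element

    def add_transformation(self, transformation: Transformation) -> None:
        # Reject links to elements that have not been documented yet.
        names = (*transformation.sources, transformation.target)
        missing = [n for n in names if n not in self.elements]
        if missing:
            raise ValueError(f"undocumented elements: {missing}")
        self.transformations.append(transformation)
```

The check in `add_transformation` reflects one possible design choice: a transformation can only be documented once all the elements it touches are part of the chain.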
As soon as the metamodel is specified, the data chain can be documented. It is advisable to create a catalog of data chains as a first step. The concept of critical data elements will help to prioritize the documentation of data chains. After the scope of the initiative has been limited to a reasonable level, the documentation of the data chain(s) can take place. Data lineage can be documented manually, automatically, or by a combination of both. The deliverables of all other sub-capabilities are required, because the description of the data chain connects them.
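The prioritization step described above can be sketched as a simple ranking. This is an illustrative example only: the catalog contents, the set of critical data elements, and the function name `prioritize` are all hypothetical, and counting critical elements is just one possible prioritization rule.

```python
from typing import Dict, List, Set

# Hypothetical catalog: chain name -> names of the data elements it carries.
CATALOG: Dict[str, List[str]] = {
    "customer onboarding": ["customer_id", "birth_date", "risk_rating"],
    "regulatory reporting": ["exposure", "risk_rating", "customer_id"],
    "marketing mailing": ["email", "campaign_code"],
}

# Hypothetical set of elements the company has marked as critical.
CRITICAL_ELEMENTS: Set[str] = {"customer_id", "risk_rating", "exposure"}


def prioritize(catalog: Dict[str, List[str]], critical: Set[str], limit: int) -> List[str]:
    """Rank chains by their number of critical data elements and keep the top `limit`."""

    def critical_count(name: str) -> int:
        return sum(1 for element in catalog[name] if element in critical)

    # Most critical elements first; ties are broken alphabetically.
    ranked = sorted(catalog, key=lambda name: (-critical_count(name), name))
    return ranked[:limit]


# Limit the scope of the documentation initiative to the two highest-ranked chains.
scope = prioritize(CATALOG, CRITICAL_ELEMENTS, limit=2)
```

With the sample data above, the scope would contain the regulatory reporting and customer onboarding chains, in that order.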
‘Process’ signifies a data management-related business process at different levels of abstraction.
The documentation of the data chain is a process that involves the efforts of different professionals from multiple disciplines. It consists of several tasks, such as the design, analysis, optimization, and documentation of the data chain. An important component of the process is the coordination of the activities of the multi-disciplinary teams.
‘Role’ describes the participation of people in business operations. It can represent business units, functional jobs, a set of data management-related accountabilities and responsibilities (in RACI context), etc.
The set of roles required to perform this capability depends on the definition and components chosen to document the data chain. Furthermore, the majority of artifacts produced by other sub-capabilities will be required for assembling the data chain. Therefore, all data management roles involved in the delivery of these artifacts will also be involved in the data chain-related activities.
The following data management professionals might be involved in the documentation of the data chain: data and application architects, data modelers, and data analysts. Subject matter experts (SMEs) from the business will also be involved. IT professionals will be represented by database and solution architects, designers, and engineers. If an automated data chain solution is chosen, the skills of IT consultants and developers will also be required.
‘Tools’ include information technology systems and applications as well as resources required for performing the data management function, e.g. budget.
All tools involved in the documentation of the artifacts of all other data management sub-capabilities are also relevant for the data chain capability.
Data models are kept in data modeling tools. Business processes are documented in BPM tools. Business rules and ETLs should be documented in repositories. The metadata repository is the key source of information for the data chain. Data chain tools will depend on the chosen way to document the data chain. For a manual solution, a company has a wide range of choices, from MS applications (Excel, Visio, PowerPoint) to Axon by Informatica, Collibra, and Solidatus. Companies that choose the automated way of documentation also have a wide choice between ERWIN, Solidatus, Octopai, Collibra, SAS, and Informatica solutions.
Specification of indicators (KPIs) to measure the performance
Each of the sub-capability dimensions described above can serve as a specific indicator (KPI) to measure performance.
By assigning maturity levels to the chosen indicators, each company can create its own maturity assessment.
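As a rough illustration of such an assessment, the sketch below records one maturity level per indicator and averages the levels per dimension. It rests on several assumptions of mine: the numbering of the five levels from 1 (lowest) to 5 (highest), the averaging rule, and the sample scores are all hypothetical and not taken from the benchmark. The indicator names are those discussed in this article.

```python
from statistics import mean
from typing import Dict, Tuple

# Five maturity levels; numbering them 1 (lowest) to 5 (highest) is an assumption.
MATURITY_LEVELS = range(1, 6)


def assess(scores: Dict[Tuple[str, str], int]) -> Dict[str, float]:
    """Validate the level of each indicator and average the levels per dimension."""
    by_dimension: Dict[str, list] = {}
    for (dimension, indicator), level in scores.items():
        if level not in MATURITY_LEVELS:
            raise ValueError(f"{indicator}: level {level} is outside 1-5")
        by_dimension.setdefault(dimension, []).append(level)
    return {dimension: mean(levels) for dimension, levels in by_dimension.items()}


# Hypothetical self-assessment, keyed by (dimension, indicator).
example_scores = {
    ("Tools", "Availability of an integrated tool to document data lineage"): 2,
    ("Process", "Ability to deliver new data"): 3,
    ("Process", "Ability to explain data transformation"): 2,
    ("Process", "Level of coordination between different stakeholders"): 4,
}

summary = assess(example_scores)
```

A plain average is only one way to summarize the levels; a company might prefer to report the lowest level per dimension or to weight the indicators.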
I will demonstrate four indicators as examples. These indicators have been used as the foundation of our Data Management Maturity Scan:
Indicator 1 (Tools): The availability of an integrated tool to document data lineage
Data flows through the whole company and touches different departments. The documentation of the data chain is the effort of multi-disciplinary stakeholders from different departments across the whole enterprise. Therefore, the ability to share the information is very important.
Indicator 2 (Process): Ability to deliver new data
Due to new regulations, updated information requirements appear quickly. The discovery and delivery of the corresponding new data remain a big issue for a lot of companies. The more data sources a company has, the bigger the issue. The ability of a company to react quickly to new information and data requirements is one of the indicators of its maturity.
Indicator 3 (Process): Ability to explain data transformation
Regulatory bodies and audit functions often require companies to explain the origin of data and the transformations it has undergone. This is one of the most challenging tasks for a lot of companies. The business value of documenting the data chain is demonstrated best when information about data transformations is accessible and transparent.
Indicator 4 (Process): Level of coordination between different stakeholders
As already stated above, the documentation of the data chain requires a coordinated effort of different professionals across the whole organization. The data management function is accountable for the coordination of the activities of all data stakeholders. Therefore, the level of coordination demonstrates the maturity level of the whole data management function.
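To illustrate what the ability to explain data transformation (Indicator 3) can look like once a data chain has been documented, here is a minimal sketch that walks upstream from a report item and lists every documented transformation on the way. The lineage entries, element names, and the function `explain_origin` are hypothetical examples, not the output of any particular tool.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical documented lineage: target -> list of (source, transformation rule).
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "report.total_exposure": [("dwh.exposure", "sum by counterparty")],
    "dwh.exposure": [
        ("crm.customer", "join on customer_id"),
        ("loans.balance", "convert to EUR"),
    ],
}


def explain_origin(element: str, lineage: Dict[str, List[Tuple[str, str]]]):
    """Return (source, target, rule) steps upstream of `element`, breadth-first."""
    steps = []
    seen = {element}
    queue = deque([element])
    while queue:
        target = queue.popleft()
        for source, rule in lineage.get(target, []):
            steps.append((source, target, rule))
            # Visit each upstream element only once, even if it feeds several targets.
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return steps


for source, target, rule in explain_origin("report.total_exposure", LINEAGE):
    print(f"{target} <- {source}: {rule}")
```

Elements without an entry in the lineage are treated as original sources, so the walk stops there.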
Below you will find the benchmarking results for the four above-mentioned indicators (KPIs). You can use these four indicators to quickly benchmark the situation in your company against these results.
Each of the indicators has been evaluated at one of five maturity levels that demonstrate the level of development.
The results presented in Figure 3 demonstrate the data management maturity for the data chain sub-capability. These results lead us to the following conclusions:
- The majority of the companies put a lot of effort into the documentation of the data chain. More than 70% of respondents have recognized the necessity of this activity.
- Yet, the ability of companies to discover and deliver new data remains a challenge for almost 40% of respondents.
- The situation with the ability to explain data transformation seems to be even worse. Almost 50% can hardly do it, and only 25% of respondents are creating a foundation for doing it.
- The day-to-day coordination of the activities of different stakeholders remains a challenge for almost 50% of respondents. Only 17% of respondents are confident about a good level of coordination of stakeholders.
To improve the situation with the data chain sub-capability, companies should:
…align the processes of documenting information requirements and finding relevant data sources on the most granular levels;
…investigate and document application and data flows;
…apply the data lineage methodology for the documentation of the critical data chains.