Are you interested in comparing the data quality maturity of your company with that of your industry peers?
In the previous articles of this series, we have discussed how to build a company-specific data management maturity assessment and the way to benchmark the results for Information Systems Architecture.
In this article, I will share an in-depth approach for measuring and benchmarking the maturity level of the data quality sub-capability. The benchmark results used in this article are based on the ‘Data Management Maturity Review 2019.’
We will cover the following four topics:
- Definition of the ‘data quality’ sub-capability and its dimensions
- Specification of indicators (KPIs) for measuring the performance
- Benchmarking results based on a set of indicators
- Development tips
Data Quality sub-capability and its dimensions
Data Quality is one of the five sub-capabilities of the ‘Orange’ model of data management, which is explained in ‘Data Management Maturity 101: What is a data management maturity assessment and why does a company need it?’ An overview of the model is shown in Figure 1.
‘Data quality is a business capability that enables the delivery of data and information of the required quality’ as specified in ‘Data Management Maturity Review 2019’.
The following dimensions enable a (sub-)capability: role, process, data (input and output), and tools.
In Figure 2, each dimension of the data quality sub-capability is described in detail.
Figure 2. A detailed description of the data quality dimensions.
In our context, ‘data’ stands for formal deliverables/artifacts of the data management sub-capability. The key deliverables of this sub-capability are related to the rules and roles that ensure the operation of the data management function.
There are three key groups of deliverables of the data quality sub-capability:
- Information and data quality requirements
You can never identify whether some data has data quality issues without the prior identification of requirements. Assume that information about customer addresses is available for 80% of the data set. Is it a DQ issue or not? Unless the requirement of 100% completeness of the value exists, data users cannot complain. Therefore, the definition of data and information requirements is of great importance.
- Resolved data quality issues
There are several deliverables in this group. All of them are related to tasks to perform. You start with identification, conformance, and documentation of data quality issues. Therefore, a company needs to implement and share access to the repository of data quality issues. Then the analysis of the origin of data quality issues has to take place. All artifacts related to such analysis should be documented and archived. To resolve the issues, a list of activities should be undertaken. As a result, you will get the final deliverable: resolved DQ issues.
- DQ checks and controls
To prevent recurring DQ issues, a set of checks and controls should be developed. These checks and controls are built according to the DQ requirements. The requirements are translated into business rules and implemented along the data chain.
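The dependency between requirements and checks can be illustrated with the customer address example from above. The following is a minimal sketch, not a definitive implementation: the `DQRequirement` structure, the function names, and the sample records are all hypothetical and only serve to show that the same measured completeness (80%) is an issue under one requirement and acceptable under another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQRequirement:
    """Hypothetical DQ requirement: minimum share of populated values for one field."""
    field: str
    min_completeness: float  # 1.0 means every record must have a value


def completeness(records, field):
    """Share of records in which `field` is present and not empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def check_completeness(records, requirement):
    """Return (passed, measured) for a completeness requirement."""
    measured = completeness(records, requirement.field)
    return measured >= requirement.min_completeness, measured


# Illustrative data only: 4 of 5 customers have an address.
customers = [
    {"id": 1, "address": "1 Main St"},
    {"id": 2, "address": "2 Oak Ave"},
    {"id": 3, "address": ""},
    {"id": 4, "address": "4 Elm Rd"},
    {"id": 5, "address": "5 Pine Ln"},
]

strict = DQRequirement(field="address", min_completeness=1.0)
relaxed = DQRequirement(field="address", min_completeness=0.8)

strict_passed, measured = check_completeness(customers, strict)
relaxed_passed, _ = check_completeness(customers, relaxed)

print(f"measured completeness: {measured:.0%}")
print(f"DQ issue under the 100% requirement: {not strict_passed}")
print(f"DQ issue under the 80% requirement: {not relaxed_passed}")
```

Without one of the two requirement objects, the measured value alone says nothing about whether a DQ issue exists.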
‘Process’ signifies a data management-related business process at different levels of abstraction.
The processes required for DQ will support the delivery of all artifacts mentioned above: the gathering of DQ requirements, resolution of DQ issues, and documenting and building DQ checks and controls.
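The issue-resolution process described earlier (identification and documentation, analysis of the origin, resolution) can be sketched as a simple status flow for entries in an issue repository. This is a hedged illustration under stated assumptions: the status names, the `DQIssue` class, and the example issue are hypothetical and not taken from the ‘Orange’ model.

```python
from dataclasses import dataclass, field
from enum import Enum


class IssueStatus(Enum):
    IDENTIFIED = "identified"
    DOCUMENTED = "documented"
    ANALYZED = "analyzed"
    RESOLVED = "resolved"


# Allowed forward transitions; RESOLVED is the final state.
_NEXT = {
    IssueStatus.IDENTIFIED: IssueStatus.DOCUMENTED,
    IssueStatus.DOCUMENTED: IssueStatus.ANALYZED,
    IssueStatus.ANALYZED: IssueStatus.RESOLVED,
}


@dataclass
class DQIssue:
    """Hypothetical entry in a shared repository of DQ issues."""
    issue_id: str
    data_element: str
    description: str
    status: IssueStatus = IssueStatus.IDENTIFIED
    history: list = field(default_factory=list)

    def advance(self, note):
        """Move to the next status and archive a note about the step taken."""
        if self.status not in _NEXT:
            raise ValueError(f"issue {self.issue_id} is already resolved")
        self.history.append((self.status, note))
        self.status = _NEXT[self.status]


issue = DQIssue("DQ-001", "customer.address", "Address missing for some customers")
issue.advance("Logged in the shared issue repository")
issue.advance("Origin traced to an optional field in the onboarding form")
issue.advance("Field made mandatory; missing values backfilled")
print(issue.status.value)
```

Keeping the notes in `history` reflects the point that all artifacts of the analysis should be documented and archived.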
‘Role’ describes the participation of people in business operations. It can represent business units, functional jobs, a set of data management-related accountabilities and responsibilities (in RACI context), etc.
As in each of the data management sub-capabilities, different roles will be involved in the performance of tasks. Data management professionals such as DQ analysts, solution engineers, IT and database engineers, and developers will be involved in DQ-related tasks: data analysis, and the design and implementation of DQ checks and controls. Subject matter experts (SMEs) from the business department will be accountable for reporting DQ issues and providing input into the definition of DQ requirements. SMEs will also be involved in the analysis of data quality issues when the causes of these issues relate to business operations. The management level of SMEs and/or governance bodies will be involved in the resolution of escalated interdisciplinary DQ issues.
‘Tools’ include information technology systems and applications as well as resources required for performing the data management function, e.g. budget.
For the DQ sub-capability, two tools are the most important to implement:
- DQ analytical tools
With the huge amount of data circulating in a company, it becomes impossible to perform a manual analysis of data quality. Data quality profiling tools become very important. Nowadays new solutions based on machine learning techniques prove to be very useful for DQ analysis.
- Repository for DQ business rules and DQ checks and controls
Even large multinational companies either don’t have business rules repositories or maintain them manually in Excel. Excel as the repository tool might work for smaller organizations. It will not work for companies where data transformation crosses many departments. The necessity to align both business rules and DQ checks and controls along the data transformation chain requires a central repository, at least for some part of an enterprise.
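To make the idea of a central repository more concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory `RuleRepository` in which every business rule is linked to a data element and to a stage of the data chain; the class names, rule IDs, stage names, and sample records are illustrative assumptions, not features of any particular tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusinessRule:
    """Hypothetical business rule with the check that implements it."""
    rule_id: str
    description: str
    data_element: str
    chain_stage: str  # where in the data chain the check runs
    predicate: Callable[[dict], bool]  # returns True when a record passes


class RuleRepository:
    """Single place to register rules and run them per data chain stage."""

    def __init__(self):
        self._rules = {}

    def register(self, rule):
        if rule.rule_id in self._rules:
            raise ValueError(f"duplicate rule id: {rule.rule_id}")
        self._rules[rule.rule_id] = rule

    def rules_for_stage(self, stage):
        return [r for r in self._rules.values() if r.chain_stage == stage]

    def run_stage(self, stage, records):
        """Return {rule_id: number of failing records} for one stage."""
        return {
            rule.rule_id: sum(1 for rec in records if not rule.predicate(rec))
            for rule in self.rules_for_stage(stage)
        }


repo = RuleRepository()
repo.register(BusinessRule(
    "BR-001", "Customer address must be populated",
    "customer.address", "source",
    lambda rec: bool(rec.get("address")),
))
repo.register(BusinessRule(
    "BR-002", "Customer country must be a two-letter code",
    "customer.country", "reporting",
    lambda rec: len(rec.get("country", "")) == 2,
))

# Illustrative records only.
records = [
    {"address": "1 Main St", "country": "NL"},
    {"address": "", "country": "Netherlands"},
]

source_failures = repo.run_stage("source", records)
reporting_failures = repo.run_stage("reporting", records)
print(source_failures)
print(reporting_failures)
```

Because every department registers its rules in the same structure, rules for the same data element at different stages can be compared and aligned, which is hard to achieve with separate spreadsheets.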
Specification of indicators (KPIs) to measure the performance
Each of the sub-capability dimensions described above can serve as a specific indicator (KPI) to measure performance.
By assigning maturity levels to the chosen indicators, each company can create its own maturity assessment.
I will demonstrate four indicators as examples. These indicators have been used as the foundation of our Data Management Maturity Scan:
Indicator 1 (Tools): Availability of required information for decision-making
One of the key value propositions of data management is the support of decision-making. Decisions are made based on information. Only trustworthy information that is required for decision-making will enable this value proposition. Therefore, keeping control of the actual requirements and measuring the satisfaction of end-users is of high importance.
Indicator 2 (Process): Information/data delivery according to requirements
To be useful, data and information should be delivered on time to the right place and the right person. Therefore, control of the delivery of data and information according to requirements should be put in place.
Indicator 3 (Data): Data at the required level of quality
As already discussed, to be able to assess the quality of data, requirements should first be specified. Then the set of DQ checks and controls will ensure that data is delivered at the required level.
Indicator 4 (Role): Operational DQ checks and controls
DQ checks and controls should be operational, and the corresponding business processes have to support the performance of DQ checks and controls and the subsequent analysis of results.
Benchmarking results based on a set of indicators
Below you will find the benchmarking results for the four above-mentioned indicators (KPIs). You can use these four indicators to quickly benchmark the situation in your company against that of your peers.
Each of the indicators has been evaluated at one of five maturity levels that demonstrate the level of development.
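A company-specific assessment along these four indicators can be sketched as follows. This is only an illustration under stated assumptions: the levels are assumed to be numbered 1 (lowest) to 5 (highest), the function `maturity_gaps` is hypothetical, and the assessed and target values are invented for the example; they are not benchmark results.

```python
# Indicator names are taken from this article.
INDICATORS = {
    1: "Availability of required information for decision-making",
    2: "Information/data delivery according to requirements",
    3: "Data at the required level of quality",
    4: "Operational DQ checks and controls",
}

# Assumption: five maturity levels, numbered 1 (lowest) to 5 (highest).
MIN_LEVEL, MAX_LEVEL = 1, 5


def maturity_gaps(assessed, target):
    """Return {indicator: number of levels between assessed and target}."""
    gaps = {}
    for indicator in INDICATORS:
        for level in (assessed[indicator], target[indicator]):
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"level {level} is outside {MIN_LEVEL}-{MAX_LEVEL}")
        # A company that already exceeds its target has no gap.
        gaps[indicator] = max(target[indicator] - assessed[indicator], 0)
    return gaps


# Hypothetical self-assessment and target levels for one company.
assessed = {1: 2, 2: 3, 3: 2, 4: 1}
target = {1: 4, 2: 4, 3: 4, 4: 3}

gaps = maturity_gaps(assessed, target)
for indicator, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Indicator {indicator} ({INDICATORS[indicator]}): gap of {gap} level(s)")
```

The resulting gaps per indicator are what you would then compare with the benchmark distribution of your peers.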
The results presented in Figure 3 led us to the following conclusions:
- The situation with the availability of required information for decision-making looks rather sad. Almost half of the companies experience a lack of required information. Only 12% of companies have a high level of maturity in providing the required information. The rest of the companies are in the process of developing solutions.
- The situation with data delivery according to the requirements seems a little better. 26% of companies have reached their goals. Almost 60% of companies are still not able to deliver data according to the requirements.
- 80% of respondents have not yet reached their goals with the delivery of data of the required quality, of which almost 50% do not even have formal DQ processes in place.
- 40% of respondents are in the process of developing DQ checks and controls. The results of measurements of indicators 3 and 4 correlate with each other. The relationship is simple: as long as a company has not implemented DQ checks and controls, the quality of data cannot be on the required level.
Development tips
For many companies, data quality has been the starting point in their data management/governance initiatives. Based on the maturity assessment results, you can see that the maturity level of the DQ sub-capability is still lower than intended.
To improve the situation with data quality, companies should focus on establishing all DQ processes and making them operational, in the following order:
- The identification of critical data elements (CDEs) to limit the scope of the DQ initiative and make it feasible.
- The specification of DQ requirements for CDEs.
- The analysis of DQ issues for CDEs and their resolution.
- The implementation of DQ checks and controls according to requirements.
In the next article, the same analysis will be provided for the data and information value chain capability.