Are you interested in comparing the data quality maturity of your company with that of your industry peers?
In the previous articles of this series, we have discussed how to build a company-specific data management maturity assessment and how to benchmark the results for Information Systems Architecture.
In this article, I will share an in-depth approach for measuring and benchmarking the maturity level of the data quality sub-capability. The benchmark results used in this article are based on the ‘Data Management Maturity Review 2019.’
We will cover the following four topics:
- Definition of the ‘data quality’ sub-capability and its dimensions
- Specification of indicators (KPIs) for measuring the performance
- Benchmarking results based on a set of indicators
- Development tips
Data Quality sub-capability and its dimensions
Data Quality is one of the five sub-capabilities of the ‘Orange’ model of data management explained in ‘Data Management Maturity 101: What is a data management maturity assessment, and why does a company need it?’ The overview of the model is shown in Figure 1.
‘Data quality is a business capability that enables the delivery of data and information of the required quality’ as specified in ‘Data Management Maturity Review 2019’.
The following dimensions enable a (sub-)capability: role, process, data (input and output), and tools.
In Figure 2, each dimension of the data quality sub-capability is described in detail.
Figure 2. A detailed description of the data quality dimensions.
In our context, ‘data’ stands for formal deliverables/artifacts of the data management sub-capability. The key deliverables of this sub-capability are related to the rules and roles that ensure the operation of the data management function.
There are three key groups of deliverables of the data quality sub-capability:
- Information and data quality requirements
Without identifying requirements, you can never identify whether some data has data quality issues. Assume that information about customer addresses is available for 80% of the data set. Is it a DQ issue or not? Unless the requirement of 100% completeness of the value exists, data users cannot complain. Therefore, the definition of data and information requirements is of great importance.
- Resolved data quality issues
There are several deliverables in this group, and all of them relate to tasks to be performed. You start by identifying, confirming, and documenting data quality issues. For this, a company needs to implement a repository of data quality issues and share access to it. Then the origin of the data quality issues has to be analyzed. All artifacts related to this analysis should be documented and archived. To resolve the issues, a list of activities has to be carried out. As a result, you get the final deliverable: resolved DQ issues.
- DQ checks and controls
A set of checks and controls should be developed to prevent repetitive DQ issues. These checks and controls will be built according to the DQ requirements. These requirements are to be translated into business rules and implemented along the data chain.
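The address example above can be made concrete with a small sketch. It assumes a DQ requirement is expressed as a minimum completeness threshold per field. The names `CompletenessRequirement`, `completeness`, and `check`, as well as the sample records, are hypothetical and serve only as illustration. The point is that the same measured 80% is an issue under one requirement and acceptable under another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessRequirement:
    """A DQ requirement: the minimum share of non-empty values for a field."""
    field: str
    min_completeness: float  # 1.0 means every record must have a value


def completeness(records, field):
    """Return the share of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def check(records, requirement):
    """Compare measured completeness with the requirement.

    Returns a tuple (passed, measured).
    """
    measured = completeness(records, requirement.field)
    return measured >= requirement.min_completeness, measured


# Hypothetical data set: 4 of 5 customers have an address (80%).
customers = [
    {"id": 1, "address": "1 Main St"},
    {"id": 2, "address": "2 High St"},
    {"id": 3, "address": ""},
    {"id": 4, "address": "4 Park Ave"},
    {"id": 5, "address": "5 Lake Rd"},
]

strict = CompletenessRequirement(field="address", min_completeness=1.0)
relaxed = CompletenessRequirement(field="address", min_completeness=0.8)

print(check(customers, strict))   # fails: 80% measured against a 100% requirement
print(check(customers, relaxed))  # passes: the same 80% meets an 80% requirement
```

In practice, a check like this would be one of many business rules derived from the DQ requirements and run at several points along the data chain.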
‘Process’ signifies a data management-related business process at different levels of abstraction.
The processes required for DQ will support the delivery of all artifacts mentioned above: gathering DQ requirements, resolving DQ issues, and documenting and building DQ checks and controls.
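As an illustration of the issue-resolution process, here is a minimal sketch of an entry in a DQ issue repository. It assumes that an issue moves through four stages in a fixed order (identified, confirmed, analyzed, resolved). The stage names, the `DQIssue` class, and the sample issue are assumptions made for this example, not part of the maturity model itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IDENTIFIED = 1
    CONFIRMED = 2
    ANALYZED = 3
    RESOLVED = 4


@dataclass
class DQIssue:
    issue_id: str
    description: str
    status: Status = Status.IDENTIFIED
    root_cause: str = ""
    actions: list = field(default_factory=list)

    def _advance(self, expected, new):
        # Enforce the order of the stages: no stage can be skipped.
        if self.status is not expected:
            raise ValueError(
                f"{self.issue_id}: expected {expected.name}, got {self.status.name}"
            )
        self.status = new

    def confirm(self):
        self._advance(Status.IDENTIFIED, Status.CONFIRMED)

    def record_root_cause(self, cause):
        self._advance(Status.CONFIRMED, Status.ANALYZED)
        self.root_cause = cause  # keep the analysis result with the issue

    def resolve(self, actions):
        self._advance(Status.ANALYZED, Status.RESOLVED)
        self.actions = list(actions)


# The repository is a plain dictionary here; a real one would be a shared tool.
register = {}
issue = DQIssue("DQ-001", "Customer address missing for 20% of records")
register[issue.issue_id] = issue

issue.confirm()
issue.record_root_cause("Address is optional in the onboarding form")
issue.resolve(["Make address mandatory", "Backfill missing addresses"])
print(issue.status.name)
```

Forcing the stages to follow one another keeps the root-cause analysis documented before an issue can be closed.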
‘Role’ describes the participation of people in business operations. It can represent business units, functional jobs, a set of data management-related accountabilities and responsibilities (in the RACI context), etc.
As in each of the data management sub-capabilities, different roles are involved in performing the tasks. Data management professionals such as DQ analysts, solution engineers, IT and database engineers, and developers are involved in DQ-related tasks: data analysis and the design and implementation of DQ checks and controls. Subject matter experts (SMEs) from business departments are accountable for reporting DQ issues and providing input for defining DQ requirements. SMEs are also involved in the analysis of data quality issues when the causes of these issues relate to business operations. The SMEs’ management and/or governance bodies are involved in resolving escalated interdisciplinary DQ issues.
‘Tools’ include information technology systems, applications, and resources required to perform the data management function, e.g., budget.
For the DQ sub-capability, two tools are the most important to implement:
- DQ analytical tools
With the huge amount of data circulating in a company, it becomes impossible to analyze data quality manually. Data quality profiling tools have become very important. New solutions based on machine learning techniques are beneficial for DQ analysis.
- Repository for DQ business rules and DQ checks and controls
Large multinational companies either don’t have business rules repositories at all or maintain them manually in Excel. Excel as a repository tool might work for smaller organizations. It will not work for companies where data transformation crosses many departments. The need to align both business rules and DQ checks and controls along the data transformation chain requires a central repository, at least for some part of the enterprise.
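To show what a central repository adds over separate spreadsheets, here is a minimal sketch. It assumes each rule is linked to one data element and one step of the data chain. The `DQRule` and `RuleRepository` names, the chain steps, and the sample rules are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DQRule:
    rule_id: str
    data_element: str
    business_rule: str  # the requirement in business terms
    chain_step: str     # where along the data chain the check runs
    owner: str          # department accountable for the rule


class RuleRepository:
    def __init__(self):
        self._rules = {}

    def register(self, rule):
        if rule.rule_id in self._rules:
            raise ValueError(f"duplicate rule id: {rule.rule_id}")
        self._rules[rule.rule_id] = rule

    def by_element(self, data_element):
        """All rules for one data element, across every step of the chain."""
        return [r for r in self._rules.values() if r.data_element == data_element]

    def gaps(self, chain_steps):
        """Map each data element to the chain steps that have no rule for it."""
        covered = defaultdict(set)
        for rule in self._rules.values():
            covered[rule.data_element].add(rule.chain_step)
        return {
            element: [step for step in chain_steps if step not in steps]
            for element, steps in covered.items()
        }


repo = RuleRepository()
repo.register(DQRule("R1", "customer_address", "Address must be filled",
                     "source system", "Sales"))
repo.register(DQRule("R2", "customer_address", "Address must be filled",
                     "data warehouse", "Finance"))

print(repo.gaps(["source system", "data warehouse", "reporting"]))
```

Because all rules sit in one place, a missing check at a given step of the chain becomes visible, which is hard to achieve when each department keeps its own file.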
Specification of indicators (KPIs) to measure the performance
Each sub-capability dimension described above can serve as a specific indicator (KPI) to measure performance.
Each company can create its maturity assessment by assigning maturity levels to chosen indicators.
I will demonstrate four indicators as examples. These indicators have been used as the foundation of our Data Management Maturity Scan:
Indicator 1 (Tools): Availability of required information for decision-making
One of the key value propositions of data management is the support of decision-making. Decisions are made based on information. Only trustworthy information that is required for decision-making will enable this value proposition. Therefore, keeping control of the actual requirements and measuring end-user satisfaction is highly important.
Indicator 2 (Process): Information/data delivery according to requirements
Data and information should be delivered on time to the right place and to the right person to be helpful. Therefore, control of the delivery of data and information according to requirements should be put in place.
Indicator 3 (Data): Data at the required level of quality
As already discussed, to be able to assess the quality of data, requirements should first be specified. Then the DQ checks and controls will ensure that data is delivered at the required level.
Indicator 4 (Role): Operational DQ checks and controls
DQ checks and controls should be operational, and corresponding business processes have to support the performance of DQ checks and controls and the consequent analysis of results.
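A company-specific self-assessment based on these four indicators could be sketched as follows. The sketch assumes a numeric scale from 1 to 5 and a target level per indicator. The scores and targets are made-up illustrative values, not benchmark results, and averaging the levels is only one possible way to summarize them.

```python
INDICATORS = {
    1: "Availability of required information for decision-making",
    2: "Information/data delivery according to requirements",
    3: "Data at the required level of quality",
    4: "Operational DQ checks and controls",
}

MIN_LEVEL, MAX_LEVEL = 1, 5  # assumed numeric scale for the five maturity levels


def assess(scores, targets):
    """Return a per-indicator report and the average maturity level."""
    report = {}
    for key, name in INDICATORS.items():
        level, target = scores[key], targets[key]
        for value in (level, target):
            if not MIN_LEVEL <= value <= MAX_LEVEL:
                raise ValueError(f"level {value} outside {MIN_LEVEL}-{MAX_LEVEL}")
        # The gap is how many levels remain to reach the target (never negative).
        report[name] = {"level": level, "target": target,
                        "gap": max(target - level, 0)}
    average = sum(scores[key] for key in INDICATORS) / len(INDICATORS)
    return report, average


# Hypothetical self-assessment of one company.
scores = {1: 2, 2: 3, 3: 2, 4: 1}
targets = {1: 4, 2: 4, 3: 4, 4: 3}

report, average = assess(scores, targets)
for name, row in report.items():
    print(f"{name}: level {row['level']}, target {row['target']}, gap {row['gap']}")
print(f"Average maturity level: {average}")
```

Keeping the target next to the current level makes the gap per indicator explicit, which is what you compare against the benchmark.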
Benchmarking results based on a set of indicators
Below are the benchmarking results for the four indicators (KPIs) described above. You can use these four indicators to quickly benchmark your company’s situation.
Each indicator has been evaluated at one of five maturity levels demonstrating the development level.
The results presented in Figure 3 led us to the following conclusions:
- The situation with the availability of required information for decision-making looks rather sad. Almost half of the companies experience a lack of required information. Only 12% of companies have a high level of maturity for providing the required information. The rest of the companies are in the process of developing solutions.
- The situation with data delivery according to the requirements seems a little better. 26% of companies have reached their goals. Almost 60% of companies cannot deliver data according to the requirements.
- 80% of respondents have not yet reached their goals of delivering data of the required quality, of which almost 50% don’t have formal DQ processes in place.
- 40% of respondents are in the process of developing DQ checks and controls. The measurement results for indicators 3 and 4 correlate with each other. The relationship is simple: as long as a company has not implemented DQ checks and controls, data quality cannot be at the required level.
Development tips
Data quality has been the starting point for many companies in their data management/governance initiatives. The maturity assessment results show that the maturity level of the DQ sub-capability is still lower than intended.
To improve the situation with data quality, companies should focus on establishing and making operational all DQ processes in the following order:
- Identification of critical data elements (CDEs) to limit the scope of the DQ initiative and make it feasible.
- Specification of DQ requirements for CDEs.
- Analysis of DQ issues for CDEs and their resolution.
- Implementation of DQ checks and controls according to requirements.
The next article will provide the same analysis for the data and information value chain capability.