Are you interested in data modeling maturity?
In the previous articles of this series, we discussed how to build a company-specific data management maturity assessment and how to benchmark the results for the data management framework (data governance) sub-capability.
In this article, I will share an in-depth approach for measuring and benchmarking the maturity level of the data modeling sub-capability. The benchmark results used in this article are based on the ‘Data Management Maturity Review 2019.’
We will cover the following four topics:
- Definition of the ‘data modeling’ sub-capability and its dimensions
- Specification of indicators (KPIs) for measuring the performance
- Benchmarking results based on a set of indicators
- Development tips
Data modeling sub-capability and its dimensions
Data modeling is one of the five sub-capabilities of the ‘Orange’ model of data management that is explained in Data Management Maturity 101: What is a data management maturity assessment and why does a company need it? The overview of the model is shown in Figure 1.
According to the DAMA Dictionary, data modeling is a business capability that delivers data models ‘[…] a) to define and analyze data requirements, b) design logical and physical structures that support these requirements, and c) define business and technical meta-data’ (The DAMA Dictionary of Data Management, Second Edition: Technics Publications, 2011, p.81).
The following dimensions enable a (sub) capability: role, process, data (input and output), and tools. In Figure 2, each dimension of the data modeling sub-capability is described in detail.
Fig. 2. Data modeling dimensions.
In our context, ‘data’ stands for formal deliverables/artifacts of the data management sub-capability. The key deliverables of this sub-capability are related to the rules and roles that ensure the operation of the data management function.
It is worth mentioning that data modeling has been recognized as a separate sub-capability by DAMA-DMBOK2. There is a big misalignment between DAMA publications and TOGAF9.2. All activities and deliverables of data modeling specified in DAMA-DMBOK2 remain within the area of information systems architecture according to TOGAF9.2.
Data modeling techniques and deliverables apply to any type of data, including master data, reference data, transactional data, and metadata. The deliverables of data modeling focus on four key areas:
- Information and data requirements
A company should start with the analysis and documentation of information requirements. Usually, upcoming regulatory requirements are one of the key drivers for new information, next to new requirements for the management information needed for decision-making. To deliver the required information, the corresponding data has to be found, delivered, and transformed. Sourcing new data takes time. To create a link between information and data, data models should be put in place. The key deliverables are information and data requirements with a perspective of several years.
- Data models
When we speak about data models, we keep in mind that they are built at one of the following levels: conceptual, logical, or physical. Data models serve as the link between information and data requirements. The key deliverable is a set of data models that are vertically linked with each other. Business glossaries and data dictionaries complement data models.
- Data lineage
Data models are important components of data lineage. Therefore, data models should be linked with each other along the pathway that data flows from its origin to its usage point. Data lineage can be documented at all three levels of the data model as explained in my articles about data lineage. Therefore, the horizontal data lineage at one or more data model levels should be a part of data modeling deliverables.
- Critical data (elements)
Critical data elements (CDEs) are a common means to scope a data management initiative. You can specify CDEs at different levels of data models, usually logical or physical. Data models are the mandatory input for the specification of CDEs. You can find more about CDEs in my article about CDEs.
The list of CDEs along the data flows is one of the key deliverables of data modeling.
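To make the relationship between these deliverables more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a prescribed design: the class ModelElement, its fields, the helper functions, and all element names (for example crm.customer.dob) are hypothetical. It shows model elements linked vertically across levels, linked horizontally along a data flow, and flagged as CDEs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LEVELS = ("conceptual", "logical", "physical")


@dataclass
class ModelElement:
    """One element of a data model (hypothetical structure for illustration)."""
    name: str
    level: str                          # one of LEVELS
    implements: Optional[str] = None    # vertical link: element one level up
    sources: List[str] = field(default_factory=list)  # horizontal lineage: upstream elements
    critical: bool = False              # flagged as a critical data element (CDE)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def vertical_chain(elements: Dict[str, ModelElement], name: str) -> List[str]:
    """Follow the vertical links upward, e.g. physical -> logical -> conceptual."""
    chain = [name]
    while elements[chain[-1]].implements is not None:
        chain.append(elements[chain[-1]].implements)
    return chain


def trace_lineage(elements: Dict[str, ModelElement], name: str) -> List[str]:
    """Return the elements on the flow that ends at `name`, origin first."""
    ordered: List[str] = []
    seen = set()

    def visit(current: str) -> None:
        if current in seen:             # guards against cycles and duplicates
            return
        seen.add(current)
        for source in elements[current].sources:
            visit(source)
        ordered.append(current)

    visit(name)
    return ordered


def cdes_along_flow(elements: Dict[str, ModelElement], name: str) -> List[str]:
    """List the CDEs on the flow that ends at `name`."""
    return [n for n in trace_lineage(elements, name) if elements[n].critical]


# Illustrative model; every name below is made up.
elements = {
    e.name: e
    for e in [
        ModelElement("Customer", "conceptual"),
        ModelElement("Customer.birth_date", "logical", implements="Customer", critical=True),
        ModelElement("Customer.age", "logical", implements="Customer"),
        ModelElement("crm.customer.dob", "physical",
                     implements="Customer.birth_date", critical=True),
        ModelElement("dwh.dim_customer.birth_dt", "physical",
                     implements="Customer.birth_date",
                     sources=["crm.customer.dob"], critical=True),
        ModelElement("report.customer_age", "physical",
                     implements="Customer.age",
                     sources=["dwh.dim_customer.birth_dt"]),
    ]
}

print(vertical_chain(elements, "crm.customer.dob"))
print(trace_lineage(elements, "report.customer_age"))
print(cdes_along_flow(elements, "report.customer_age"))
```

In this sketch, the vertical chain connects a physical column to its logical attribute and conceptual entity, while the lineage trace and the CDE list follow the horizontal flow at the physical level.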
‘Process’ signifies a data management-related business process at different levels of abstraction.
All business processes related to data modeling focus on the development, documentation, and maintenance of key deliverables.
‘Role’ describes the participation of people in business operations. It can represent business units, functional jobs, a set of data management-related accountabilities and responsibilities (in RACI context), etc.
There are several important groups of roles involved in the performance of data modeling activities. The first group is data management professionals, i.e. data modelers, data analysts, and data architects. They possess the skills required to develop data models. The second group is business subject experts, the key knowledge experts who can deliver business definitions and context for data models. For data models at the physical level, IT-related professionals such as database engineers and architects, as well as metadata specialists, will also be involved.
‘Tools’ include information technology systems and applications as well as resources required for performing the data management function, e.g. budget.
There are several well-known tools available on the market for data modeling, such as ArchiMate, ER/Studio, and Sparx. While choosing a tool, you should consider the need to integrate it with business process management (BPM) and data lineage tools.
Specification of indicators (KPIs) to measure the performance
Each sub-capability dimension described above can serve as a specific indicator (KPI) to measure performance.
By assigning maturity levels to chosen indicators, each company can create its own maturity assessment.
I will demonstrate this with four example indicators. These indicators have been used as the foundation of our Data Management Maturity Scan:
Indicator 1 (data): ‘business glossary’
A business glossary is often a starting point for the initiation of data modeling activities. It serves as a basis for a common business language within a company. Usually, the development of a business glossary is associated with the development of a conceptual data model.
Indicator 2 (data): ‘data models’
Data models are the key deliverables of data modeling. The presence of data models serves as an indicator of a high level of maturity of a data management function.
Indicator 3 (data): ‘documented information and data requirements’
As discussed above, information and data requirements are the deliverables of data modeling that serve to ensure a company’s ability to deliver required information in medium- and long-term perspectives.
Indicator 4 (data): ‘specified critical data’
Specification and maintenance of critical data elements are a powerful means to make any data management initiative feasible and fit for purpose.
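The idea of assigning maturity levels to these four indicators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring method of the Data Management Maturity Scan: the five level labels, the function assess, and the example scores are all hypothetical, and averaging ordinal levels is a deliberate simplification.

```python
from statistics import mean
from typing import Dict

# Hypothetical five-level scale; the labels are assumptions for illustration.
MATURITY_LEVELS = {
    1: "not in place",
    2: "planned",
    3: "in implementation",
    4: "being finalized",
    5: "in place and maintained",
}

# The four indicators named in this article.
INDICATORS = (
    "business glossary",
    "data models",
    "documented information and data requirements",
    "specified critical data",
)


def assess(scores: Dict[str, int]) -> dict:
    """Validate self-assessed levels and summarize them per indicator."""
    missing = [i for i in INDICATORS if i not in scores]
    if missing:
        raise ValueError(f"no level given for: {missing}")
    for indicator in INDICATORS:
        if scores[indicator] not in MATURITY_LEVELS:
            raise ValueError(f"invalid level for {indicator}: {scores[indicator]}")
    return {
        # Simplification: treats the ordinal levels as numbers.
        "average": mean(scores[i] for i in INDICATORS),
        # First indicator with the lowest level, i.e. a candidate priority.
        "weakest": min(INDICATORS, key=lambda i: scores[i]),
        "labels": {i: MATURITY_LEVELS[scores[i]] for i in INDICATORS},
    }


# Made-up self-assessment of a fictional company.
result = assess({
    "business glossary": 3,
    "data models": 2,
    "documented information and data requirements": 1,
    "specified critical data": 3,
})
print(result["average"])   # 2.25
print(result["weakest"])   # documented information and data requirements
```

A company could extend such a sketch with its own indicators for the process, role, and tools dimensions, as long as every indicator is scored on the same scale.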
For each of these indicators, benchmarking information is available.
Benchmarking results based on a set of indicators
Below you will find the benchmarking results for the four above-mentioned indicators (KPIs). You can use these results to quickly benchmark the situation in your company.
Each of the indicators has been evaluated at one of five maturity levels that reflect its level of development.
The results presented in Figure 3 lead us to the following conclusions:
- More than 50% of respondents have neither a business glossary nor data models in place. This figure confirms my practical experience with medium-sized companies. Only about 20% of respondents claimed to be in the process of finalizing a business glossary and/or data models.
- The situation with the documentation of information and data requirements seems to be the same as described in the first point.
- As critical data elements are a commonly used technique to scope data management initiatives, about 60% of respondents have either planned or are already implementing this concept. It might seem strange that the situation with CDEs looks better than the situation with business glossaries and data models, because defining CDEs usually requires knowledge of the data models.
Development tips
To improve the situation with data modeling, companies should:
…start investing time and resources in the development of business glossaries, data dictionaries, and data models
…analyze their data models, which will allow them to minimize data duplication and, consequently, the number of IT applications that have to be maintained
…put more effort into the development of data lineage. Data models are key components of data lineage. Data lineage is a mandatory prerequisite for the resolution of data quality issues and for compliance with numerous legislative requirements.
In the next article, I will provide the same analysis for the Information Systems Architecture capability.