Are you interested in data modeling maturity?

In the previous articles of this series, we discussed how to build a company-specific data management maturity assessment and how to benchmark the results for the data management framework (data governance) sub-capability.

In this article, I will share an in-depth approach to measuring and benchmarking the maturity level of the data modeling sub-capability. The benchmark results used in this article are based on the ‘Data Management Maturity Review 2019.’

We will cover the following four topics:

  1. Definition of the ‘data modeling’ sub-capability and its dimensions
  2. Specification of indicators (KPIs) for measuring the performance
  3. Benchmarking results based on a set of indicators
  4. Development tips

 

Data modeling sub-capability and its dimensions

Data modeling is one of the five sub-capabilities of the ‘Orange’ model of data management explained in ‘Data Management Maturity 101: What is a data management maturity assessment, and why does a company need it?’ An overview of the model is shown in Figure 1.

Figure 1. The structure of data management capability.


 

According to the DAMA Dictionary, data modeling is a business capability that delivers data models ‘[…] a) to define and analyze data requirements, b) design logical and physical structures that support these requirements, and c) define business and technical meta-data’ (The DAMA Dictionary of Data Management, Second Edition, Technics Publications, 2011, p. 81).

The following dimensions enable a (sub-)capability: role, process, data (input and output), and tools. In Figure 2, each dimension of the data modeling sub-capability is described in detail.


Fig. 2. Data modeling dimensions.

DATA

In our context, ‘data’ stands for formal deliverables/artifacts of the data management sub-capability. The key deliverables of this sub-capability are related to the rules and roles that ensure the operation of the data management function.

It is worth mentioning that DAMA-DMBOK2 recognizes data modeling as a separate sub-capability. On this point, there is a significant misalignment between DAMA publications and TOGAF 9.2: according to TOGAF 9.2, all activities and deliverables of data modeling specified in DAMA-DMBOK2 remain within the area of information systems architecture.

Data modeling techniques and deliverables apply to any data type, including master data, reference data, transactional data, and metadata. The deliverables of data modeling focus on four key areas:

  1. Information and data requirements
    A company should start with the analysis and documentation of information requirements. Usually, upcoming regulatory requirements are one of the key drivers of information needs, next to new requirements for the management information needed for decision-making. To deliver the required information, the corresponding data has to be found, delivered, and transformed. Sourcing new data takes time. Data models should be put in place to create a link between information and data. The key deliverables are information and data requirements with a perspective of several years.
  2. Data models
    Data models are built at one of three levels: conceptual, logical, or physical. They serve as the link between information and data requirements. The key deliverable is a set of data models that are vertically linked with each other across these levels. Business glossaries and data dictionaries complement data models.
  3. Data lineage
    Data models are important components of data lineage, so they should be linked with each other along the path that data follows from its origin to its point of usage. Data lineage can be documented at all three data model levels, as my articles about data lineage explain. Horizontal data lineage at one or more data model levels should therefore be part of the data modeling deliverables.
  4. Critical data (elements)
    Critical data elements (CDEs) are a common means to scope a data management initiative. You can specify CDEs at different levels of data models, usually logical or physical. Data models are a mandatory input for the specification of CDEs. You can learn more in my article about CDEs.

The list of CDEs along the data flows is one of the key deliverables of data modeling.
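
The relationships between these deliverables can be sketched in a few lines of Python. This is only an illustration, not a prescribed metamodel: the class, its fields, the helper functions, and the sample ‘customer’ elements are all assumptions made for this example. It shows vertical links between model levels, horizontal lineage at the physical level, and a list of CDEs collected along a data flow.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare elements by identity, not by field values
class ModelElement:
    """A hypothetical element of a data model at one of the three levels."""
    name: str
    level: str  # "conceptual", "logical", or "physical"
    # Vertical link: the element one level up that this element implements.
    implements: Optional[ModelElement] = None
    # Horizontal lineage: elements at the same level that feed this one.
    sources: list[ModelElement] = field(default_factory=list)
    is_critical: bool = False


def vertical_chain(element: ModelElement) -> list[str]:
    """Follow the vertical links from an element up to the top level."""
    chain = []
    current = element
    while current is not None:
        chain.append(f"{current.level}: {current.name}")
        current = current.implements
    return chain


def upstream(element: ModelElement) -> list[ModelElement]:
    """Return every element that feeds the given one, nearest sources first."""
    result = []
    queue = list(element.sources)
    while queue:
        node = queue.pop(0)
        if node not in result:
            result.append(node)
            queue.extend(node.sources)
    return result


# Made-up example: a customer name flowing from a CRM table to a report.
concept = ModelElement("Customer", "conceptual")
logical = ModelElement("Customer.full_name", "logical", implements=concept)
crm_col = ModelElement("crm.customers.full_name", "physical", implements=logical)
dwh_col = ModelElement("dwh.dim_customer.name", "physical",
                       implements=logical, sources=[crm_col])
report_col = ModelElement("report.kyc.customer_name", "physical",
                          implements=logical, sources=[dwh_col],
                          is_critical=True)

# CDE list along the data flow: the critical element plus everything upstream.
cde_names = [e.name for e in [report_col] + upstream(report_col)]
```

In this sketch, marking only the report field as critical is enough to derive the full list of elements to govern, which is one way to read the statement that data models are an input for CDE specification.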

 

PROCESS

‘Process’ signifies a data management-related business process at different levels of abstraction.

All business processes related to data modeling focus on developing, documenting, and maintaining key deliverables.

 

ROLE

‘Role’ describes the participation of people in business operations. It can represent business units, functional jobs, a set of data management-related accountabilities and responsibilities (in the RACI context), etc.

Several important groups of roles are involved in data modeling activities. The first group is data management professionals, i.e., data modelers, analysts, and architects, who possess the skills required to develop data models. The second group is business subject-matter experts, who deliver business definitions and context for data models. For data models at the physical level, a third group is involved: IT professionals such as database engineers, architects, and metadata specialists.

 

TOOLS

‘Tools’ include information technology systems, applications, and resources required to perform the data management function, e.g., budget.

Several well-known tools are available in the market for data modeling, such as ArchiMate, ER/Studio, and Sparx. When choosing a tool, consider how well it integrates with business process management (BPM) and data lineage tools.

 

Specification of indicators (KPIs) to measure the performance

Each sub-capability dimension described above can serve as a specific indicator (KPI) to measure performance.

Each company can create its maturity assessment by assigning maturity levels to chosen indicators.

I will demonstrate four example indicators, all of them for the ‘data’ dimension. These indicators have been used as the foundation of our Data Management Maturity Scan:

Indicator 1 (data): ‘business glossary.’
A business glossary is often a starting point for initiating data modeling activities. It serves as a basis for a common business language within a company. Usually, the development of a business glossary is associated with the development of a conceptual data model.

Indicator 2 (data): ‘data models’
Data models are the key deliverables of data modeling. The presence of data models indicates a high maturity level of a data management function.

Indicator 3 (data): ‘documented information and data requirements.’
As discussed above, information and data requirements are data modeling deliverables that ensure a company can deliver required information from medium- and long-term perspectives.

Indicator 4 (data): ‘specified critical data.’
Specification and maintenance of critical data elements are powerful means to make any data management initiative feasible and fit for purpose.

For each of these indicators, benchmarking information is available.
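
As an illustration of assigning maturity levels to the chosen indicators, here is a minimal Python sketch. The five level labels and the simple averaging rule are assumptions made for this example only; this article does not prescribe level names or an aggregation method, and each company should define its own.

```python
from statistics import mean

# Hypothetical five-level scale; the labels are illustrative assumptions.
MATURITY_LEVELS = {
    1: "not started",
    2: "planned",
    3: "in development",
    4: "implemented",
    5: "optimized",
}

# The four indicators discussed in this article.
INDICATORS = [
    "business glossary",
    "data models",
    "documented information and data requirements",
    "specified critical data",
]


def assess(scores: dict[str, int]) -> dict:
    """Validate per-indicator scores and summarize them."""
    missing = [name for name in INDICATORS if name not in scores]
    if missing:
        raise ValueError(f"missing indicators: {missing}")
    for name, level in scores.items():
        if level not in MATURITY_LEVELS:
            raise ValueError(f"{name}: level must be 1-5, got {level}")
    # The weakest indicator is a natural candidate for the next improvement.
    weakest = min(INDICATORS, key=lambda name: scores[name])
    return {
        "average": mean(scores[name] for name in INDICATORS),
        "weakest": weakest,
        "weakest_level": MATURITY_LEVELS[scores[weakest]],
    }


# Example self-assessment with made-up scores.
result = assess({
    "business glossary": 3,
    "data models": 2,
    "documented information and data requirements": 1,
    "specified critical data": 4,
})
```

Reporting the weakest indicator next to the average is a deliberate choice in this sketch: an average alone can hide a deliverable that has not been started at all.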

 

Benchmarking results

Below are the benchmarking results for the four indicators (KPIs) mentioned above. You can use these four indicators to benchmark your company’s situation quickly.

Each indicator has been evaluated on a scale of five maturity levels.
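
A quick benchmark of your own position against such results could be computed as in the sketch below. The function name and the percentages in `sample_distribution` are placeholders invented for this illustration; they are NOT the figures from the review, so substitute the actual shares per maturity level before drawing any conclusions.

```python
def benchmark_position(distribution: dict[int, float],
                       own_level: int) -> dict[str, float]:
    """Compare your own maturity level with a distribution of respondents.

    `distribution` maps each maturity level (1-5) to the percentage of
    respondents at that level; the percentages must sum to 100.
    """
    total = sum(distribution.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"percentages sum to {total}, expected 100")
    if own_level not in distribution:
        raise ValueError(f"unknown maturity level: {own_level}")
    below = sum(share for level, share in distribution.items()
                if level < own_level)
    above = sum(share for level, share in distribution.items()
                if level > own_level)
    return {"below": below, "same": distribution[own_level], "above": above}


# Placeholder numbers for illustration only; not taken from the review.
sample_distribution = {1: 40.0, 2: 25.0, 3: 20.0, 4: 10.0, 5: 5.0}
position = benchmark_position(sample_distribution, own_level=3)
```

With these placeholder numbers, a company at level 3 would be ahead of 65% of respondents and behind 15%.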

Figure 3. Benchmarking results for the maturity of ‘data modeling’ sub-capability


 

Conclusions

The results presented in Figure 3 lead us to the following conclusions:

  1. More than 50% of respondents have neither a business glossary nor data models. This figure confirms my practical experience with medium-sized companies. Only about 20% of respondents have claimed to be in the process of finalizing a business glossary and/or data models.
  2. The situation with the documentation of information and data requirements appears to be the same as described in point 1.
  3. As critical data elements are a commonly used technique to scope data management initiatives, about 60% of respondents have either planned or are already implementing this concept. It might seem strange that the situation with CDEs looks better than with business glossaries and data models, because defining CDEs usually requires knowledge of data models.

 

Development tips

To improve the situation with data modeling, companies should:

…start investing time and resources in the development of business glossaries, data dictionaries, and data modeling

…analyze data models, which will allow them to minimize data duplication and, consequently, the number of IT applications that have to be maintained

…put more effort into the development of data lineage. Data models are key components of data lineage. Data lineage is a mandatory prerequisite for resolving data quality issues and compliance with numerous legislative requirements.

 

The next article will provide the same analysis for the Information Systems Architecture capability.

For more insights, visit the Data Crossroads Academy site: