The previous article reviewed the specifics of selecting data governance solutions and their functionalities. In this article, we will discuss the following:

  • The definition and content of metadata management and related challenges
  • Business needs and requirements for a metadata management tool
  • The situation with commercial off-the-shelf (COTS) data governance tools (based on an analysis of 40 tools)

Metadata Management: Definition, Content, and Associated Challenges

Metadata

Let’s start with the definition of metadata. In my practice, I use the following definition: “Metadata is data that defines and describes other data in a particular context.”

Metadata can be of various types. Multiple metadata classifications exist. DAMA-DMBOK2 recognizes three: business, technical, and operational, as shown in Figure 1.

Figure 1: Metadata types.

The definitions of these metadata types are presented in Figure 1.

Several challenges are associated with metadata:

  1. Metadata can be data in a particular context.

Let’s take data instances in a database column. These data instances have no meaning unless we assign a technical name to the column. The technical name is metadata. We may also need to explain the meaning of the technical name. That requires metadata at a higher level. In other words, we need metadata for metadata. In this case, the technical name turns out to be data in a particular context.

  2. Metadata can be a single data element or a complex object.

An application owner is a single data element. A data model is a complex metadata object that consists of multiple elements.

  3. Various metadata types represent the same object.

From the viewpoint of DAMA-DMBOK2, data lineage combines business and technical metadata.
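
The first two challenges can be illustrated with a minimal Python sketch. All names in it (`MetadataElement`, `MetadataObject`, `cust_nm`, the sample values) are hypothetical and chosen for illustration only; they do not come from DAMA-DMBOK2 or from any tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataElement:
    """A single metadata element that may itself be described by metadata."""
    name: str
    value: str
    described_by: Optional["MetadataElement"] = None  # metadata for metadata


@dataclass
class MetadataObject:
    """A complex metadata object composed of several elements."""
    name: str
    elements: List[MetadataElement] = field(default_factory=list)


def description_depth(element: MetadataElement) -> int:
    """Count how many metadata levels sit above an element."""
    depth = 0
    current = element.described_by
    while current is not None:
        depth += 1
        current = current.described_by
    return depth


# Data instances in a column mean little without the column's technical name.
column_values = ["Alice Smith", "Bob Jones"]

# The technical name is metadata about the values. The business definition is
# metadata about the technical name, so the name becomes data in that context.
business_definition = MetadataElement("business definition", "Full legal name of a customer")
technical_name = MetadataElement("technical name", "cust_nm", described_by=business_definition)

# A single data element versus a complex object built from several elements.
application_owner = MetadataElement("application owner", "J. Doe")
data_model = MetadataObject("customer data model", [technical_name, application_owner])
```

Here `description_depth(technical_name)` counts one level of metadata above the technical name, while the application owner has none in this toy example.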

Metadata model

Figure 2 demonstrates an example of a metadata model I developed for data lineage. A metadata model shows the metadata objects and the relationships between them. Data lineage is a complex metadata construct that combines business and technical metadata.

Figure 2: A metamodel of data lineage.

I described this metamodel in my book, “Data Lineage from a Business Perspective.”

The key message is the following: various metadata objects and elements must be included in the scope of metadata management. They are not limited to the physical level. These objects and elements can be found in multiple IT applications.
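
To make the idea of a metadata model more concrete, here is a small Python sketch of a metamodel expressed as object types and relationships. It is not a reproduction of Figure 2; the object types, levels, and relationship labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class MetaObjectType:
    name: str
    level: str  # "business" or "technical"


@dataclass(frozen=True)
class MetaRelationship:
    source: MetaObjectType
    label: str
    target: MetaObjectType


# Hypothetical object types at two levels.
BUSINESS_PROCESS = MetaObjectType("business process", "business")
BUSINESS_TERM = MetaObjectType("business term", "business")
APPLICATION = MetaObjectType("application", "technical")
TABLE = MetaObjectType("table", "technical")
COLUMN = MetaObjectType("column", "technical")

# The metamodel: which object types exist and how they relate.
METAMODEL: List[MetaRelationship] = [
    MetaRelationship(BUSINESS_PROCESS, "uses", BUSINESS_TERM),
    MetaRelationship(BUSINESS_TERM, "is implemented by", COLUMN),
    MetaRelationship(APPLICATION, "hosts", TABLE),
    MetaRelationship(TABLE, "contains", COLUMN),
]


def object_types(model: List[MetaRelationship]) -> Set[MetaObjectType]:
    """Collect every object type that appears in the metamodel."""
    types: Set[MetaObjectType] = set()
    for rel in model:
        types.add(rel.source)
        types.add(rel.target)
    return types


def cross_level(model: List[MetaRelationship]) -> List[MetaRelationship]:
    """Return relationships that link business and technical metadata."""
    return [rel for rel in model if rel.source.level != rel.target.level]
```

In this sketch, the single cross-level relationship ("business term is implemented by column") is what ties the business and technical views of the same data together.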

Metadata management

Let’s start with the definition of metadata management provided by DAMA-DMBOK2: metadata management is “Planning, implementation, and control activities to enable access to high quality, integrated metadata.”

I apply another definition in my practice: “Metadata management is a company’s ability to discover, gather, and integrate metadata of required quality to enable a data lifecycle.”

Several challenges are associated with metadata management.

  1. Many companies don’t pay enough attention to establishing metadata management.

Metadata management is an enabler of a data lifecycle, including data integration. Metadata management focuses on gathering, integrating, and distributing metadata. In other words, metadata management enables data and metadata lifecycles. That is why proper data management is only possible with metadata management.

  2. Many professionals limit metadata management to technical metadata.

Metadata is a product of various capabilities and corresponding IT tools, such as business process management, data modeling, data, application, and technology architecture, data quality, and multiple IT infrastructure-related capabilities. Different types of metadata must be gathered and integrated. Knowledge graphs are one example of this requirement.

  3. Metadata volumes can exceed data volumes.

Data consumption and production volumes grow exponentially. This causes an increase in metadata volumes. In my opinion, metadata volumes can exceed data volumes.

Metadata management requires the same capabilities as data management, as shown in Figure 3. The simple reason for that is that metadata is also data. Figure 3 presents the capability model of metadata management.

Figure 3: A capability model of metadata management.

These capabilities include metadata governance, quality, modeling, and architecture. You can find more information in the online course: “Designing metadata management and data lineage capabilities.”

The above-discussed challenges with metadata and metadata management strongly influence requirements for metadata management tools.

Business needs and requirements for metadata management tools

Often, professionals ask me the same question: “We decided to implement metadata management or data lineage. What kind of tool do we need?” My answer starts with the counter-question: “What are your requirements?”

The roles and use cases of metadata management best illustrate a company’s needs in this area:

  • Enabling data integration

Companies tend to integrate data from multiple internal and external sources to get more insight from data and support business decisions. Metadata describes data and enables its integration.

  • Improving efficiency by identifying duplicated and redundant data

Large companies have hundreds or even thousands of applications. Data is often duplicated, and much of it goes unused for long periods. Operational metadata can assist in identifying these issues.

  • Establishing traceability and transparency of data processing, transformation, and integration because of regulation requirements

To comply with various regulations, organizations must document data lineage which combines business and technical metadata.

  • Reducing IT and DevOps costs

Properly organized metadata reduces the time and effort needed to develop new applications and optimize data and application landscapes.

Modern metadata management introduces several concepts or capabilities that assist in meeting the business needs discussed above.

These capabilities include data lineage, knowledge graphs, observability, and active metadata.

In this article, I will only provide the definitions of these capabilities. More information about these concepts is available on the Data Crossroads site.

Data lineage is a description of data movements and transformations at various abstraction levels along data chains and relationships between data at these levels.
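
A minimal Python sketch can make this definition more tangible. It models lineage at a single (physical) abstraction level only, as a list of data movements with their transformations; the dataset names and transformation labels are hypothetical.

```python
from collections import defaultdict
from typing import List, Set, Tuple

# Each edge is (source, target, transformation); all names are hypothetical.
Edge = Tuple[str, str, str]

edges: List[Edge] = [
    ("crm.customers", "staging.customers", "copy"),
    ("staging.customers", "dwh.dim_customer", "deduplicate"),
    ("erp.orders", "dwh.fact_orders", "aggregate"),
    ("dwh.dim_customer", "report.sales_by_customer", "join"),
    ("dwh.fact_orders", "report.sales_by_customer", "join"),
]


def upstream(target: str, lineage: List[Edge]) -> Set[str]:
    """Return every dataset that directly or indirectly feeds the target."""
    parents = defaultdict(list)
    for src, dst, _transformation in lineage:
        parents[dst].append(src)

    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


sources = upstream("report.sales_by_customer", edges)
```

Tracing the report back to its origins in this way is the kind of question lineage is expected to answer. A fuller model would also link each physical dataset to objects at the business level.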

A knowledge graph is “interlinked sets of facts that describe real-world entities, events, or things and their interrelations in a human- and machine-understandable format.”

Data observability is a company’s ability to inspect, monitor, and understand the state, quality, and lineage of data within a system.

Data lineage is part of data observability.

Active metadata is a concept that describes a dynamic and automated approach to metadata management.

Instead of static documentation that only describes the data, active metadata is continuously updated, often in real time, and can automatically trigger actions based on data and metadata changes.
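
The difference from static documentation can be sketched in a few lines of Python. The `ActiveMetadataStore` class, the `schema` key, and the alert text are hypothetical; the sketch only illustrates the idea that a metadata change triggers an action automatically.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A listener receives: asset name, metadata key, old value, new value.
Listener = Callable[[str, str, Optional[str], str], None]


class ActiveMetadataStore:
    """Keeps the latest metadata values and notifies listeners on every change."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], str] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def update(self, asset: str, key: str, value: str) -> bool:
        """Store a value; notify listeners and return True only if it changed."""
        old = self._values.get((asset, key))
        if old == value:
            return False
        self._values[(asset, key)] = value
        for listener in self._listeners:
            listener(asset, key, old, value)
        return True


alerts: List[str] = []


def alert_on_schema_change(asset: str, key: str, old: Optional[str], new: str) -> None:
    # Triggered action: record an alert when an already known schema changes.
    if key == "schema" and old is not None:
        alerts.append(f"{asset}: schema changed from {old} to {new}")


store = ActiveMetadataStore()
store.subscribe(alert_on_schema_change)
store.update("dwh.dim_customer", "schema", "id,name")        # first registration, no alert
store.update("dwh.dim_customer", "schema", "id,name")        # unchanged, no notification
store.update("dwh.dim_customer", "schema", "id,name,email")  # change triggers the alert
```

In a real setting, the triggered action might be a data quality check or a notification to a data steward rather than an entry in a list.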

So, when a company thinks about implementing metadata management, it must scope metadata, develop a metadata model, and identify capabilities that a metadata management IT tool must have.

Overview of COTS metadata management IT tools

The results presented in this article are based on an overview of 40 so-called “metadata management” IT tools.

In this article, I will discuss several challenges that IT and data management professionals must be aware of when selecting a metadata tool.

Challenge 1: So-called “metadata management” tools have some other labels that characterize these tools

Figure 4 demonstrates the labels these 40 metadata management tools have. The data is based on the reviews of these tools by some authoritative IT solution experts, including Gartner.

Figure 4: Other labels that metadata management tools have.

These results demonstrate a simple fact: metadata management tools offer multiple other functionalities.

This can be a challenge for a company. If it is looking only for a tool to manage metadata, it may not need the remaining functionalities.

Challenge 2: Vendors of so-called “metadata management” tools label their solutions differently.

Figure 5 demonstrates the various labels vendors use to present their tools on the market.

Figure 5: Vendor labels of metadata management IT tools.

We can see that vendors label their tools differently than third-party authorities.

There are some other challenges associated with vendors’ own labeling.

Let’s take the “platform” label. It is hard to say whether different vendors understand the term “platform” in the same way.

The types of platforms differ as well. “Data platform,” “Data intelligence platform,” and “Data Ecosystem Evolution Platform” are only three examples of the 19 platforms I came across. So, it is evident that comparing IT tools based on their labels is practically impossible.

Challenge 3: So-called metadata management tools provide quite different functionalities related to metadata.

This challenge is quite understandable. Earlier, we discussed that the term “metadata” covers a wide variety of metadata objects that can be gathered from different IT tools.

Figure 6 demonstrates the variety of these functionalities.

Figure 6: The metadata-related functionalities of metadata management IT tools.

We can see that the most common functionalities are data catalog, technical data lineage, scanners, and data observability.

I would also like to comment on the “metadata management” functionality. I could not find a clear description of this functionality, so we can only guess at its meaning.

Challenge 4: So-called metadata management tools provide some other related functionalities.

Figure 7 demonstrates examples of these functionalities.

Figure 7: Additional functionalities of metadata management tools.

The most common additional functionalities are security, data governance, and quality. As mentioned earlier, it is highly probable that the content of these functionalities differs considerably between vendors.

Challenge 5: Metadata management IT tools provide data lifecycle-related functionalities.

Figure 8 illustrates the long list of functionalities that I label as “data lifecycle”-related.

Figure 8: Data lifecycle-related functionalities of metadata management tools.

Data discovery, ingestion, integration, analysis, and AI/ML functionalities are the most common. This analysis demonstrates that metadata management tools deal with more than metadata. By using metadata, they enable a data lifecycle.

Metadata management tools have been implemented in various industries. Figure 9 demonstrates some statistics.

Figure 9: Implementation statistics per industry.

I could not find industry-related information for almost 50% of the reviewed solutions. Financial services, retail, healthcare, telecom, and government are the leading industries implementing metadata solutions.

Conclusions and recommendations

After reviewing these so-called “metadata management” IT tools, I can draw several conclusions:

  1. These IT tools provide functionalities far beyond those associated with the definition of metadata management.

The common definition stresses that metadata management focuses on discovering, gathering, and integrating metadata. In other words, metadata management focuses on enabling the metadata lifecycle.

Most of the so-called metadata IT tools use metadata capabilities to support various data management capabilities they offer.

  2. So-called metadata, data management, data governance, and data lineage tools provide a lot of intersecting functionalities. Providing different “labels” to the same IT tools only confuses the situation and obstructs the process of selecting IT tools.

For companies that are looking for a metadata management tool, I recommend the following steps:

  1. Define the goal of usage and the scope of metadata to be documented

As we discussed, metadata can be of various types that serve different purposes. For example, a company needs business and technical metadata to ensure data transparency by implementing data lineage. For data observability, all three metadata types are required.

  2. Create a metamodel of metadata

This metamodel must describe the required metadata objects and elements to be gathered and integrated. The relationships between these metadata objects must also be a part of this model.

  3. Define the sources of metadata within the company

Depending on the metamodel requirements, a company can identify various metadata sources like data modeling tools and lifecycle management tools.

  4. Detail requirements regarding IT tools focused on enabling a metadata lifecycle

A company can start the metadata management IT tool selection process only after identifying these requirements.
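
The first and last steps can be sketched in Python as a simple requirements check. The goal-to-metadata-type mapping follows the examples given above. The tool names, their functionality sets, and the required functionalities are hypothetical and do not describe any of the reviewed products.

```python
from typing import Dict, Iterable, Set, Tuple

# Step 1: metadata types needed per goal, as stated in the text above.
METADATA_TYPES_BY_GOAL: Dict[str, Set[str]] = {
    "data lineage": {"business", "technical"},
    "data observability": {"business", "technical", "operational"},
}


def metadata_scope(goals: Iterable[str]) -> Set[str]:
    """Return the union of metadata types required by the chosen goals."""
    scope: Set[str] = set()
    for goal in goals:
        scope |= METADATA_TYPES_BY_GOAL[goal]
    return scope


# Step 4: compare required functionalities with what a tool offers.
def assess_tool(required: Set[str], offered: Set[str]) -> Tuple[float, Set[str], Set[str]]:
    """Return (share of requirements met, missing functionalities, surplus functionalities)."""
    met = required & offered
    return len(met) / len(required), required - offered, offered - required


# Hypothetical requirements and tool profiles.
required = {"data catalog", "technical data lineage", "business data lineage"}
tools = {
    "Tool A": {"data catalog", "technical data lineage", "data quality", "security"},
    "Tool B": {"data catalog", "scanners"},
}

assessment = {name: assess_tool(required, offered) for name, offered in tools.items()}
```

The surplus set makes the first challenge visible: a tool may cover most requirements while bringing functionalities the company never asked for.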