Metadata management (MM), knowledge graphs (KG), and data lineage (DL) are data management capabilities that share many similarities and have some differences. They intersect each other to a great extent, as demonstrated in Figure 1. Metadata management forms the basis for knowledge graphs and data lineage.

Figure 1: MM, KG, and DL intersect each other to a great extent.

MM, KG, and DL capabilities have similarities and differences.

Let’s first consider similarities:

  1. All concepts deal with metadata.
  2. The following business drivers are relevant for all three capabilities: data integration, business change, and IT cost reduction.
  3. Implementing MM, KG, and DL requires both automated and manual methods.
  4. MM, KG, and DL can be implemented in various IT environments (on-premises, cloud, hybrid) and using different data architecture types (centralized, decentralized, and hybrid).
  5. MM forms the foundation for both KG and DL initiatives.
  6. All these capabilities are used in data integration-related business cases.
  7. KG is one of the technologies that can be used to document data lineage.
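
To make the last point concrete, here is a minimal Python sketch of lineage documented as a small directed graph. The dataset names and the `upstream` helper are hypothetical and invented for illustration. A real implementation would most likely use a graph database rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical lineage edges: each key is a dataset, each value lists the
# datasets it is directly derived from. All names are illustrative.
DERIVED_FROM = {
    "report.revenue_by_region": ["dwh.sales_fact", "dwh.region_dim"],
    "dwh.sales_fact": ["staging.orders"],
    "dwh.region_dim": ["crm.regions"],
    "staging.orders": ["erp.orders"],
}


def upstream(dataset, edges=DERIVED_FROM):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for source in edges.get(current, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


if __name__ == "__main__":
    # List all sources behind the report, in alphabetical order.
    for name in sorted(upstream("report.revenue_by_region")):
        print(name)
```

Walking the edges in the opposite direction would answer the impact-analysis question: which reports are affected when a source changes.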

Finally, let’s look at some differences:

  1. While MM and DL focus only on metadata, KG also links data and metadata.
  2. While business change drives MM and KG implementation, regulatory requirements are the leading driver for a DL initiative.
  3. MM and DL repositories can use either relational or graph databases, whereas KG is realized using graph database technologies.
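
The first difference can be illustrated with a small sketch of a knowledge graph stored as subject-predicate-object triples. Here, metadata nodes (a table and a column) and a data node (one customer record) live in the same graph. All identifiers, predicate names, and helper functions are hypothetical and only serve the illustration.

```python
# Hypothetical triples (subject, predicate, object). Names are illustrative.
TRIPLES = [
    # Metadata: describes the structure of the data.
    ("table:customers", "has_column", "column:customers.country"),
    ("column:customers.country", "has_data_type", "string"),
    # Data: an actual record and one of its values.
    ("customer:42", "stored_in", "table:customers"),
    ("customer:42", "country", "NL"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(record, triples=TRIPLES):
    """Walk from a data record to the columns of the table that stores it."""
    result = {}
    for table in objects(record, "stored_in", triples):
        result[table] = objects(table, "has_column", triples)
    return result


if __name__ == "__main__":
    print(describe("customer:42"))
```

A repository that holds only metadata would contain the first two triples but not the last two.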

MM, KG, and DL: how to choose what your company needs?

In this series of articles, we demonstrated that these three capabilities (metadata management, knowledge graphs, and data lineage) have a lot in common. The question is, “How can a company choose which initiative best fits its business needs?”

Let’s look at some high-level steps a company should perform to make the right choice. Figure 2 demonstrates these steps.

Figure 2: High-level steps to choose required capabilities.


Step 1: Identify business drivers

First, it is important to analyze and prioritize business drivers. Earlier, we discussed key drivers that motivate a company to implement these three capabilities. You should first, independently of these capabilities, perform a thorough analysis of the company’s business strategy and choose the most significant business drivers. Then, you can compare your company’s drivers with the drivers for these capabilities. This comparison will give you the first hint about the capabilities your company needs.
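
The comparison described above could be sketched as a simple set intersection. The driver lists below come from the summary earlier in this article. The `match_drivers` function and the way results are presented are assumptions made for illustration, not a prescribed method.

```python
# Drivers per capability, as summarized in this article. The shared drivers
# apply to all three capabilities; regulatory requirements are named as the
# leading driver for data lineage. The matching logic is an assumption.
SHARED = {"data integration", "business change", "IT cost reduction"}
CAPABILITY_DRIVERS = {
    "metadata management": SHARED,
    "knowledge graph": SHARED,
    "data lineage": SHARED | {"regulatory requirements"},
}


def match_drivers(company_drivers, capability_drivers=CAPABILITY_DRIVERS):
    """Map each capability to the company drivers it shares, sorted by name."""
    wanted = set(company_drivers)
    return {
        capability: sorted(wanted & drivers)
        for capability, drivers in capability_drivers.items()
    }


if __name__ == "__main__":
    # Hypothetical company whose strategy analysis produced two drivers.
    company = ["regulatory requirements", "data integration"]
    for capability, shared in match_drivers(company).items():
        print(f"{capability}: {shared}")
```

Because most drivers are shared by all three capabilities, an overlap like this gives only a first hint. The scope and requirements defined in the next steps do the real narrowing.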

Step 2: Define the scope and requirements

Each data management initiative is time- and resource-consuming. Often, a company performs multiple data-related initiatives simultaneously. Therefore, each initiative should have a correctly estimated scope that meets the company’s needs and fits the company’s resources. The business drivers identified in Step 1 help limit the initiative to a feasible scope.

Step 3: Establish metadata management for the required scope of metadata

A company should recognize that metadata management is the foundation for data lineage and knowledge graph initiatives, so it can’t start either initiative without that foundation in place. Earlier, we demonstrated that metadata can be of various types and include multiple metadata objects.

Therefore, the company should limit its scope of metadata management to the required minimum.

Step 4: Define the scope and requirements for DL or KG initiative

At this step, the company can finally choose the capability it needs: data lineage or knowledge graph. The outputs of Steps 1-3 help clarify the company’s needs and the capability that can meet them.

Step 5: Establish a business case

Once the choice is made, a company can start an initiative. A company may also run metadata management and data lineage or knowledge graph initiatives simultaneously.

We have reached the end of our series about the similarities and differences between three capabilities: metadata management, data lineage, and knowledge graphs.
