Article 5. Choosing Data Management IT Tools: Data Governance Solutions
The previous article reviewed the specifics of selecting data lineage solutions and their functionalities. In this article, we will discuss the following:
- The definition and content of data governance and related challenges
- Business needs and requirements for a data governance tool
- The situation with commercial-off-the-shelf (COTS) data governance tools (based on the analysis of 28 tools)
Data Governance: Definition, Content, and Associated Challenges
The term “data governance” has multiple definitions in the data management context. I’ve discussed this issue in several articles: “Data Management and Data Governance in a Nutshell,” “Data Management and Governance 101,” and “DAMA-DMBOK2 vs. DCAM 2.2: Mapping between Frameworks.”
In this article, I will only summarize the key challenges that this issue brings to selecting a data governance tool.
Challenge 1: Data governance’s definition, role, and deliverables differ between leading industry guidelines.
DAMA-DMBOK2 and DCAM are two leading industry guidelines/frameworks that have different viewpoints on the content of data governance. Let’s start by comparing their definitions.
DAMA-DMBOK2 says, “Data governance is the exercise of authority, control, and shared-decision making (planning, monitoring, and enforcement) over the management of data.”
DCAM has its own interpretation of a data governance function: “The function that defines and implements the standards, controls and best practices of the data management initiative in alignment with strategy.”
Figure 1 demonstrates differences in these definitions.
DAMA-DMBOK2 defines data governance as a knowledge area, while DCAM defines it as a function. In my view, the “data governance knowledge area” provides the theoretical foundation, while the “data governance function” is about the practical application of that knowledge.
The key challenge is that, from the DAMA-DMBOK2 viewpoint, data governance only plans, monitors, enforces, and controls what data management does, whereas DCAM also gives data governance the power to implement. However, it isn’t easy to interpret the actual meanings of these definitions.
The more significant challenge is that these frameworks define data governance deliverables differently.
DAMA-DMBOK2 limits data governance deliverables to data strategy, policies, processes, roles, plans, scorecards, etc.
DCAM is not openly available, which makes it hard to understand its approach and methodology. Some earlier publications available to the public showed the deliverables that DCAM assigned to data governance. Some of them, like data domains, models, glossaries, and classifications, can hardly be considered data governance deliverables. They are artifacts of data modeling and architecture. I don’t have access to the current version of DCAM and don’t know whether this viewpoint has changed.
Challenge 2: Leading authorities in data management have quite different understandings of data governance.
This statement is easy to prove. Below are several definitions I found by searching Google for “data governance definition.”
“Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage.”
“Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”
“Data governance is a set of principles, standards, and practices that ensures your data is reliable and consistent. It also helps ensure that your data can be trusted to drive business initiatives, inform decisions and power digital transformations.”
“Data governance promotes the availability, quality, and security of an organization’s data through different policies and standards.”
“A set of processes that ensures that data assets are formally managed throughout the enterprise. A data governance model establishes authority and management and decision-making parameters related to the data produced or managed by the enterprise.”
Figure 2 summarizes these definitions by answering two questions: WHAT is data governance, and WHY does a company need it?
Even at first sight, you can see that the definitions have both similarities and differences. The similarities lie in “what” data governance is: processes, roles, and so on are all components of a data management framework. The differences and challenges appear in the answers to the “why” question. I believe that a data quality capability, not data governance, is accountable for data reliability, quality, trustworthiness, and consistency. Data governance should only coordinate the development of a data quality framework; data quality is an independent data management capability. The role proposed in these definitions leads to a situation in which the data governance and data quality capabilities are both accountable for data quality, which by definition cannot be the case. What do you think?
Challenge 3: Data Management and Governance are two different concepts, while “management” and “governance” are synonyms from the linguistic viewpoint.
I don’t know the history of how the concepts of “management” and “governance” were introduced and differentiated in the data management community. Recently, I discovered that well-known language references such as Merriam-Webster and Thesaurus treat “management” and “governance” as synonyms. In daily life, this often leads to the two words being used interchangeably. Data professionals who are unfamiliar with the leading industry guidelines can use these concepts quite freely, assigning them different meanings and content. I will support this conclusion later in this article by showing the functionalities of “data governance” tools.
Everything discussed above leads to a simple conclusion: each company must establish its own definition of data governance.
Business Needs and Requirements for a Data Governance Tool
The needs and requirements for a data governance tool depend entirely on a company’s internal definition and understanding of data governance.
A few requirements can be expected, such as the following:
- Maintain data management roles and use them as business metadata for various data management artifacts
Examples include different types of owners, such as data owners, business process owners, and system owners.
- Record various legislative documents and policies
An example of a use case is the ability to link data requirements and data elements to various legislative documents.
- Maintain a business glossary
Strangely enough, a business glossary is often considered an artifact of data governance. In my practice, I assign it to data modeling as this capability must describe data elements at various abstraction levels.
A company must formulate the requirements for a data governance tool based on the company’s understanding of data governance and its deliverables.
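To make the three example requirements above more concrete, here is a minimal sketch of how roles, policy documents, and glossary definitions could be attached to a data element as business metadata. All class names, field names, and sample records are hypothetical illustrations and are not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Owner:
    """A data management role recorded as business metadata."""
    name: str
    role: str  # e.g. "data owner", "business process owner", "system owner"


@dataclass(frozen=True)
class PolicyDocument:
    """A legislative document or internal policy."""
    doc_id: str
    title: str


@dataclass
class DataElement:
    """A data element with a glossary definition, owners, and linked documents."""
    name: str
    definition: str
    owners: list[Owner] = field(default_factory=list)
    documents: list[PolicyDocument] = field(default_factory=list)


def elements_governed_by(elements: list[DataElement], doc_id: str) -> list[str]:
    """Return the names of data elements linked to the given document."""
    return [
        e.name for e in elements
        if any(d.doc_id == doc_id for d in e.documents)
    ]


# Hypothetical sample records, for illustration only.
privacy_policy = PolicyDocument("POL-001", "Customer privacy policy")
email = DataElement(
    name="customer_email",
    definition="Email address a customer provided for contact purposes.",
    owners=[Owner("A. Example", "data owner")],
    documents=[privacy_policy],
)
order_total = DataElement(
    name="order_total",
    definition="Total monetary value of an order.",
    owners=[Owner("B. Example", "system owner")],
)

print(elements_governed_by([email, order_total], "POL-001"))
```

A structure like this lets you answer the use case mentioned above: which data elements are affected by a given legislative document, and who owns them.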
Overview of COTS Data Governance Tools
In this section, I will discuss several challenges that IT and data management professionals must be aware of when selecting a data governance tool.
Challenge 1: The functionalities of data governance tools deviate significantly from the definitions of data governance.
The biggest question you should ask while selecting a data governance tool is which functionality you need.
Recently, I came across the “Market Guide for Data and Analytics Governance Platform” by Gartner. I will not hide my surprise at seeing the list of capabilities such a platform must have (according to Gartner):
- Access management
- Active metadata
- Analytics
- Business glossary
- Connectivity/Integration
- Data Catalog
- Data Classification
- Data Dictionary
- Data Lineage
- Impact Analysis
- Information Policy Representation
- Matching, Linking, and Merging
- Orchestration/Automation
- Profiling
The only thing I could not find was classical “data governance” content such as processes, roles, and policies.
As a result, I have two questions:
- What is the difference between data management, data governance, metadata management, and data lineage tools?
- Why do we continue classifying tools differently when they have the same functionality?
I considered 28 IT tools that the most respected sources, Gartner, Forrester, and Solutions Review, recognize as data governance tools. I had already analyzed many of the tools on this list in my previous articles on data management and data lineage tools, because the same sources also labeled them as “data management” and “lineage” tools.
Challenge 2: The vendors of IT tools labeled as “data governance” do not recognize their tools as such.
I checked the websites of these 28 providers to find the terms the vendors use to describe their software. Only two vendors described their solutions as related to data governance. Figure 3 shows the labels vendors use to describe their tools.
You can see that 25% of vendors describe the functionality of their tools as “data management”-related. Some tools provide solutions specific to a particular data type: marketing, customer, or master data.
So, this brief analysis only supports my assumption that people use the term “governance” without clearly defining its meaning.
Challenge 3: The functionality of data governance tools intersects significantly with the functionality of data- and metadata management and data lineage tools.
I split the functionalities of IT tools classified as “data governance” into three categories: data management, metadata management, and data lifecycle management.
Let’s look at the data management-related functionality shown in Figure 4. I used the term “data management” to demonstrate capabilities defined by DAMA-DMBOK2 as constituent components of data management.
So, we can see that these tools include classical data management functionalities like data quality, enterprise architecture, master and reference data, etc.
Figure 5 demonstrates the list of functionalities I associate with metadata management:
- Data catalogs
- Data observability
- Data lineage
- Scanners, etc.
For those still ready to be surprised, Figure 6 demonstrates data lifecycle-related functionalities that so-called “data governance” tools offer. These functionalities enable various processes of data lifecycle steps like data discovery, ingestion, preparation, etc.
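The three-way split described above can be sketched as a simple lookup. The category contents below contain only the examples named in this article; the sample feature list at the end is a hypothetical input, not data from the 28 tools analyzed.

```python
# Example functionalities per category, as named in the text.
CATEGORIES = {
    "data management": {
        "data quality", "enterprise architecture", "master and reference data",
    },
    "metadata management": {
        "data catalogs", "data observability", "data lineage", "scanners",
    },
    "data lifecycle management": {
        "data discovery", "data ingestion", "data preparation",
    },
}


def categorize(functionalities):
    """Group a tool's advertised functionalities by category.

    Anything not found in CATEGORIES lands in "unclassified".
    """
    result = {category: [] for category in CATEGORIES}
    result["unclassified"] = []
    for item in functionalities:
        key = item.strip().lower()
        for category, known in CATEGORIES.items():
            if key in known:
                result[category].append(key)
                break
        else:
            result["unclassified"].append(key)
    return result


# Hypothetical feature list of a tool labeled "data governance".
features = ["Data lineage", "Data quality", "Data ingestion", "Policy management"]
for category, items in categorize(features).items():
    print(f"{category}: {items}")
```

Running such a check against a vendor's feature list quickly shows how much of a “data governance” tool falls into other categories.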
I hope I provided enough evidence to support my previous conclusion: in the data management community, we don’t have an aligned understanding of the “data governance” concept and misuse this term in our communication.
Summary
A company must perform the following steps in selecting a data governance tool:
- Identify the role of data governance regarding data management.
Data management is a multidisciplinary capability. Data governance is one of the data management capabilities. Usually, data governance must coordinate the activities of the rest of the data management capabilities by defining the framework in which data management operates.
- Identify the definition, content, and deliverables of data governance.
As we discussed, data governance has multiple definitions. The role of data governance in data management will assist in defining the content and deliverables of data governance.
The most common deliverable is a data governance operating model that includes a governance structure, governing bodies, processes, and roles.
- Define the format in which data governance will operate.
Data governance can be set up as a formal business unit within a company’s organizational structure. Another option is to assign tasks and accountabilities to existing organizational roles.
- Define requirements for a data governance tool.
These requirements result from the agreed deliverables of data governance.
- Search and select a tool
The most important point is to search by required functionality, not by the “label” assigned in third-party reviews.
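As an illustration of the last step, the sketch below ranks candidate tools by how many of the required functionalities they cover and ignores the market label entirely. The tool names, labels, and functionality lists are hypothetical placeholders; in practice, the requirements come from your agreed data governance deliverables.

```python
def rank_tools(required, tools):
    """Rank tools by the share of required functionalities they cover.

    Returns a list of (name, label, coverage, missing) tuples, best first.
    The label is carried along for reference but never used for ranking.
    """
    required = set(required)
    if not required:
        raise ValueError("At least one required functionality is needed.")
    ranked = []
    for name, info in tools.items():
        covered = required & set(info["functionalities"])
        coverage = len(covered) / len(required)
        ranked.append((name, info["label"], coverage, sorted(required - covered)))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Hypothetical requirements and candidate tools.
required = {"roles", "policies", "business glossary"}
tools = {
    "Tool A": {
        "label": "data governance",
        "functionalities": {"data catalog", "data lineage"},
    },
    "Tool B": {
        "label": "data management",
        "functionalities": {"roles", "policies", "business glossary"},
    },
}

for name, label, coverage, missing in rank_tools(required, tools):
    print(f"{name} (labeled '{label}'): {coverage:.0%} covered, missing {missing}")
```

In this made-up example, the tool labeled “data management” covers all the requirements, while the one labeled “data governance” covers none.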