The previous article discussed the key trends and challenges of choosing appropriate IT tools for various data management purposes. In this article, I will demonstrate the following:
- An approach to choosing data management IT tools
- The definitions of core data management concepts
- The structure of analysis of 5 groups of data management-related software
A high-level approach to choosing an IT tool
The process of choosing any IT tool for data-related purposes usually follows similar steps. In this article, I will consider the 8-step method shown in Figure 1.
Let’s briefly consider each of these steps.
Step 1: Scope the initiative
A correctly defined scope is one of the critical success factors of any data-related initiative. A company should take multiple parameters into consideration, such as the following:
- Business drivers
Business drivers are reasons for a company to start a data initiative. They link a business strategy with a data initiative. Multiple internal and external factors like digital transformation, compliance with regulations, and improving decision-making encourage a company to focus on improving data management. “Compliance with personal data protection regulations” is an example of a business driver.
- Stakeholders
Business drivers have various internal and external stakeholders. Often, the needs of multiple stakeholders, even related to one business driver, may differ significantly. So, while scoping a data initiative, the needs of all stakeholders must be considered.
For the above-stated driver, we can identify several internal business functions involved in the initiative, like IT, data management, HR, marketing and sales, customer support, commercial business units, etc. This driver will also have external stakeholders like supervisory institutions.
- “Enterprise” scope
The term “enterprise” describes the number of companies, business lines, and business units to be taken into the initiative’s scope. In the case above, all business units that deal with personal data must participate in the initiative.
- Data and data chains
The business drivers and stakeholders’ needs determine the scope of data. We can classify data in different ways. In most cases, we talk about digital data. In addition to data, we can also take the corresponding metadata into the scope. To make the initiative feasible, we may limit the data and the data chains involved in its lifecycle to several critical ones.
- Data, application, and technology architecture
While choosing an IT tool, it is essential to consider both the baseline (current) architecture landscape and the target architecture. Furthermore, different data architecture types (centralized, distributed, etc.) may require different IT tools.
Step 2: Identify needs and requirements
Step 1 helps identify the needs of various stakeholders. Step 2 focuses on translating these needs into requirements. For example, multiple stakeholders may need to make data processing and transformation transparent. However, the requirements of businesspeople and technical professionals will differ significantly. A correctly chosen IT tool must deliver functionality that meets the requirements of all key stakeholders.
Step 3: Evaluate available tools
This task is one of the most challenging ones. First, a company must evaluate its existing systems and processes to determine if they can be improved or if a new solution is necessary. Then, it should identify any gaps or inefficiencies in its current data management practices.
It is unlikely that a new IT tool will meet all requirements, so some trade-offs regarding the mandatory functionality are required. The previous article discussed some common challenges in choosing an IT tool. In subsequent articles, I will review in depth several IT tools for data and metadata management, data governance, data lineage, and knowledge graphs.
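The gap check described in this step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the requirement names, the tool names, and the split into mandatory and optional requirements are all hypothetical and do not describe any real product.

```python
# Hypothetical requirements and tools, for illustration only.
mandatory = {"lineage_visualization", "business_glossary", "api_access"}
optional = {"data_quality_rules", "workflow_automation"}

tools = {
    "Tool A": {"lineage_visualization", "business_glossary", "data_quality_rules"},
    "Tool B": {"lineage_visualization", "business_glossary", "api_access"},
}


def coverage_report(features, mandatory, optional):
    """Return the unmet mandatory requirements and the share of optional ones met."""
    gaps = sorted(mandatory - features)
    optional_share = len(optional & features) / len(optional) if optional else 1.0
    return {"mandatory_gaps": gaps, "optional_share": optional_share}


reports = {name: coverage_report(f, mandatory, optional) for name, f in tools.items()}
for name, report in reports.items():
    print(name, report)
```

A non-empty list of mandatory gaps does not automatically disqualify a tool. It marks the place where a trade-off has to be discussed with the stakeholders.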
Step 4: Consider integration and interoperability
The chosen tool must be integrated with other systems and technologies already in use within the organization. This includes evaluating how the solution will work with existing data sources, data storage systems, and analytics tools. Matching the chosen tool with future state data, applications, and technology architectures is also essential.
Step 5: Assess scalability and flexibility
This step includes evaluating how the solution will handle growing data volumes and expanding data sources. A company must also assess how easy it is to modify and customize the solution to meet changing business needs.
Step 6: Evaluate the ease of use
The next step focuses on analyzing the user-friendliness of the tool for technical and non-technical users. This includes evaluating the functionality of the user interfaces, training and support, and the availability of documentation and resources.
Step 7: Consider required investments and ROI
A company should consider the cost of the IT solution, including implementation, licensing, training, ongoing maintenance, and support. It should also assess the ROI of the solution, including the potential benefits of more efficient and effective data-related processes, process automation, etc.
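As a rough illustration, a simple ROI can be computed as total benefits minus total costs, divided by total costs. The sketch below assumes this simple formula and uses made-up figures over a hypothetical three-year horizon. A real assessment would likely also account for the time value of money and for benefits that are hard to quantify.

```python
def simple_roi(costs, benefits):
    """Simple ROI = (total benefits - total costs) / total costs."""
    total_costs = sum(costs.values())
    total_benefits = sum(benefits.values())
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return (total_benefits - total_costs) / total_costs


# Hypothetical three-year figures in one currency; not taken from a real project.
costs = {
    "implementation": 120_000,
    "licensing": 90_000,
    "training": 20_000,
    "maintenance_and_support": 70_000,
}
benefits = {
    "process_automation": 210_000,
    "efficiency_gains": 150_000,
}

roi = simple_roi(costs, benefits)
print(f"ROI: {roi:.1%}")
```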
Step 8: Select the best fit
Based on its research and evaluations, a company should select the IT tool that best meets its needs and requirements regarding necessary and available functionality. The chosen solution must align with the company’s overall objectives and budget constraints.
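One common way to compare candidates across several evaluation criteria is a weighted scoring matrix. The sketch below is not part of the 8-step method itself. The criteria names, the weights, the 1-5 scores, and the tool names are hypothetical assumptions that each company would replace with its own.

```python
# Hypothetical weights per evaluation criterion; they sum to 1.
weights = {
    "functionality": 0.35,
    "integration": 0.20,
    "scalability": 0.15,
    "ease_of_use": 0.15,
    "cost_and_roi": 0.15,
}

# Hypothetical scores on a 1-5 scale assigned by the evaluation team.
scores = {
    "Tool A": {"functionality": 4, "integration": 3, "scalability": 5,
               "ease_of_use": 3, "cost_and_roi": 2},
    "Tool B": {"functionality": 3, "integration": 5, "scalability": 3,
               "ease_of_use": 4, "cost_and_roi": 4},
}


def weighted_score(tool_scores, weights):
    """Sum of score * weight over all criteria."""
    return sum(weights[criterion] * tool_scores[criterion] for criterion in weights)


# Rank the tools from the highest to the lowest weighted score.
ranking = sorted(scores, key=lambda name: weighted_score(scores[name], weights), reverse=True)
for name in ranking:
    print(name, round(weighted_score(scores[name], weights), 2))
```

The ranking is only as good as the weights behind it, so the weights should be agreed upon with the key stakeholders before the tools are scored.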
The Definition of Core Data Management Concepts
In the previous article, I discussed several challenges of choosing an IT tool for data management. One of these challenges is unaligned definitions of concepts. In this section, I will demonstrate the approach used in this series. The series aims to analyze five IT tool types: data management, data governance, metadata management, data lineage, and knowledge graphs. So, we need to define the related terms.
Data Management
Data management has multiple definitions depending on the context. In this article, I will demonstrate two approaches used by the “Orange” data management framework (DMF).
Figure 2 illustrates the first approach: the definition and content of data management depend on the organizational level.
At the strategic level, data management aims to enhance data value. For that, a company should initiate programs and develop a strategy. At the tactical level, a company seeks to control data. Policies, processes, and medium-term plans will assist in achieving this goal.
At the operational level, the company focuses on ensuring a data lifecycle. It can achieve this by implementing the required IT tools and performing procedures.
This approach leads to the conclusion that we need to use the data management definition at the operational level for analyzing IT tools that enable a data lifecycle. In this context, data management at the operational level is a company’s ability to enable a data lifecycle.
Data management is multidisciplinary and requires various capabilities to deliver business value. The second approach of the “Orange” DMF considers this fact and describes data management as a set of lower-level capabilities that play various roles in delivering value. Figure 3 represents the second approach.
This approach uses the Open Group methodology and adapts its model to data management needs. The core value proposition of data management is enabling a data lifecycle. Leading capabilities like data governance and business architecture define data management development’s direction.
It is worth mentioning that I’ve substituted the original Open Group title “strategic” with “leading.” There are two reasons for that:
- The word “leading” better reflects the goal of capabilities at this level.
- Using the same word, “strategic,” in both models would blur the different contexts of the two data management models.
The rest of the capabilities support the core capability: data lifecycle management.
The two models discussed above should work together. Figure 4 demonstrates the dependencies between these models expressed in a matrix form.
This matrix demonstrates a simple conclusion: all data management capabilities must be implemented at three organizational levels, independently of their role in delivering business value.
Data Governance
The definition of data governance is not aligned within the data management community. If you are interested in this topic, I refer you to one of the webinars published on the Data Crossroads Academy site.
In the context of this series, I define data governance as a company’s ability to:
- Design a data management operating structure
- Specify related rules, roles, and processes
- Coordinate all other data management capabilities
- Measure data management maturity and performance
Metadata Management
For this series, metadata management is a company’s ability to discover, gather, and integrate metadata of required quality to enable a data lifecycle.
Data lineage and knowledge graphs are sub-capabilities of metadata management; however, I will consider the software for these capabilities separately.
Data lineage
Data lineage is a description of data movements and transformations at various abstraction levels along data chains and relationships between data at these levels.
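To make this definition more tangible, data lineage at a single abstraction level can be sketched as a directed graph in which nodes are datasets and edges are movements or transformations. The dataset names and transformations below are hypothetical, and the traversal is a plain breadth-first search, not the method of any particular lineage tool.

```python
from collections import deque

# Hypothetical data chain: each edge is (source, target, transformation).
edges = [
    ("crm.customers", "staging.customers", "extract"),
    ("staging.customers", "dwh.dim_customer", "deduplicate"),
    ("erp.orders", "dwh.fact_orders", "load"),
    ("dwh.dim_customer", "report.revenue_by_customer", "join"),
    ("dwh.fact_orders", "report.revenue_by_customer", "aggregate"),
]


def upstream(target, edges):
    """Return every dataset that feeds `target`, directly or indirectly."""
    parents = {}
    for source, dest, _transformation in edges:
        parents.setdefault(dest, []).append(source)

    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


sources = upstream("report.revenue_by_customer", edges)
print(sorted(sources))
```

Walking the graph in the opposite direction would answer the impact-analysis question: which downstream datasets and reports are affected if a source changes.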
Knowledge graph
A knowledge graph is a description of the relationships between real-world facts and the metadata that describes these facts, in a human- and machine-readable format.
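A frequently used representation of such relationships is a set of subject-predicate-object triples. The sketch below assumes this representation. The concepts, predicates, and object names are hypothetical examples chosen to echo the personal data protection driver discussed earlier in this article.

```python
# Hypothetical triples (subject, predicate, object) linking a real-world
# concept to the metadata that describes it.
triples = [
    ("Customer", "is_described_by", "dwh.dim_customer"),
    ("dwh.dim_customer", "has_column", "email"),
    ("email", "is_classified_as", "personal data"),
    ("personal data", "is_governed_by", "personal data protection policy"),
]


def objects(subject, predicate, triples):
    """Return all objects linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("email", "is_classified_as", triples))
```

The same triples are readable by a person as short sentences and by a machine as a queryable structure, which is the point of the definition above.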
The structure of analysis of 5 groups of data management-related software
In subsequent articles, I will review existing IT solutions for data management, data governance, metadata management, data lineage, and knowledge graphs.
I will perform this review using the following structure:
- Business needs
A business need is a “problem or opportunity to be addressed.”
- Requirements
A requirement is a “usable representation of a need.”
- Analysis of existing COTS solutions regarding their functionalities and other parameters.
Summary
- The process of choosing any IT tool for data-related purposes usually follows similar steps. In this article, I considered the 8-step method.
- One of the challenges in choosing an IT tool is unaligned definitions of concepts. For the purposes of this series, we specify definitions of five capabilities: data management, data governance, metadata management, data lineage, and knowledge graphs.