This article discusses data management in different contexts.
Part 1 of this article explored the challenges associated with defining the concepts of data and information. Part 2 delved into the various approaches to defining management and governance in a general linguistic context and the specific context of data management frameworks.
Part 3 will focus on analyzing the applicability of core data management capabilities across different levels of data abstraction and various stages of the data lifecycle. This analysis aims to clarify how these capabilities are utilized at each level and stage to support effective data management and governance practices.
The approach described below is derived from the O.R.A.N.G.E. Data Management Framework, which I utilize in my practice. Unlike DAMA-DMBOK2 and DCAM®, this framework explicitly leverages the relationship between data management capabilities and the data lifecycle model. Moreover, DAMA-DMBOK2 and DCAM® present significantly different perspectives on data management capabilities. To address these gaps, the analysis below details the artifacts these capabilities should deliver, providing a more transparent framework for their application in practice.
Data Management Capabilities at Various Data Abstraction Levels
Part 1 introduced four data abstraction levels: conceptual, logical, physical, and raw data signals (referred to below as the IT infrastructure level). First, let's look at how data is represented and classified at each level.
Table 1 illustrates the relationships between data abstraction levels, data representation, and data classification across four levels: IT infrastructure, physical, logical, and conceptual.
IT infrastructure level: Data is represented as binary signals or raw electronic states (e.g., 0s and 1s). It is classified as unstructured or raw, suitable only for processing by physical hardware such as processors and memory devices.
Physical level: Data representation includes files, records, or database tables, focusing on storage formats that depend on a database type. Data at this level is presented as structured data facts stored in databases or ETL (Extract, Transform, Load)/ELT formats and is described by technical and operational metadata. This metadata provides context for how the system stores, accesses, and processes data.
Logical level: At the logical level, data is described by both technical and business metadata in the form of logical, application-agnostic, or application-specific data models. These models define attributes and relationships. This level bridges the gap between the business-oriented conceptual level and the physically stored data, ensuring that data structures align with application requirements and business needs.
Conceptual level: At the conceptual level, data is described using business metadata and is represented as concepts, data subject areas, entities, and their relationships, with a focus on their meaning within a business context. It is classified as high-level and abstract, aligning closely with business capabilities. This level clearly connects data and its purpose, ensuring that data supports strategic objectives and organizational goals.
This structure emphasizes how data evolves in representation and classification as it moves from raw electronic signals to meaningful business-oriented concepts.
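The level descriptions above can also be captured as a small lookup structure. The following is only a minimal sketch in Python: the names `AbstractionLevel`, `LevelProfile`, `LEVEL_PROFILES`, and `levels_described_by` are illustrative assumptions, not part of the O.R.A.N.G.E. framework, and the entries simply paraphrase the four level descriptions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class AbstractionLevel(IntEnum):
    """Abstraction levels, ordered from raw signals up to business concepts."""
    IT_INFRASTRUCTURE = 1
    PHYSICAL = 2
    LOGICAL = 3
    CONCEPTUAL = 4


@dataclass(frozen=True)
class LevelProfile:
    representation: str              # how data appears at this level
    metadata_kinds: Tuple[str, ...]  # metadata types the level description names


# Illustrative encoding of the level descriptions; wording is paraphrased.
LEVEL_PROFILES = {
    AbstractionLevel.IT_INFRASTRUCTURE: LevelProfile(
        "binary signals or raw electronic states",
        (),  # the level description above names no metadata type here
    ),
    AbstractionLevel.PHYSICAL: LevelProfile(
        "files, records, or database tables",
        ("technical", "operational"),
    ),
    AbstractionLevel.LOGICAL: LevelProfile(
        "logical data models with attributes and relationships",
        ("technical", "business"),
    ),
    AbstractionLevel.CONCEPTUAL: LevelProfile(
        "concepts, data subject areas, entities, and relationships",
        ("business",),
    ),
}


def levels_described_by(metadata_kind: str) -> List[AbstractionLevel]:
    """Return the levels whose data is described by the given metadata kind."""
    return [
        level
        for level, profile in LEVEL_PROFILES.items()
        if metadata_kind in profile.metadata_kinds
    ]
```

For example, `levels_described_by("business")` returns the logical and conceptual levels, which mirrors the statement that the logical level bridges business-oriented and physically stored data.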
Table 2 demonstrates the artifacts of core data management capabilities per data abstraction level.
Let me briefly characterize each of the business capabilities. In Part 2, we already discussed the different roles of capabilities in delivering value: core, strategic, and supporting.
Core Capabilities
Data lifecycle management is a business capability for orchestrating the entire data journey—from creation and processing to storage, usage, archival, and deletion.
Data lifecycle management delivers artifacts at the IT infrastructure level, such as scheduling configurations; at the physical level, ETL workflows and processing pipelines; and at the logical level, data flow diagrams and transformation rules. These artifacts ensure efficient data orchestration and management across its lifecycle.
Strategic Capabilities
Business architecture: a business capability that represents “[…] holistic, multi-dimensional business views of capabilities, end-to-end value delivery, information, and organizational structure; and the relationships among these business views and strategies, products, policies, processes, initiatives, and stakeholders.”
Business architecture delivers artifacts only at the conceptual level, including business capability maps, data subject area models, and business glossaries. These artifacts align high-level business objectives with organizational data management strategies.
Governance is a business capability for establishing a holistic framework for data management, including an operating model, organizational structure, and roles and accountabilities; governing each lower-level data management capability by developing regulations and processes; and coordinating the activities of all lower-level data management capabilities.
The governance capability is applicable at each abstraction level.
Supporting Capabilities
Data architecture: a business capability that defines and describes data types and structures.
Data architecture delivers artifacts at the physical level, such as database schemas and ETL process diagrams; at the logical level, including logical data models and data flow diagrams; and at the conceptual level, with data subject area models and conceptual data models. These artifacts ensure the effective structuring and organization of data at all relevant levels.
Application architecture: a business capability that defines and describes the types and structures of applications and their interactions.
Application architecture provides artifacts at the physical level, such as application deployment diagrams, and at the logical level – logical application models and system interaction diagrams. At the conceptual level, it delivers application capability maps and conceptual application models to align application functionalities with business needs.
Technology architecture: a business capability that provides “a description of the structure and interaction of the technology services and technology components.”
Technology architecture delivers artifacts at the IT infrastructure level, including hardware configuration diagrams and network topology maps; at the physical level – as technology stack documentation; and at the conceptual level – technology capability roadmaps and reference architectures. These artifacts guide the integration of technology with organizational goals.
Data & IT security: a business capability that protects data and IT systems by implementing policies, processes, and technologies that ensure confidentiality, integrity, and availability while mitigating risks and complying with regulations.
Data and IT security delivers artifacts at three levels: at the IT infrastructure level, access control configurations and encryption standards; at the physical level, data backup plans; and at the logical level, access control models and audit logs.
Data quality: a business capability to deliver data and information of the required quality.
Data quality provides artifacts at the physical level, such as data validation rules and data cleansing scripts; at the logical level – data quality metrics and business rules; and at the conceptual level – data quality dimensions. These artifacts help maintain accuracy and consistency across the data lifecycle.
Data analytics: a business capability to extract meaningful insights from data by leveraging statistical, computational, and visualization techniques to support informed decision-making and strategic planning.
Data analytics delivers artifacts at the physical level, such as analytical data stores and pre-processed datasets; at the logical level – analytical models and data visualization specifications; and at the conceptual level – analytics strategy frameworks and use case definitions. These artifacts support insight generation and decision-making.
Metadata management: a business capability for discovering, gathering, and integrating required metadata to enable the data lifecycle and manage the metadata lifecycle. It is essential to realize that multiple data management capabilities, such as different architecture types, deal only with metadata.
Metadata management delivers artifacts at the IT infrastructure level, such as system metadata logs; at the physical level, database metadata and ETL metadata; and at the logical level – logical data model metadata and business metadata definitions.
Figure 1 summarizes all of the abovementioned topics in a simplified format.
This picture leads us to several important conclusions:
- Almost all lower-level data management capabilities are required across multiple abstraction levels.
- Many of these capabilities, even when addressing metadata or information, still bear titles that include the word “data” (e.g., data quality, data architecture, data security).
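The first conclusion can be checked against the artifacts listed in this section. The sketch below, in Python, records only the artifacts named in the capability descriptions above; the names `ARTIFACTS`, `capabilities_at`, and `single_level_capabilities` are hypothetical helpers introduced for illustration, not part of any framework.

```python
INFRA = "IT infrastructure"
PHYSICAL = "physical"
LOGICAL = "logical"
CONCEPTUAL = "conceptual"

# capability -> {abstraction level -> artifacts named in the text}
ARTIFACTS = {
    "data lifecycle management": {
        INFRA: ["scheduling configurations"],
        PHYSICAL: ["ETL workflows", "processing pipelines"],
        LOGICAL: ["data flow diagrams", "transformation rules"],
    },
    "business architecture": {
        CONCEPTUAL: ["business capability maps", "data subject area models",
                     "business glossaries"],
    },
    # The text lists no governance artifacts; it only states that governance
    # applies at every level, so the lists are left empty here.
    "governance": {INFRA: [], PHYSICAL: [], LOGICAL: [], CONCEPTUAL: []},
    "data architecture": {
        PHYSICAL: ["database schemas", "ETL process diagrams"],
        LOGICAL: ["logical data models", "data flow diagrams"],
        CONCEPTUAL: ["data subject area models", "conceptual data models"],
    },
    "application architecture": {
        PHYSICAL: ["application deployment diagrams"],
        LOGICAL: ["logical application models", "system interaction diagrams"],
        CONCEPTUAL: ["application capability maps",
                     "conceptual application models"],
    },
    "technology architecture": {
        INFRA: ["hardware configuration diagrams", "network topology maps"],
        PHYSICAL: ["technology stack documentation"],
        CONCEPTUAL: ["technology capability roadmaps", "reference architectures"],
    },
    "data & IT security": {
        INFRA: ["access control configurations", "encryption standards"],
        PHYSICAL: ["data backup plans"],
        LOGICAL: ["access control models", "audit logs"],
    },
    "data quality": {
        PHYSICAL: ["data validation rules", "data cleansing scripts"],
        LOGICAL: ["data quality metrics", "business rules"],
        CONCEPTUAL: ["data quality dimensions"],
    },
    "data analytics": {
        PHYSICAL: ["analytical data stores", "pre-processed datasets"],
        LOGICAL: ["analytical models", "data visualization specifications"],
        CONCEPTUAL: ["analytics strategy frameworks", "use case definitions"],
    },
    "metadata management": {
        INFRA: ["system metadata logs"],
        PHYSICAL: ["database metadata", "ETL metadata"],
        LOGICAL: ["logical data model metadata", "business metadata definitions"],
    },
}


def capabilities_at(level):
    """Return the capabilities that apply at the given abstraction level."""
    return sorted(name for name, per_level in ARTIFACTS.items() if level in per_level)


def single_level_capabilities():
    """Return the capabilities that apply at exactly one abstraction level."""
    return sorted(name for name, per_level in ARTIFACTS.items() if len(per_level) == 1)
```

Under this encoding, `single_level_capabilities()` returns only business architecture, while the other nine capabilities span three or four levels, which is consistent with the "almost all" wording of the first conclusion.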
Data Management Capabilities at Various Data Lifecycle Stages
Figure 2 demonstrates the applicability of data management capabilities at various data lifecycle stages. It utilizes the relationships between the concepts of data, metadata, and information with the data lifecycle we discussed in Part 1.
The data lifecycle involves interconnected steps to ensure data and information meet business requirements. It begins with gathering and translating information needs into data requirements, then modeling and describing data and information at different abstraction levels. Organizations then determine whether the required data exists or must be created or acquired.
Data is mapped, validated, and prepared throughout the data chain to enable movement between repositories. Data undergoes processing through operations like transformation, integration, and aggregation. Once processed, data or information is validated, shared, or distributed. Data continues along the chain for further processing while information, being ready for use, is shared with end users.
End users analyze, visualize, and utilize the information, potentially generating new requirements that restart the cycle. Finally, data and information are stored, archived, or deleted in alignment with retention policies. This iterative process ensures that data and information remain accurate, accessible, and aligned with evolving business needs.
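The sequence of lifecycle steps described above can be sketched as an ordered list with one feedback loop. This is an illustrative simplification in Python: the stage labels are paraphrased from the preceding paragraphs, and `LIFECYCLE_STAGES`, `USAGE_STAGE`, and `next_stage` are hypothetical names, not terms from the framework.

```python
from typing import Optional

# Stage labels paraphrase the lifecycle description; they are not official names.
LIFECYCLE_STAGES = [
    "gather information needs",
    "translate needs into data requirements",
    "model and describe data and information",
    "locate, create, or acquire data",
    "map, validate, and prepare data",
    "process data (transform, integrate, aggregate)",
    "validate and share or distribute",
    "analyze, visualize, and use information",
    "store, archive, or delete",
]

USAGE_STAGE = "analyze, visualize, and use information"


def next_stage(current: str, new_requirements: bool = False) -> Optional[str]:
    """Return the stage that follows `current`.

    If usage generates new requirements, the cycle restarts at the first
    stage. Returns None once the final stage has been reached.
    """
    index = LIFECYCLE_STAGES.index(current)  # raises ValueError for unknown stages
    if current == USAGE_STAGE and new_requirements:
        return LIFECYCLE_STAGES[0]
    if index + 1 < len(LIFECYCLE_STAGES):
        return LIFECYCLE_STAGES[index + 1]
    return None
```

Calling `next_stage(USAGE_STAGE, new_requirements=True)` returns the first stage, reflecting the iterative nature of the lifecycle; without new requirements, the same call moves on to storage, archival, or deletion.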
This picture leads us to other important conclusions:
- It is impossible to separate data from information, as they are intricately linked and often overlap throughout the data lifecycle process.
- Data management capabilities enable multiple stages of the lifecycle.
- Even though data is transformed into information at certain stages of the lifecycle, the term “data” remains central to the concept of the “data lifecycle,” reflecting its overarching role in all stages of processing, management, and use.
So, the general conclusion after analyzing data management capabilities in different contexts is the following:
We must recognize that “data” is an overarching concept because it is impossible to clearly separate metadata and information from data.
Governance Capability
I want to conclude this series of articles by summarizing my viewpoint on the role of the governance capability in data management.
Governance is one of the lower-level data management capabilities. As demonstrated in Figure 3, this capability has three key tasks.
Task 1: Establish a framework for the holistic data management (DM) capability
This means that the governance capability must assist in specifying the required set of lower-level data management capabilities. In some of my books and publications, I demonstrated that several core data management capabilities are mandatory for most data-related initiatives. We discussed these capabilities in this article: enterprise architecture, data lifecycle management, data quality, security, analytics, and metadata management.
The governance capability must also assist in establishing a DM operating model, organizational structure, and roles and accountabilities.
Task 2: Establish a governance component in each lower-level data management (DM) capability
Governance must define and implement key components for each lower-level capability, including required artifacts, inputs, policies, standards, processes, roles, IT tools, and other assets. For instance, in data architecture, governance defines policies and processes for developing data models, assigns responsibilities to roles like data modelers and stewards, and ensures alignment with organizational objectives.
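One way to make Task 2 tangible is a simple completeness check over the components listed above. The sketch below is a hypothetical illustration in Python: the `CapabilityGovernance` class, its field names, and the example values for data architecture are assumptions derived from the wording of this paragraph, not a prescribed template.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CapabilityGovernance:
    """Governance components defined for one lower-level capability."""
    capability: str
    artifacts: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    policies: List[str] = field(default_factory=list)
    standards: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    it_tools: List[str] = field(default_factory=list)

    def missing_components(self) -> List[str]:
        """Return the component types that have not been defined yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "capability" and not getattr(self, f.name)
        ]


# Example values loosely follow the data architecture example in the text.
data_architecture = CapabilityGovernance(
    capability="data architecture",
    artifacts=["data models"],
    policies=["policy for developing data models"],
    processes=["data model development process"],
    roles=["data modeler", "data steward"],
)
```

Here `data_architecture.missing_components()` reports `inputs`, `standards`, and `it_tools` as still undefined, which is the kind of gap the governance capability would be expected to close.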
Task 3: Coordinate activities of lower-level DM capabilities
Governance must align and coordinate interdependent activities of lower-level capabilities, ensuring their integration. For example, data models from data architecture serve as critical inputs for data quality processes. Governance ensures that these dependencies are managed to achieve consistency and efficiency across the data management ecosystem.
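The dependency mentioned above, where data models from data architecture feed data quality processes, can be represented as a small dependency graph. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later); the `DEPENDS_ON` mapping and the `coordination_order` function are hypothetical names, and only the single dependency stated in the text is recorded.

```python
from graphlib import CycleError, TopologicalSorter

# Maps each capability to the capabilities whose artifacts it consumes.
# Only the dependency named in the text is included; a real organization
# would extend this mapping with its own dependencies.
DEPENDS_ON = {
    "data quality": {"data architecture"},
}


def coordination_order(depends_on):
    """Return capabilities ordered so that producers precede consumers.

    Raises graphlib.CycleError if the dependencies are circular, which
    would signal a coordination problem for governance to resolve.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

With the mapping above, `coordination_order(DEPENDS_ON)` places data architecture before data quality. A topological ordering is just one possible way to reason about such dependencies; the text itself does not prescribe a specific technique.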
Takeaways
Any organization that plans to establish a new or adjust an existing data management framework should:
- Adopt a Context-Aware Approach to Data Management: Recognize that data exists at various abstraction levels (IT infrastructure, physical, logical, and conceptual), and tailor your data management strategies to align with the specific requirements and deliverables at each level.
- Leverage Core Data Management Capabilities Across Lifecycle Stages: Ensure that capabilities such as data lifecycle management, data quality, metadata management, and analytics are integrated throughout all stages of the data lifecycle, from gathering requirements to archiving or deleting data.
- Acknowledge the Interconnectedness of Data, Metadata, and Information: Align internal definitions of data, metadata, and information, as these concepts overlap across abstraction levels and lifecycle stages. This holistic understanding enables more effective governance and management practices.