This article explores the origins of the debate on data vs. information within the context of data management.

Challenges with the Definitions

There are several challenges with defining the concepts of data and information.

Challenge 1: Industry Frameworks and Authorities don’t have aligned definitions of data and information

Let’s check DAMA-DMBOK2, DCAM®2.2, and Gartner for the definitions of data and information.

The DAMA Dictionary offers multiple definitions of data, which can be summarized as follows: Data is an individual fact represented in forms such as text, numbers, graphics, images, sounds, or videos that are out of context and lack inherent meaning.

The DAMA Dictionary also shares several definitions of information, which can be summarized as the interpretation and understanding of data within a particular context.

Strangely enough, neither the EDM Council glossary nor the Gartner glossary provides definitions of data and information.

Challenge 2: Circular reference between the definitions of data and information

Some definitions treat data as a form of information, while others define information as a form of data. This creates a circular reference that complicates understanding and differentiating between these two concepts. Let me show two examples.

Data is information

According to the ISO/IEC Standard No. 11179-1:2015, data is a “re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.”

Information is data

Another ISO standard, ISO 22263:2008, defines information as “meaningful data.” The TOGAF® Standard does the same, defining information as “Any communication or representation of facts, data, or opinions, in any medium or form, including textual, numerical, graphic, cartographic, narrative, or audio-visual forms.”

This challenge leads to the next one.

Challenge 3: The terms “data” and “information” are used interchangeably

During my workshops, I often ask participants about their understanding of the concepts of “data” and “information.” This question frequently makes them feel uneasy. The reason is that we tend to use these terms interchangeably in daily life and professional settings. The DAMA-DMBOK2 book reflects this practice, stating: “Throughout the DMBOK, the terms [data and information] will be used interchangeably.”

Challenge 4: The definitions of “data” and “information” depend on the context

The definitions of data and information vary with the context in which they are used. In this article, I will explore two factors related to the context of data management and governance. First, data can be described at multiple levels of abstraction, such as conceptual, logical, and physical data model levels. Second, the context is influenced by the data lifecycle, which shapes how data evolves through different processing stages.

Definitions of Data and Information Used in this Article

I do not claim to know the “correct” definitions. Instead, I share the definitions I use in my practice and this article.

Data is the physical or electronic representation of signals “in a manner suitable for communication, interpretation, or processing by human beings or by automatic means.”

Information is “data in a context that allows for explaining its meaning and specifying its relational connections.”

Metadata, which is “data that defines and describes other data in a particular context,” helps link the concepts of data and information.

Figure 1 demonstrates the relationship between these three concepts.

Figure 1: The relationship between data, metadata, and information concepts.


Data represents raw values (e.g., “50”), while metadata provides context to define and give meaning to the data (e.g., “KM/H”). Together, data and metadata form information (e.g., “50 KM/H”), which is a collection of data contextualized to make it meaningful and useful.
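The “50 KM/H” example can be expressed as a minimal Python sketch. The class names (`Metadata`, `Information`), the field names, and the `contextualize` function are my own illustrative assumptions, not part of any framework cited in this article. The sketch only shows that the same raw value yields different information when paired with different metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Context that describes a raw value (hypothetical fields)."""
    name: str
    unit: str


@dataclass(frozen=True)
class Information:
    """A raw value paired with the metadata that gives it meaning."""
    value: str
    metadata: Metadata

    def render(self) -> str:
        # Present the value together with its unit.
        return f"{self.value} {self.metadata.unit}"


def contextualize(data: str, metadata: Metadata) -> Information:
    """Combine data and metadata into information."""
    return Information(value=data, metadata=metadata)


raw = "50"  # data: a raw value without context

speed = contextualize(raw, Metadata(name="vehicle speed", unit="KM/H"))
temperature = contextualize(raw, Metadata(name="air temperature", unit="°C"))

print(speed.render())
print(temperature.render())
```

The raw value “50” is identical in both cases. Only the metadata determines whether it describes a speed or a temperature.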

Let’s examine how the relationships between data, metadata, and information can be represented in the two contexts mentioned above in Challenge 4.

Data and Information in the Context of Various Abstraction Levels

As mentioned above, data can be described at different abstraction levels. Figure 2 shows an example of how data, metadata, and information concepts are interpreted depending on the levels.

Figure 2: The dependencies between data, metadata, and information definitions in the context of different abstraction levels.


IT infrastructure level (Raw data signals): provides the foundational layer where data exists as binary signals for processing.

This is the lowest level, where data exists as electronic signals represented by binary states (0s and 1s). Processing occurs through physical hardware components like processors, memory, and storage devices. Data at this level is raw and unstructured, with no inherent meaning until interpreted by higher layers.

Physical data model level: defines how the data is stored and processed.

At this level, data is structured in a way that corresponds directly to how it is stored, processed, and accessed in physical systems (e.g., databases or file systems). It includes technical metadata like table structures, storage formats, indexes, and database schemas. This level focuses on implementing and optimizing data storage, retrieval, and transformation.

Logical data model level: defines how the data is organized.

The logical model describes data in terms of relationships and structures that are independent of physical storage. It includes elements such as entities, attributes, and relationships, focusing on how data is organized and accessed without concern for specific hardware or storage technologies. This is the level at which data is structured to meet application requirements.

Conceptual data model level: defines what data means.

The conceptual model represents data from a high-level, business-oriented perspective. It abstracts away technical details and focuses on defining the meaning and relationships of data in terms that are understandable to business stakeholders. This level is used for communication between business and technical teams to align data requirements and meanings, and it requires business metadata to describe data. Data can be represented as data entities.

This interpretation leads to a challenging conclusion: we manage and govern data strictly at the lowest IT infrastructure level, where it exists as raw binary signals or unstructured data. Once metadata is added to provide context, data transforms into information.

Consequently, at all upper levels—physical, logical, and conceptual—we are no longer solely managing and governing raw data but instead managing and governing information that includes data and its metadata.
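The following Python sketch illustrates this layering under stated assumptions. The byte values, the storage format, and the entity, attribute, and business term names are all hypothetical and chosen only for illustration. The same four bytes carry no meaning on their own, and each level adds metadata that narrows how they are interpreted.

```python
import struct

# IT infrastructure level: four raw bytes with no inherent meaning.
raw_bytes = bytes([0x32, 0x00, 0x00, 0x00])

# Physical level: technical metadata states how the bytes are stored.
# "<I" means little-endian unsigned 32-bit integer (an assumed format).
physical_metadata = {"format": "<I"}
(stored_value,) = struct.unpack(physical_metadata["format"], raw_bytes)

# The same bytes read with a different (big-endian) format give another value,
# which shows that the bytes alone do not determine the meaning.
(misread_value,) = struct.unpack(">I", raw_bytes)

# Logical level: the value is an attribute of an entity (hypothetical names).
logical_metadata = {"entity": "SpeedReading", "attribute": "speed_kmh"}

# Conceptual level: business metadata explains what the value means.
business_metadata = {"term": "Vehicle speed", "unit": "KM/H"}


def describe(value: int, logical: dict, business: dict) -> str:
    """Render the value together with its logical and business context."""
    return (
        f"{logical['entity']}.{logical['attribute']} = {value} "
        f"({business['term']}, {business['unit']})"
    )


print(stored_value)
print(misread_value)
print(describe(stored_value, logical_metadata, business_metadata))
```

Without the physical metadata, there is no way to tell whether the bytes represent 50 or a much larger number, which is the sense in which data at the lowest level lacks inherent meaning.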

Data and Information in the Context of the Data Lifecycle

A data lifecycle is a model of the set of processes that move and transform data from the moment of its creation to the moment of its archiving and/or destruction.

There is no standard approach to describing the data lifecycle model and its processes. Figure 3 demonstrates the data lifecycle model I use in the O.R.A.N.G.E. Data Management Framework (DMF).

Figure 3: The dependencies between data, metadata, and information definitions in the context of a data lifecycle.


In the “data lifecycle” context, data is often seen as an input into a system or an application. After data is processed, information is obtained as an output. With a chain of applications, the picture becomes more complicated, because the output of one application is the input of the next.

This model outlines eight steps or processes that data undergoes during its lifecycle. Unlike other models, it begins with gathering and defining information and data requirements using data models. This initial step helps determine whether new data needs to be created or acquired from external sources. Once this requirement is established, the data movement and transformation process begins along a data chain or pipeline.

Data must first be mapped, validated, and prepared for processing before it is loaded into IT applications or databases. IT applications and ETL tools then handle the movement, transformation, integration, and aggregation of data. At the exit point of each IT application or database, the produced information must undergo validation, sharing, and delivery.

At this stage, information has two possible paths:

  1. End users or consumers receive the information for analysis and usage, where it remains unchanged.
  2. The information continues to the next application in the data chain. Upon entering a new application or database, the information is reclassified as “data” since it is subject to transformation and placed into a new context.

The consequence is that this model is complex and hard to apply in daily operations because of its detailed processes and interdependencies.

Therefore, the O.R.A.N.G.E. DMF applies a simplified approach to distinguish “data” and “information” in the “data lifecycle” context. We use “data” when it can still be changed along the data chain. When data reaches the end users or consumers who utilize it without further manipulation, “data” turns into “information.”
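The simplified rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not an implementation taken from the O.R.A.N.G.E. DMF. The `Stage` class, the `label_outputs` function, and the stage names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One step in a data chain (hypothetical structure)."""
    name: str
    # True when the output is used by end users without further manipulation.
    is_final_consumer: bool


def label_outputs(chain: List[Stage]) -> List[Tuple[str, str]]:
    """Label what each stage handles as 'data' or 'information'.

    Simplified rule: it is 'data' while it can still be changed along
    the chain, and 'information' once it reaches the consumers.
    """
    return [
        (stage.name, "information" if stage.is_final_consumer else "data")
        for stage in chain
    ]


chain = [
    Stage("source system", is_final_consumer=False),
    Stage("ETL job", is_final_consumer=False),
    Stage("data warehouse", is_final_consumer=False),
    Stage("management report", is_final_consumer=True),
]

for name, label in label_outputs(chain):
    print(f"{name}: {label}")
```

Under this rule, the label depends only on whether the content can still change downstream, so nothing has to be reclassified at every application boundary.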

Data and Information in the Data Management Community

In the data management community, we don’t use aligned definitions of data and information. Each organization should select its approach. Data and information are commonly used interchangeably, as shown in Figure 4. The industry frameworks and authorities also follow this approach. For example, all capabilities include the term “data” in their titles. Think about data architecture, quality, governance, analytics, etc. However, these capabilities deal with data at various abstraction levels. We will discuss this topic in-depth in Part 3 of this series of articles.

Figure 4: The common approach in defining data.


Takeaways

Interchangeable use of data and information: The terms “data” and “information” are often used interchangeably, which blurs the distinction between them.

Context matters in definitions: The meanings of “data” and “information” vary depending on the context, such as levels of abstraction (conceptual, logical, or physical models) or the stages of the data lifecycle.

Implications for management and governance: This interpretation leads to a critical realization: the definition of data and information will impact the way we define data & information management and governance. For example, data is managed and governed strictly at the lowest IT infrastructure level. Once metadata is added to data, it transforms into information, which is managed and governed at all upper levels (physical, logical, and conceptual). Data and information management and governance is the topic of Part 2 of this series.