This article discusses the common ground between AI systems and data products and assets.

In the previous article of this series, I demonstrated that legislation around the world takes significantly different approaches to defining an AI system. One of the reasons I began exploring the need to harmonize data and AI governance is rooted in a core belief of mine: an AI system is fundamentally composed of data and technological infrastructure (TI) assets.

This perspective sparked considerable discussion during a workshop on the topic at the #DA&IQ and #AIGOV conference hosted by @Dataversity. With this in mind, I’d greatly appreciate your thoughts on the subject.

In this article, I will delve into the following topics to substantiate my viewpoint:

  • AI system definitions in various AI-related regulations
  • Data and IT product and asset definitions
  • Commonalities between AI and Data & IT product definitions

AI System Definitions in Various AI-Related Regulations

In many legislative documents discussed in the previous article, AI is described as a “system” or “tools, technologies, and technological models.”

Let’s examine the definition of a “system” provided by the DAMA Dictionary. “A system is an interacting and interdependent group of component items forming a unified whole to achieve a common purpose.” So, using this definition, we can affirm that “tools, technologies, and technological models” can be considered interacting and interdependent system components, as shown in Figure 1.

Figure 1: Two different approaches to defining AI.

Based on the analysis of AI definitions in global regulations, an AI system has the following key features:

Autonomy: AI systems operate with various levels of autonomy, making decisions without direct human input.

Processing: They process input data from human or machine sources and abstract it into models to perform automated analysis and make inferences.

Adaptability: AI systems can learn and adapt after deployment to improve their effectiveness.

Outputs: The system produces outputs such as predictions, recommendations, decisions, and generated content to achieve specific objectives or complete goal-oriented tasks.

All the above leads us to the following definition of AI used in this article:

Artificial intelligence is a system that autonomously performs tasks using machine learning, data processing, and algorithmic models. It can adapt, learn, and improve based on data while achieving specific objectives like prediction, classification, or optimization.

Now, let’s discuss different approaches to defining data and IT assets.

Data and IT Product and Asset Definitions

First, let me state that we don’t have an aligned definition of data in our professional community. I discussed this topic in multiple publications available at the Data Crossroads site.

In my practice, I apply the following definition:

Data is the physical or electronic representation of signals “in a manner suitable for communication, interpretation, or processing by human beings or by automatic means.”

Industry authorities hold quite different views on the constituent components of a data product. These views are compiled in Figure 2.

Figure 2: The variety of data and IT product definitions.

Gartner Definition

Gartner defines a product in the digital business context as “a named collection of business capabilities valuable to a defined customer segment. A product may be just software and data. Alternatively, it may comprise any combination of software, hardware, facilities, and services as required to deliver the entire product experience. A product may be a repeatable service (for example, a subscription service), or it may be a platform (one-sided or multisided).”

This definition covers multiple components and leaves room to define a “data product” within it.

Forbes Council Definition

The term “data product” originated in data science; at least, I first heard it from data scientists. My latest investigations show that no aligned definition of this term exists. According to the Forbes Council, a data product is “a self-contained data container that directly solves a business problem or is monetized.”

So, in this case, they mean only data, digital or non-digital. A report, dashboard, and data set are examples of a data product in this context.

Zhamak Dehghani Definition

In the data mesh context, Zhamak Dehghani goes further and identifies three components of a data product:

  • Code for pipelines and APIs
  • Data and metadata
  • Infrastructure

So, any organization has many options for defining the data product in a context that fits its needs and data management goals.

Proposed definitions

In the context of this article, I use the following definition of a data product.

A data product is the output of a data-related process, including data, metadata, software, applications, databases, and services. Infrastructure (hardware and networks) is an optional component.

We can go further and define a data asset as a collection of data products resulting from data-related processes.
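The two proposed definitions can be expressed as a small data structure. The following is a minimal sketch only, not a reference model: the class names `DataProduct` and `DataAsset`, the field names, and the example values are all hypothetical and chosen for illustration. It shows the components listed in the definition as fields, with infrastructure left empty by default because it is optional.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Output of a data-related process (hypothetical model of the definition above)."""
    name: str
    # Components named in the proposed definition.
    data: list[str] = field(default_factory=list)
    metadata: list[str] = field(default_factory=list)
    software: list[str] = field(default_factory=list)
    applications: list[str] = field(default_factory=list)
    databases: list[str] = field(default_factory=list)
    services: list[str] = field(default_factory=list)
    # Infrastructure (hardware and networks) is optional, so it may stay empty.
    infrastructure: list[str] = field(default_factory=list)

    def includes_infrastructure(self) -> bool:
        """Return True if the product also bundles hardware or network components."""
        return bool(self.infrastructure)


@dataclass
class DataAsset:
    """A collection of data products resulting from data-related processes."""
    name: str
    products: list[DataProduct] = field(default_factory=list)


# Illustrative values only; none of these names come from the article.
dashboard = DataProduct(
    name="sales_dashboard",
    data=["sales_2024"],
    metadata=["sales_schema"],
    applications=["bi_tool"],
)
sales_asset = DataAsset(name="sales", products=[dashboard])
```

An organization that prefers the narrower Forbes Council view could keep only the `data` field, while one following the data mesh view would treat `infrastructure` as required.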

Some companies may split this definition in two, treating IT products and assets separately from data products and assets.

Let’s return to the definitions of an AI system and its key components and map them to data & IT assets.

Commonalities between the Definitions of an AI System and Data & IT Product

Defining AI as a system and understanding its key features, discussed in the first section, guide us in identifying the core components of an AI system, as illustrated in Figure 3.

Figure 3: A definition of an AI system mapped to data & IT product definition.

The AI system starts by using input data sourced from humans or machines, which serves as the foundation for the entire process. Whether authentic or synthetic, this data is fed into various AI models that use different AI types and techniques to analyze and transform this data. These models process the data to generate meaningful output data, which can be applied across various use cases—including decision-making, recommendations, predictions, and goal-oriented tasks. Technology platforms enable and support this entire flow, providing the necessary infrastructure and tools to process data, deploy models, and scale AI solutions effectively.

Recalling the definitions of data and IT assets discussed in the previous section, we can conclude that all AI system components are data or IT assets.
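This mapping can be written down explicitly. The sketch below is illustrative only: the names `AssetType`, `AI_SYSTEM_COMPONENTS`, and `group_by_asset_type` are hypothetical, and classifying AI models as IT assets is an assumption of this sketch, since a trained model could equally be argued to be a data asset. The four components are the ones named in the description of Figure 3.

```python
from enum import Enum


class AssetType(Enum):
    DATA = "data asset"
    IT = "IT asset"


# Components from the description of Figure 3, each assigned an asset type.
# Assumption: "AI models" are treated as IT (software) assets here; the article
# only states that every component is either a data or an IT asset.
AI_SYSTEM_COMPONENTS = {
    "input data": AssetType.DATA,
    "AI models": AssetType.IT,
    "output data": AssetType.DATA,
    "technology platforms": AssetType.IT,
}


def unmapped_components(components):
    """Return the names of components not classified as a data or IT asset."""
    return [
        name
        for name, asset_type in components.items()
        if not isinstance(asset_type, AssetType)
    ]


def group_by_asset_type(components):
    """Group component names under the asset type they were assigned."""
    groups = {asset_type: [] for asset_type in AssetType}
    for name, asset_type in components.items():
        groups[asset_type].append(name)
    return groups
```

If `unmapped_components(AI_SYSTEM_COMPONENTS)` returns an empty list, every component falls under data or IT governance, which is the conclusion drawn above.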

Takeaways

Harmonized governance frameworks: The parallels between AI systems and data/IT products indicate that governance frameworks for data and AI management can be harmonized, or even integrated, to streamline oversight and ensure consistent regulatory compliance.

Data management as the foundation: Data management emerges as a mandatory foundation for any AI-related initiative. Without robust data management, the effectiveness and reliability of AI systems are significantly compromised.

AI as a data-driven process: The analysis underscores that AI systems fundamentally rely on data as their primary input. This interdependency suggests that improvements in data quality and accessibility will directly enhance AI system performance.

The future potential of unified definitions: Establishing aligned definitions for data, IT products, and AI systems can foster international collaboration, reduce regulatory fragmentation, and drive innovation by creating a shared understanding of these interconnected domains.