This is Part 2 of the article.
In Part 1, we discussed challenges associated with data management trends. We then examined trends and challenges related to the overall data management capability and its sub-capabilities: governance and business architecture.
In Part 2, we will discuss challenges related to several core and supporting capabilities: data lifecycle management, enterprise architecture, security, data quality, analytics, metadata management, and IT infrastructure, as shown in Figure 1.
Figure 1 demonstrates the data management model used in the “Orange” Data Management Framework.
Let’s start with the core capability: data lifecycle management.
Data lifecycle management
Establishing and maintaining a data lifecycle is the core value proposition of data management.
A data lifecycle is the “set of processes that move and transform data from the moment of its creation to the moment of its archiving and destruction.”
Data lifecycle management establishes and coordinates these processes.
Several trends related to this capability focus on optimizing a data lifecycle’s processes.
Trend 1: Implementation of DataOps
Like DevOps, DataOps is an agile methodology for developing and maintaining data analytics, integration, and transformation processes.
DataOps aims to improve the speed and accuracy of analytics by streamlining data pipeline development and data delivery. However, implementing DataOps becomes more difficult as data volumes grow.
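To make this concrete, here is a minimal sketch of one core DataOps practice: automated checks that gate a pipeline run, written against the standard library. The table name, schema, and expectations are illustrative assumptions, not taken from any specific DataOps toolkit.

```python
# A minimal sketch of a DataOps-style automated pipeline check.
# Table name, schema, and thresholds are illustrative assumptions.
import sqlite3

def test_orders_table(conn: sqlite3.Connection) -> None:
    """Fail fast if the loaded data violates basic expectations."""
    cur = conn.cursor()

    # Expectation 1: the load produced at least one row.
    row_count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    assert row_count > 0, "orders table is empty"

    # Expectation 2: no order has a negative amount.
    bad = cur.execute("SELECT COUNT(*) FROM orders WHERE amount < 0").fetchone()[0]
    assert bad == 0, f"{bad} orders have a negative amount"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 3.0)])
    test_orders_table(conn)  # raises AssertionError if a check fails
    print("pipeline checks passed")
```

In a real DataOps setup, such checks run automatically on every pipeline change, just as unit tests run on every code change in DevOps.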
Trend 2: Development of the data stack
A data stack refers to the combination of tools, technologies, and platforms used to gather, process, store, and analyze data within an organization.
It can include data sources, databases, data warehouses, data lakes, ETL (Extract, Transform, Load) tools, analytics platforms, visualization tools, and other components that handle an organization’s data needs.
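As an illustration, the sketch below compresses a tiny data stack into a few lines: extract from a source, transform, and load into a warehouse-like store. The schema and data are illustrative assumptions; a real stack would use dedicated tools for each stage.

```python
# A minimal extract-transform-load sketch using only the standard library.
# The CSV content, schema, and transformation are illustrative assumptions.
import csv
import io
import sqlite3

RAW_CSV = "customer_id,country,revenue\n1,nl,100.5\n2,de,80.0\n"

# Extract: read raw records from a source (here, an in-memory CSV).
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Transform: cast types and normalize country codes.
records = [(int(r["customer_id"]), r["country"].upper(), float(r["revenue"]))
           for r in rows]

# Load: write the transformed records into a warehouse-like store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (customer_id INTEGER, country TEXT, revenue REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", records)

# Analyze: the last stage of the stack queries the loaded data.
print(conn.execute("SELECT country, SUM(revenue) FROM revenue GROUP BY country").fetchall())
```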
In my view, the data stack concept is close to the data fabric concept discussed later in this article.
Trend 3: Progress in real-time data integration
Every day, companies generate massive amounts of data from various sources that they need to integrate. Consolidating this data in central repositories for decision-making becomes increasingly challenging.
Real-time data integration enables businesses to make decisions based on the most up-to-date information.
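The sketch below illustrates the principle under simplified assumptions: events arrive from a stream, simulated here by a plain generator rather than a real broker, and are applied to a central store the moment they arrive, so any query sees the latest state.

```python
# A minimal sketch of real-time integration: each event is applied to the
# central store as it arrives. The event shape is an illustrative
# assumption; in practice the stream would come from a message broker.
from typing import Iterator

def event_stream() -> Iterator[dict]:
    """Stand-in for a subscription to a streaming source."""
    yield {"customer_id": 1, "status": "active"}
    yield {"customer_id": 2, "status": "active"}
    yield {"customer_id": 1, "status": "churned"}  # a later update wins

central_store: dict[int, str] = {}

for event in event_stream():
    # Upsert: the store always reflects the most recent event per key.
    central_store[event["customer_id"]] = event["status"]

print(central_store)  # {1: 'churned', 2: 'active'}
```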
Trend 4: Implementation of edge computing
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the data sources, typically at the network’s edge, rather than relying on a centralized cloud-based system.
This approach aims to reduce latency, improve speed, save bandwidth, and thus provide more efficient data processing for real-time analytics.
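A minimal sketch of the idea, with illustrative sensor values: raw readings are summarized on the edge device, and only the compact summary travels upstream.

```python
# A minimal sketch of edge-side processing: many raw readings are reduced
# to one small payload near the source, saving bandwidth and latency.
# The readings and summary fields are illustrative assumptions.
import statistics

def summarize_at_edge(readings: list[float]) -> dict:
    """Runs on the edge device; reduces many readings to one payload."""
    return {
        "count": len(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
    }

raw_readings = [20.1, 20.3, 20.2, 35.9, 20.2]  # e.g., one minute of sensor data
payload = summarize_at_edge(raw_readings)
print(payload)  # only this small summary crosses the network
```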
Trend 5: DaaS introduction
DaaS stands for Data as a Service. It’s a cloud service model where data is made available to users over the internet on a subscription basis. Businesses can access data hosted by third-party providers instead of storing it in their repositories. The key advantage of DaaS is that it allows for easy data accessibility, cost efficiency, and scalability.
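The sketch below shows what consuming a DaaS offering can look like. The endpoint, authentication header, and response shape are hypothetical assumptions; every provider documents its own API. It also assumes the requests package is installed.

```python
# A hypothetical sketch of pulling data from a DaaS provider over HTTP.
# The URL, key, and parameters below are made up for illustration only.
import requests

API_URL = "https://api.example.com/v1/exchange-rates"  # hypothetical endpoint
API_KEY = "your-subscription-key"                      # hypothetical credential

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"base": "EUR", "date": "2024-01-02"},
    timeout=10,
)
response.raise_for_status()
rates = response.json()  # data arrives ready to use; no local pipeline needed
print(rates)
```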
Let’s consider the set of supporting capabilities and start with enterprise architecture.
Enterprise architecture
Before turning to trends in enterprise architecture, I want to discuss a challenge associated with defining data architecture.
Challenge 1: No aligned definitions of different types of data architecture exist
The TOGAF® Standard recognizes four types of architecture: business, information systems (data and application), and technology. Data architecture is a concept with multiple definitions in the data management community; even leading guidelines take different viewpoints.
Let’s take data architecture as an example. The TOGAF® Standard defines data architecture as “A description of the structure of the enterprise’s major types and sources of data, logical data assets, physical data assets, and data management resources.” DAMA-DMBOK2 recognizes data models as the deliverables of data modeling and defines data architecture as “blueprints to guide data integration, control data assets, and align data investments with business strategy.” So, you can see that these guidelines have pretty different views on data architecture. In daily practice, many use the term “data architecture” to combine data, application, and technology architectures.
So, it is up to you to decide which term, and which meaning, to use.
Let’s discuss several trends related to architecture.
Trend 6: Implementation of data lakes and lakehouses
The first trend is moving from data warehouse solutions to data lake and lakehouse architecture.
A data lake allows for storing raw data of various types and formats. The key challenge is providing and documenting metadata that describes data in the data lake. Often, a company must create a metadata lake to enable the data lake.
Usually, data integration takes place outside of a data lake.
A data lakehouse combines the functionalities of a data warehouse and a data lake.
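To illustrate the metadata challenge mentioned above, here is a minimal sketch that lands a raw file in a lake and records a catalog entry for it; the paths and catalog fields are illustrative assumptions. Without such entries, a lake tends to degenerate into a “data swamp.”

```python
# A minimal sketch of pairing a data lake with a metadata catalog: every
# raw file landed in the lake gets a catalog entry that describes it.
# Paths and catalog fields are illustrative assumptions.
import json
import time
from pathlib import Path

LAKE = Path("lake/raw")
CATALOG = Path("lake/catalog.jsonl")  # a simple file-based "metadata lake"

def land_file(name: str, content: bytes, source: str, fmt: str) -> None:
    LAKE.mkdir(parents=True, exist_ok=True)
    path = LAKE / name
    path.write_bytes(content)  # the raw data, stored as-is
    entry = {
        "path": str(path),
        "source": source,
        "format": fmt,
        "size_bytes": len(content),
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    with CATALOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")  # without this, the file is opaque

land_file("clicks_2024_01_02.json", b'{"clicks": 42}', source="web", fmt="json")
```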
Trend 7: Usage of derivative and synthetic data
This trend means we operate less and less with raw original data.
Derivative data is derived from original or raw data using processing or transformation. The key aspect is that derivative data does not represent the original raw input but rather some form of processed output.
Synthetic data is data that wasn’t observed initially but was created algorithmically. It is often used in data science and artificial intelligence to augment datasets, especially when original data is scarce, sensitive, or confidential.
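A minimal sketch of the synthetic-data idea, under a deliberately simple assumption: new values are drawn from a normal distribution fitted to the original sample, so the synthetic set mimics the sample’s statistics without exposing any original record. Real generators use far richer models.

```python
# A minimal sketch of synthetic data generation: fit a simple distribution
# to an original (possibly sensitive) sample and sample new values from it.
# The input values are illustrative assumptions.
import random
import statistics

original = [52.1, 48.9, 50.5, 49.8, 51.2, 47.6, 50.9]  # e.g., sensitive measurements

mu = statistics.mean(original)
sigma = statistics.stdev(original)

# Draw as many synthetic points as needed from the fitted distribution.
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

print(f"original  mean={mu:.2f} stdev={sigma:.2f}")
print(f"synthetic mean={statistics.mean(synthetic):.2f} "
      f"stdev={statistics.stdev(synthetic):.2f}")
```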
Trend 8: Implementation of data domain, data mesh, and data fabric architecture
Let’s discuss several challenges associated with these architectures.
Challenge 2: Data mesh has multiple implementation approaches
The domain-driven design concept by Eric Evans has been adapted by Zhamak Dehghani to the needs of data architecture.
Many companies declare that they want to implement data mesh. However, one of the challenges I’ve seen in practice is that many companies do it differently.
Sometimes, they mix domain architecture with data mesh.
The definitions of data products differ as well. In the classical data mesh concept, a data product combines data sets, related code, and infrastructure. Many companies still limit their data products to a data set at the logical and/or physical data model level.
Another challenge associated with data mesh is the data scope. In the original concept, data mesh is used only for analytical data. In practice, companies also tend to extend this concept to operational data.
Challenge 3: IT vendors interpret the “data fabric” concept differently
Data Fabric is a unified architecture and set of data services that provide consistent capabilities across data pipelines implemented in on-premises and multiple cloud environments. A single product or platform can hardly implement data fabric because data fabric includes architecture, shared data assets, and data management and integration technology. A couple of months ago, I investigated the functionalities of 168 data management tools. More than 60 providers labeled their tools as “data fabric.” It is challenging to say what they mean by that.
Trend 9: Companies pay more attention to data quality and security
Despite the trend, the data quality maturity level remains low. Figure 2 demonstrates the aggregated results of measuring data quality maturity. These results come from the anonymous DM maturity scan available on the Data Crossroads site and were shared in the “Data Management Maturity Assessment Review 2022.”
As you can see, the overall data quality maturity level has worsened over the past several years. The number of respondents at the two lower maturity levels has increased, while the number at the three higher levels has decreased.
The next supporting capability is data analytics.
Data analytics
Several trends characterize the progress in data analytics.
Trend 10: Moving towards predictive and prescriptive analytics
The usage of artificial intelligence and machine learning is the key factor defining trends in data analytics. Companies are steadily moving from descriptive and diagnostic analytics to predictive and prescriptive analytics. At the Amsterdam FP&A (financial planning and analysis) Board meeting, held in October, many FP&A top executives mentioned using AI/ML to analyze and compare competitors’ financial statements to support decision-making.
Trend 11: Implementation of augmented analytics
Augmented analytics refers to using machine learning (ML) and artificial intelligence (AI) to automate data preparation, insight generation, and explanation within the data analytics domain. It enhances the analytics process with automated insights and provides sophisticated tools that automatically identify patterns, trends, and anomalies in data sets without requiring users to build custom models or write complex algorithms.
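As a tiny illustration of one augmented-analytics building block, the sketch below flags anomalies automatically with a z-score test, so the user never writes a model; the data and threshold are illustrative assumptions.

```python
# A minimal sketch of automated anomaly detection via z-scores.
# The sales figures and the threshold are illustrative assumptions.
import statistics

def find_anomalies(values: list[float], threshold: float = 2.5) -> list[float]:
    """Return values whose z-score exceeds the threshold."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 240, 101]  # 240 stands out
print(find_anomalies(daily_sales))  # -> [240]
```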
Trend 12: Extension of self-service analytics
Self-service analytics refers to analytics platforms and tools that enable end-users, often business professionals without a technical background in data science or IT, to access, analyze, and visualize data without the need for direct involvement from data experts or IT staff.
Trend 13: Development of AnalyticsOps
AnalyticsOps is a set of practices, methodologies, and tools aimed at automating, streamlining, and improving the analytics lifecycle, from data preparation and model development to deployment, monitoring, and iteration. AnalyticsOps seeks to address the challenges of operationalizing analytics, ensuring that insights and models are not only created but also effectively integrated into business processes and applications.
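The sketch below illustrates one AnalyticsOps practice under simplified assumptions: an automated gate that promotes a candidate model only if it beats the production model on a holdout metric. The scores and minimum gain are illustrative.

```python
# A minimal sketch of an automated model-promotion gate, one building
# block of AnalyticsOps. Scores and the minimum gain are illustrative.

def promote_if_better(candidate_score: float,
                      production_score: float,
                      min_gain: float = 0.01) -> bool:
    """A repeatable, automated promotion decision instead of a manual one."""
    return candidate_score >= production_score + min_gain

# e.g., holdout accuracies computed automatically in the pipeline
if promote_if_better(candidate_score=0.87, production_score=0.84):
    print("deploying candidate model")  # hand over to the deployment step
else:
    print("keeping production model")
```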
Metadata management is the foundation of all other DM capabilities. Let’s discuss its key trends.
Metadata management
Metadata is a complex concept that covers various types and structures of metadata objects.
Several trends demonstrate the development in this area of data management.
Trend 14: Implementation of data lineage and knowledge graphs
Multiple IT providers embed these enabling capabilities in their solutions to make data movements and transformations transparent. Many companies tend to document data lineage at multiple abstraction levels. In this respect, the terms “data lineage” and “knowledge graph” are sometimes used almost synonymously.
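A minimal sketch of lineage as a directed graph, assuming the networkx package is installed; the asset names are illustrative. With this representation, upstream and downstream impact analysis become simple graph queries.

```python
# A minimal sketch of data lineage as a directed graph (requires networkx).
# The data asset names are illustrative assumptions.
import networkx as nx

lineage = nx.DiGraph()
# Each edge points from a source asset to the asset derived from it.
lineage.add_edge("crm.customers", "staging.customers")
lineage.add_edge("erp.orders", "staging.orders")
lineage.add_edge("staging.customers", "mart.revenue_report")
lineage.add_edge("staging.orders", "mart.revenue_report")

# Upstream impact analysis: everything the report depends on.
print(nx.ancestors(lineage, "mart.revenue_report"))
# Downstream impact analysis: everything affected if a source changes.
print(nx.descendants(lineage, "crm.customers"))
```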
Trend 15: Introduction of the “data observability” concept
Data observability refers to the ability to inspect and understand the entire lifecycle of your data, including its origin, transformations, dependencies, quality, and more. The core goal is to optimize data pipelines and make their functioning reliable.
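The sketch below illustrates two common observability checks, freshness and volume; the thresholds and table metadata are illustrative assumptions.

```python
# A minimal sketch of freshness and volume checks on a pipeline output.
# The table metadata and thresholds are illustrative assumptions.
import time

def check_freshness(last_loaded_at: float, max_age_seconds: float) -> bool:
    """Is the data recent enough?"""
    return (time.time() - last_loaded_at) <= max_age_seconds

def check_volume(row_count: int, expected_min: int, expected_max: int) -> bool:
    """Did the pipeline deliver a plausible number of rows?"""
    return expected_min <= row_count <= expected_max

table_meta = {"last_loaded_at": time.time() - 1800, "row_count": 10_250}

fresh = check_freshness(table_meta["last_loaded_at"], max_age_seconds=3600)
volume_ok = check_volume(table_meta["row_count"], expected_min=9_000,
                         expected_max=12_000)
print(f"fresh={fresh} volume_ok={volume_ok}")  # alert the team if either fails
```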
Trend 16: Introduction of active metadata architecture
Active metadata refers to metadata that is not just passively recorded or stored but is actively used to automate, enhance, and drive various data operations and processes. It also means that metadata is managed in real-time. To implement this concept, distributed and hybrid metadata architectures are required.
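A minimal sketch of the “active” part, under illustrative assumptions: a PII classification stored in a catalog drives masking at runtime instead of sitting passively in documentation.

```python
# A minimal sketch of active metadata: catalog entries drive pipeline
# behavior at runtime. The catalog content and masking rule are
# illustrative assumptions.
catalog = {
    "customers.email": {"pii": True},
    "customers.country": {"pii": False},
}

def apply_policies(row: dict, table: str) -> dict:
    """Metadata is consulted during processing, not just documented."""
    return {
        col: "***MASKED***" if catalog.get(f"{table}.{col}", {}).get("pii") else val
        for col, val in row.items()
    }

row = {"email": "jane@example.com", "country": "NL"}
print(apply_policies(row, "customers"))  # {'email': '***MASKED***', 'country': 'NL'}
```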
The last capability we will discuss is information technology and infrastructure.
Information technology and infrastructure
The development of this capability demonstrates several trends.
Trend 17: Companies use more complex deployment options
Many companies are exploring cloud, multi-cloud, and hybrid environments.
Trend 18: Usage of low-code and no-code applications
Low-code and no-code platforms are software development approaches that enable users to create applications through graphical user interfaces and configuration rather than traditional manual coding.
This approach accelerates application development and deployment.
Trend 19: Embedding AI/ML technologies into data management applications
These technologies extend and strengthen functionalities related to data quality, modeling, metadata management, etc.
Trend 20: Changes in database development
First, multiple companies are moving from relational to graph databases.
Graph databases manage relationships more efficiently, offer greater flexibility when the data model changes, and provide a more intuitive representation of data.
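A minimal sketch of why relationships are natural in a graph model: a “colleagues of colleagues” query is a plain traversal over an adjacency structure, where a relational model would need self-joins. The data is an illustrative assumption.

```python
# A minimal sketch of graph-style traversal over an adjacency list.
# The people and relations are illustrative assumptions.
relations = {  # person -> direct colleagues
    "ann": ["bob", "carol"],
    "bob": ["ann", "dave"],
    "carol": ["ann"],
    "dave": ["bob"],
}

def colleagues_of_colleagues(person: str) -> set[str]:
    """Two-hop traversal: in SQL this would be a double self-join."""
    direct = set(relations.get(person, []))
    indirect = {c2 for c1 in direct for c2 in relations.get(c1, [])}
    return indirect - direct - {person}

print(colleagues_of_colleagues("ann"))  # -> {'dave'}
```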
The second trend is the convergence of transactional and analytical database technologies.
Until now, databases have differed by use case: transactional versus analytical. Nowadays, vendors tend to move toward a unified database model that serves various use cases.
Conclusion
Data management is a rapidly developing business area. I am sure that in the coming years, we will face new trends. What we need is to distinguish between the drivers of trends, the trends themselves, and the associated challenges, and to map them to clearly identified data management capabilities.