A practical, pragmatic approach to implementing data management that delivers quick wins is one of the key challenges for any data management professional. Sooner or later, you will have to deal with it in your career.

In the series of presentations Practical implementation or optimization of data management with the “Orange” model, I share my practical experience of the past 10 years. This experience led me to develop a new model and a practical method for implementing and optimizing data management. The method is a collection of techniques and templates that you can use to perform various tasks related to developing and optimizing data management in your company.

Using the “Orange” model for developing and performing a data management maturity assessment

My experience with data management/governance (DM/DG) roles started more than ten years ago, when I first designed and implemented a data management framework. At the time, the topic seemed pretty straightforward. But the more experience I gained in data management, the more mysterious and complicated the topic of DM/DG roles became. A year ago, I wrote an article on this topic expressing my concerns about the common approach and offering some solutions. During the past year, I have discovered some new challenges that should be taken into account when designing roles.

In this article, I would like to share my vision of the challenges with the current common approach, discuss key factors that should be taken into account, and share practices in the development of the set of roles.

Challenges with common approaches

In my opinion, the existing approaches to role design pose several challenges, which I have listed in Figure 1:

Figure 1. Challenges with common approaches to designing DM/DG roles.

The number of roles

Different publications about DM/DG roles introduce a big ‘zoo’ of roles. Sadly enough, even the publications of DAMA International present a huge number of roles: 120, to be precise! The biggest challenge is aligning these roles with processes, the tasks to be delivered, and the artifacts to be produced. A related challenge is the unclear relationship between roles and enterprise size.

No clear factors of influence

When talking to data management professionals who have implemented DM/DG roles, I often get the impression that they simply copy roles from well-known sources without analyzing the factors that may influence the design of roles. This approach causes the next challenge: unaligned role names and accountabilities.

Unaligned terms

I once heard a colleague proudly say: ‘We have implemented the roles of stewards and custodians’. Linguistically, the words ‘steward’ and ‘custodian’ are synonymous. This is just one example of blindly copying sources. It happens because there are no clear guidelines on how to design roles that match the needs and reality of the company.

No clear guidelines to match the roles and companies’ reality

Take, for example, the different steward-related roles introduced in DAMA-DMBOK2: ‘data steward’, ‘data custodian’, ‘chief data steward’, ‘business data steward’, ‘coordinating data steward’, ‘executive data steward’, ‘data steward facilitator’, ‘technical data steward’. Which rules can a company follow to choose ‘just enough’ steward roles, and what is the right context for this ‘zoo’ of stewards?

Let’s take a look at factors that a data management professional should take into account when designing data management/governance (DM/DG) roles.

Key factors that influence design of data management/governance (DM/DG) roles

I will discuss seven key factors that influence the set of DM/DG roles, as shown in Figure 2. If I explained all of these factors in depth, this would become a book rather than an article. You can find more information in the video presentation I did on this subject or contact me for one-on-one advice.

Figure 2. Key factors that influence DM/DG roles design.

Let’s take a brief look at each of these factors.

  1. Types of data stewards

Figure 3. Types of data stewards.

The idea of data stewardship derives from the concept of data ownership. The company, as a whole, owns data. The company delegates data-related tasks to different types of data stewards. I would like to stress that DAMA makes it quite clear that ‘steward’ and ‘custodian’ are synonymous. DAMA specifies a data steward as a person or a group of persons that ‘represent the interests of all stakeholders and must take an enterprise perspective to ensure enterprise data is of high quality and can be used effectively’. DAMA also specifies different types of data stewards. Data stewards can be split into three categories depending on their professional background: business, data management, and technical. Using this approach, you may assign these roles to every employee who deals with data. Data steward roles can be either formal or virtual. A formal role means that you create a new functional role within your organizational structure. A virtual role is assigned to an already existing functional role.
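The two classifications above (professional background, and formal versus virtual) can be captured in a small data structure. The following is only a minimal sketch in Python; the class names, field names, and the three example roles are hypothetical illustrations and are not taken from DAMA or from the “Orange” model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Background(Enum):
    """The three steward categories by professional background."""
    BUSINESS = "business"
    DATA_MANAGEMENT = "data management"
    TECHNICAL = "technical"


class RoleKind(Enum):
    FORMAL = "formal"    # a new functional role in the organizational structure
    VIRTUAL = "virtual"  # assigned to an already existing functional role


@dataclass(frozen=True)
class StewardRole:
    name: str
    background: Background
    kind: RoleKind
    host_function: Optional[str] = None  # hypothetical field: the existing role a virtual steward is attached to

    def __post_init__(self) -> None:
        # A virtual role only makes sense if it names the existing function it is assigned to.
        if self.kind is RoleKind.VIRTUAL and self.host_function is None:
            raise ValueError("a virtual steward role needs an existing functional role to attach to")


# Hypothetical example roles, for illustration only.
roles = [
    StewardRole("Data quality steward", Background.DATA_MANAGEMENT, RoleKind.FORMAL),
    StewardRole("Customer data steward", Background.BUSINESS, RoleKind.VIRTUAL,
                host_function="Sales manager"),
    StewardRole("Database steward", Background.TECHNICAL, RoleKind.VIRTUAL,
                host_function="Database administrator"),
]

virtual_roles = [r.name for r in roles if r.kind is RoleKind.VIRTUAL]
print(virtual_roles)
```

Writing the roles down this way makes one design decision explicit: every virtual role has to point at a function that already exists in the company, while a formal role implies a new position.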

  2. Structure of data management (DM) capabilities

In the “Orange” Model of DM 101 series, I have discussed a set of key data management (DM) capabilities: data chain, data management framework, data quality, data modeling, and information systems architecture. Four dimensions enable each of these capabilities: processes, roles, data, and tools.

Figure 4. The influence of the DM capability structure on DM/DG roles.

The “Orange” model proposes splitting DM/DG roles along these dimensions.

For example, ‘Business process owner’ and ‘System owner’ relate to the ‘processes’ and ‘tools’ dimensions respectively. The ‘data’ dimension describes the accountabilities of the data owner and data user roles. The ‘roles’ dimension makes the distribution of roles within the organizational hierarchies clear. The accountabilities of these roles will also depend on their location along data chains.

  3. Location along data chains

The data chain describes the path of the transformation of raw data into meaningful information.

In Figure 5, you can see the relationships between the roles we have just discussed and their location along data chains.

Figure 5. The distribution of roles along data chains.

Data chains are associated with one or more business processes. Therefore, the business process owner is accountable for the part of the business process along the data chain that falls within their accountability. One or more systems and/or applications may be involved in the data processing. Each of these applications has one application owner. Data owners and data users are accountable for data. We will discuss their accountabilities later. All of the above-mentioned roles are assigned to business data stewards. Different data management capabilities enable data chains. All types of data stewards perform processes related to these capabilities and deliver the corresponding artifacts. The data architecture of data chains will vary and will impact the design of roles.
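One way to make the distribution of roles along a data chain tangible is to write the chain down as data and derive who holds which role. The sketch below is a hedged illustration only: the segment structure, the function `roles_along_chain`, and all process, application, and owner names are assumptions made for the example. Data owners and data users are left out because their accountabilities depend on the factors discussed later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Application:
    name: str
    application_owner: str  # each application has one application owner


@dataclass
class ChainSegment:
    """One business process along the data chain and the applications it uses."""
    business_process: str
    process_owner: str
    applications: List[Application] = field(default_factory=list)


def roles_along_chain(segments: List[ChainSegment]) -> Dict[str, List[str]]:
    """Collect, per person or unit, the roles they hold, in chain order."""
    roles: Dict[str, List[str]] = {}
    for segment in segments:
        roles.setdefault(segment.process_owner, []).append(
            f"business process owner: {segment.business_process}"
        )
        for app in segment.applications:
            roles.setdefault(app.application_owner, []).append(
                f"application owner: {app.name}"
            )
    return roles


# Hypothetical two-segment chain, for illustration only.
chain = [
    ChainSegment("Customer onboarding", "Head of Sales",
                 [Application("CRM", "CRM product manager")]),
    ChainSegment("Financial reporting", "Head of Finance",
                 [Application("Data warehouse", "BI team lead"),
                  Application("Reporting tool", "BI team lead")]),
]

result = roles_along_chain(chain)
for holder, held_roles in result.items():
    print(holder, "->", held_roles)
```

Even in this tiny example, one holder ends up with two application-owner roles, which is the kind of overlap worth spotting before role descriptions are written.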

  4. Data architecture style

Data architecture will influence roles related to data, business processes, and systems. There is a big difference between the canonical and the big data platform architectures, as shown in Figure 6.

Canonical architecture

Many companies still have this form of architecture. There are far too many relationships between different sourcing and consuming applications. In professional jargon, this type of architecture is often called ‘spaghetti architecture’.

Figure 6. Different data architecture styles.

Big data platform

Data from different source systems enters the central big data platform. This platform has different data domains. Data is processed within the platform and then distributed to different users. The key question with such platforms is where data will be integrated and transformed: within the platform itself or on its way to consumers? The answer to this question will also influence the specification of roles.
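A rough count shows why point-to-point integration earns the ‘spaghetti’ label. The sketch below is a simplified illustration under a stated worst-case assumption: every sourcing application feeds every consuming application directly, whereas with a central platform each application connects to the platform once. The function names are hypothetical.

```python
def point_to_point_links(sources: int, consumers: int) -> int:
    """Worst case: every sourcing application feeds every consuming application directly."""
    return sources * consumers


def platform_links(sources: int, consumers: int) -> int:
    """Central platform: each application has exactly one connection, to the platform."""
    return sources + consumers


for n in (5, 10, 20):
    print(n, point_to_point_links(n, n), platform_links(n, n))
# 5 25 10
# 10 100 20
# 20 400 40
```

Each of those links is an interface for which somebody has to be accountable, so the number of agreements between data owners and data users grows much faster in the point-to-point case.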

Data mesh platform

The data mesh platform is also related to big data architecture. In this case, data is organized into two types of domains: sourcing and consuming. A sourcing domain combines data in the sourcing system and the big data platform, while a consuming domain combines data in the big data platform and the consuming system. Within each domain, data is processed according to the business requirements of that domain.

Different architecture styles will significantly influence the distribution of accountabilities of data owner and data user roles. The architectural style will also affect data modeling and solution design patterns.

  5. Data modeling and solution design

In the canonical approach, data model design and solution design belong to different continuums according to TOGAF 9.2, the leading Enterprise Architecture guide. Enterprise architecture includes four interrelated architectures: business, data, application, and technology. Data architecture delivers conceptual, logical, and physical data models. Solution architecture should put the physical data models into practice. The new approach, on the contrary, unites data model design and solution design: conceptual, semantic, and solution data models should be designed simultaneously in one process. This means that, depending on the approach, data management and technical data stewards will have different accountabilities and deliverables. It will also affect data management-related processes. This challenge leads to another one, associated with the definition of business and data domains.

  6. Business and data domain definition

DAMA-DMBOK2 assigns the approval authority of a data steward to its domain. The challenge is to specify the definition of ‘domain’.

There are at least three possible approaches to specifying domains, as shown in Figure 7.

Figure 7. Different approaches to specify the term ‘business/data domain’.

The first approach relates to the concept of new data creation. This is a complex topic that I will cover in one of my master-classes in the future. The key idea is as follows. Data flows along data chains. On its way, data can either change or not. The first challenge is to specify the conditions under which data changes. Usually, master and reference data stay unchanged, while transactional data changes. Metadata will ensure the changes in data. Depending on the data type, the accountability of data owners may be specified differently.

The second approach is focused on data content. For example, customer data is a subject area that is usually associated with conceptual models. A company may assign data ownership based on the data subject area. Alternatively, following the business architecture approach, data ownership may be assigned based on business capability domains.

The third approach is less common. It allows specifying data ownership based on organizational structures.

In reality, a combination of these approaches can be used within one enterprise. The scope of the enterprise will also affect the design of roles.

  7. Enterprise scope and/or company size

When a company designs a set of roles, the scope of the data management initiative and the company’s size should be taken into account. They will influence the complexity of the roles represented in the data management organizational structures. Assume a business unit becomes a data owner for specific data sets. Then, within this business unit, the ultimate accountability and the corresponding responsibilities for data ownership will be split between the business unit manager and the staff.

After you have analyzed all of the factors that are relevant for your company, the final step is to design the roles.

Design the set of data management/governance roles

My practical advice is not to copy existing solutions but to keep the set of roles as simple as possible while still matching your company’s reality.

The “Orange” model considers data management a business capability. Four dimensions enable this capability: processes, roles, data, and tools. At the final stage, the model recommends linking roles to data management processes and deliverables (data). You can see an example of such a mapping in Figure 8.

Figure 8. An example of the mapping between roles, processes, and deliverables.
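A mapping of this kind can also be kept as a simple table and checked mechanically. The sketch below is a minimal illustration in Python: the entries are hypothetical and are not taken from Figure 8, and the helper names `deliverables_by_role` and `roles_without_links` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class RoleLink(NamedTuple):
    role: str
    process: str
    deliverable: str


# Hypothetical mapping entries, for illustration only.
MAPPING = [
    RoleLink("Data owner", "Define data quality requirements", "Data quality requirements"),
    RoleLink("Data management steward", "Design data models", "Conceptual data model"),
    RoleLink("Technical data steward", "Design data models", "Physical data model"),
    RoleLink("Technical data steward", "Implement data quality checks", "Data quality controls"),
]


def deliverables_by_role(mapping: Iterable[RoleLink]) -> Dict[str, List[str]]:
    """Group deliverables under the role that produces them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for link in mapping:
        grouped[link.role].append(link.deliverable)
    return dict(grouped)


def roles_without_links(roles: Iterable[str], mapping: Iterable[RoleLink]) -> List[str]:
    """Roles that are linked to no process or deliverable at all."""
    linked = {link.role for link in mapping}
    return sorted(set(roles) - linked)


proposed_roles = ["Data owner", "Data user", "Data management steward", "Technical data steward"]
print(deliverables_by_role(MAPPING))
print(roles_without_links(proposed_roles, MAPPING))
```

A role that shows up in the second list has no process to perform and no deliverable to produce, which makes it a candidate for removal when you aim for a ‘just enough’ set of roles.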

For more details, please consult my book ‘Data management toolkit’.

In the next and final article of the “Orange” Model 101 series, we will discuss how to specify KPIs and measure data management performance.