Agile Data

Data Technical Debt: How to Address Quality Problems in Data Sources

Follow @scottwambler on Twitter!

This article explores the concept of data technical debt, which refers to quality challenges associated with legacy data sources. Data technical debt impedes your organization's ability to leverage information effectively for decision making, increases operational costs, and reduces your ability to react to changes in your environment.

This article is organized into several topics:

  1. Defining technical debt
  2. Data technical debt
  3. Why data technical debt is important
  4. Types of data technical debt
  5. Strategies for avoiding data technical debt
  6. Strategies for removing data technical debt
  7. Strategies for accepting data technical debt
  8. Related Resources

1. Defining Technical Debt

Technical debt refers to the implied cost of future refactoring or rework needed to improve the quality of an asset so that it is easy to understand, work with, maintain, and extend. Ward Cunningham first proposed the concept of technical debt to describe the impact of poor-quality code on your overall software development efforts. Since then, people have extended the metaphor beyond source code to other types of debt. Examples of technical debt include, but are not limited to:


2. Data Technical Debt

Let's define a few important terms:


3. Why Data Technical Debt is Important

When we have significant data technical debt we face several challenges:
  1. Longer time to market. It is much more difficult to work with low-quality data sources than high-quality ones, because of the extra effort required to understand the data sources, to evolve them, and then to test them to ensure they still work as expected.
  2. Increased cost. The extra time needed to work with lower-quality data sources translates directly into higher cost.
  3. Unpredictability. Because most data technical debt is hidden, it is difficult to predict how much effort it will take to work with, and to evolve, existing data sources; you often do not know how big the mess really is until you at least investigate the situation. Even when you do, you invariably run into unexpected problems once you start the actual work.
  4. Poor decision support. Poor-quality data, whether due to inconsistencies, lack of timeliness, inaccuracies, or many other issues (see below), undermines the decisions that are based on it.
  5. Decreased collaboration. An indirect problem with technical debt is that it can decrease collaboration between teams, which is unfortunate because data technical debt often requires cross-functional collaboration to remove. This decreased collaboration is often the result of finger pointing: the developers didn't work with the source of record, the data people were too hard to work with, that team didn't keep the documentation up to date, and so on.

4. Types of Data Technical Debt

There are several categories of data technical debt to consider, summarized in Table 1.

Table 1. Types of data technical debt.

Type and description, followed by examples

Structural. Quality issue with the design of a table, column or view.
  • Extra view, column, ...
  • Improperly named column, table, ...
  • Improperly split table
  • Insufficiently normalized operational data
  • Overly normalized reporting data
Data quality. Quality issue with the consistency or usage of data values.
  • Null business key value
  • Duplicated business key value
  • Different key values for the same business entity AND traceability isn't maintained
  • Corrupted data
Referential integrity. Quality issue with whether a referenced row exists in another table, and whether a row that is no longer needed is (soft) deleted appropriately.
  • Dropped or missing triggers
  • Inconsistent value in calculated column
  • Missing foreign key constraints
  • Null foreign key values
Architectural. Quality issue with how external programs interact with a data source.
  • Data-intensive calculation outside of database
  • Inappropriate encapsulation
  • Inappropriate security access control
  • Missing index
Documentation. Quality issue with any supporting documents, including models.
  • Difficult to navigate or find
  • Inconsistent information
  • Outdated information
  • Overly detailed information
  • Missing information
  • Static, non-executable documentation
Method/functional. Quality issue with execution aspects within a data source, such as stored procedures, stored functions, and triggers.
  • Inconsistent naming conventions
  • Incorrect calculation
  • Overly complex code
  • Poorly named trigger or stored procedure
  • Slow calculation, procedure, ...
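Several of the data quality and referential integrity examples in Table 1 can be detected with simple queries. The following is a minimal sketch in Python using the standard-library sqlite3 module; the customer and orders tables, their columns, and the check names are hypothetical, chosen only to illustrate null business keys, duplicated business keys, null foreign keys, and orphaned rows left behind by a missing foreign key constraint. Against a real legacy database you would adapt the SQL to your own schema and DBMS.

```python
import sqlite3

# Hypothetical legacy schema. customer_number is the business key, but it has
# no NOT NULL or UNIQUE constraint, and orders.customer_id is not declared as
# a foreign key: all of these are forms of data technical debt from Table 1.
SCHEMA = """
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,
    customer_number TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER
);
"""

# Each query returns the offending rows; an empty list means the check passed.
CHECKS = {
    # Null business key value
    "null_business_key":
        "SELECT customer_id FROM customer WHERE customer_number IS NULL",
    # Duplicated business key value
    "duplicated_business_key":
        "SELECT customer_number, COUNT(*) FROM customer "
        "WHERE customer_number IS NOT NULL "
        "GROUP BY customer_number HAVING COUNT(*) > 1",
    # Null foreign key value
    "null_foreign_key":
        "SELECT order_id FROM orders WHERE customer_id IS NULL",
    # Orphaned rows, a symptom of a missing foreign key constraint
    "orphaned_reference":
        "SELECT o.order_id FROM orders AS o "
        "LEFT JOIN customer AS c ON c.customer_id = o.customer_id "
        "WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL",
}


def find_data_debt(conn: sqlite3.Connection) -> dict[str, list[tuple]]:
    """Run every check and map its name to the rows it flags."""
    return {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "C-100"), (2, "C-100"), (3, None)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(10, 1), (11, None), (12, 99)])
    for name, rows in find_data_debt(conn).items():
        print(name, rows)
```

With the sample rows above, the checks flag customer 3 (null business key), the business key "C-100" (used twice), order 11 (null foreign key), and order 12 (which references a customer that does not exist). Checks like these help make hidden debt visible, which addresses some of the unpredictability described earlier.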

5. Strategies for Avoiding Data Technical Debt

In Disciplined Agile (DA) we include several explicit opportunities for avoiding technical debt:

  1. Initial conceptual modeling. Figure 1 depicts the process goal diagram for DA's Explore Scope process goal. The Explore the Domain decision point focuses on modeling the data used by a system, enabling a solution delivery team to better understand their data-oriented requirements. This work occurs early in a project, during what DA calls the Inception phase.
  2. Initial architectural modeling. Figure 2 depicts the process goal diagram for DA's Identify Architecture Strategy process goal. The Model Business Architecture decision point focuses on the data architecture aspects of a solution, enabling the delivery team to work through which data sources their solution will work with. This work also occurs during Inception, putting a team in a position to avoid data technical debt by thinking through their architecture before they begin Construction.
  3. Continuous modeling. Figure 3 depicts the process goal diagram for DA's Produce a Potentially Consumable Solution process goal. The Explore Solution Design decision point includes explicit agile design strategies, both model-driven and test-driven, that apply to all aspects of your solution design, including data. Thinking through your design, even on a just-in-time (JIT) basis, reduces the chance of injecting new technical debt into your data sources.
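As one illustration of a test-driven approach applied to data, the sketch below expresses data quality expectations as executable tests that are written before the schema changes they drive. It assumes Python's standard unittest and sqlite3 modules and a hypothetical customer/orders schema; the specific rules tested (a required, unique business key and an enforced foreign key) are illustrative choices, not a prescribed DA practice.

```python
import sqlite3
import unittest

# Hypothetical schema under test. In a test-first workflow the tests below are
# written first, and this DDL is then changed until they pass.
SCHEMA = """
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,
    customer_number TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
"""


class CustomerSchemaTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        # SQLite only enforces foreign keys when this pragma is enabled.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(SCHEMA)

    def tearDown(self):
        self.conn.close()

    def test_business_key_is_required(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer (customer_number) VALUES (NULL)")

    def test_business_key_is_unique(self):
        self.conn.execute(
            "INSERT INTO customer (customer_number) VALUES ('C-100')")
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer (customer_number) VALUES ('C-100')")

    def test_order_must_reference_existing_customer(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO orders (customer_id) VALUES (99)")
```

Saved as a module, these tests can be run with `python -m unittest`. Because they execute against the schema, they also serve as executable documentation of its rules, in contrast to the static documentation listed as a form of debt in Table 1.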

Figure 1. Disciplined Agile's Explore Scope process goal.

Figure 2. Disciplined Agile's Identify Architecture Strategy process goal.

Figure 3. Disciplined Agile's Produce a Potentially Consumable Solution process goal.

6. Strategies for Removing Data Technical Debt

There are several aspects of technical debt, and several strategies available to you to remove each one. The Disciplined Agile (DA) tool kit includes an Improve Quality process goal for removing technical debt. The goal diagram is shown in Figure 5.

Figure 5. Disciplined Agile's Improve Quality process goal.

As you can see, the Improve Quality process goal indicates that there are four decision points (aspects) to address. Each decision point provides options for removing data technical debt:

  1. Improving data source implementation. When existing data sources have quality problems, there are several options for fixing them, from safely refactoring them to the usually riskier option of rewriting them.
  2. Improving deliverable documentation. As you learned earlier, the quality of your documentation, in particular documentation describing data sources and how to work with them, is an important aspect of your overall quality. There are several strategies for improving deliverable documentation.
  3. Improving data source format. The consistency of your naming conventions, field formats, data values, and other formatting aspects can also be improved and thereby address data technical debt.
  4. Reusing existing data sources. Greater reuse motivates investment in quality, and the production of high-quality assets motivates greater reuse of those assets. Do you want people to use existing sources of record? Then ensure those sources of record are high-quality and easy to work with.
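Refactoring is the safer of the options in the first decision point because it changes a data source in small steps while existing applications keep working. The following is a minimal sketch of one such refactoring, renaming a poorly named column, assuming SQLite via Python's sqlite3 module and a hypothetical customer table whose fname column should be called first_name. During a transition period both columns exist and triggers keep them synchronized, so legacy code can keep using fname while new code moves to first_name.

```python
import sqlite3

# Hypothetical rename: customer.fname is poorly named and should become
# customer.first_name. Legacy applications still read and write fname, so
# both columns are kept in sync during a transition period.
RENAME_COLUMN_STEPS = """
-- Step 1: introduce the new, properly named column.
ALTER TABLE customer ADD COLUMN first_name TEXT;

-- Step 2: migrate the existing values.
UPDATE customer SET first_name = fname;

-- Step 3: synchronization triggers. The WHEN guards stop each trigger from
-- acting once both columns already hold the same value, so the triggers
-- cannot keep re-firing each other.
CREATE TRIGGER sync_first_name AFTER UPDATE OF fname ON customer
WHEN NEW.first_name IS NOT NEW.fname
BEGIN
    UPDATE customer SET first_name = NEW.fname
    WHERE customer_id = NEW.customer_id;
END;

CREATE TRIGGER sync_fname AFTER UPDATE OF first_name ON customer
WHEN NEW.fname IS NOT NEW.first_name
BEGIN
    UPDATE customer SET fname = NEW.first_name
    WHERE customer_id = NEW.customer_id;
END;

-- New rows may supply either column; copy whichever one was given.
CREATE TRIGGER sync_on_insert AFTER INSERT ON customer
BEGIN
    UPDATE customer
    SET first_name = COALESCE(NEW.first_name, NEW.fname),
        fname      = COALESCE(NEW.fname, NEW.first_name)
    WHERE customer_id = NEW.customer_id;
END;
"""


def rename_column_with_transition(conn: sqlite3.Connection) -> None:
    """Apply steps 1 to 3; the old column is dropped only after the transition."""
    conn.executescript(RENAME_COLUMN_STEPS)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, fname TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    rename_column_with_transition(conn)
    # A legacy application still writes to the old column.
    conn.execute("UPDATE customer SET fname = 'Grace' WHERE customer_id = 1")
    print(conn.execute("SELECT fname, first_name FROM customer").fetchall())
    # prints [('Grace', 'Grace')]
```

Once every application has switched to first_name, a final step would drop the three triggers and the fname column (SQLite supports ALTER TABLE ... DROP COLUMN from version 3.35.0 onward). The same pattern of introducing the new structure, migrating data, and synchronizing during a transition period also applies to the naming-convention fixes described under improving data source format.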

7. Strategies for Accepting Data Technical Debt

One strategy is to accept technical debt: the team makes a conscious decision not to avoid or remove technical debt at this time, which, as you can see in the technical debt quadrant of Figure 4, is a valid option. This decision should be led by the architecture owner and confirmed by the product owner.

Figure 4. Martin Fowler's technical debt quadrant, modified for data technical debt.


Deliberate and reckless:
  • The data architects are too difficult to work with.
  • We don't have time to understand existing data sources.
Deliberate and prudent:
  • We must ship now and deal with the consequences later.
Inadvertent and reckless:
  • What is data normalization?
  • What is a "source of record"?
  • We have data conventions?
Inadvertent and prudent:
  • Now we know how we should have designed that data source.


8. Related Resources