Agile Data

The One Truth Above All Else Anti-Pattern


The "one truth" philosophy says that it is desirable to have a single definition for a data element or business term, that there should be a common, shared definition for your master reference data and perhaps even your major business entities. The "One Truth Above All Else" anti-pattern occurs when this philosophy is taken to the extreme and you seek to get to the one truth about all data entities and data elements within your environment. The challenge is that to get to the "one truth" about something, when it is even possible, often requires significant effort. When this effort goes beyond the point of diminishing returns and provides negative value to your organization you have a significant problem on your hands. 

Seeking the one truth is often an integral part of a master data management (MDM) strategy where you strive to use the most correct and current data available to you. To achieve this you need to manage both the data itself and information about the semantics, source, and quality of the data. This information is often referred to as metadata. In the past MDM was thought of as primarily an issue for data warehouse (DW)/business intelligence (BI) efforts, but with the growing importance of service oriented architecture (SOA) it is clearly important for mission-critical real-time systems as well.

The goal of MDM is to rationalize the data stored in disparate systems, a difficult but worthy task. Business stakeholders can benefit from consistent data. Mergers and acquisitions often fail when data cannot be reconciled. Support calls concerning data quality problems are avoided when the source problems are actually addressed.


The Problem

"Analysis paralysis" occurs when the team, or more often the data professionals on the team, discover that it's incredibly difficult and time consuming to get people to agree to the "one truth" about their data. This seems to be particularly true of DW/BI projects, perhaps because they touch so many data sources that they very clearly see that there are many versions or interpretations of the same basic concepts. Sadly, experience seems to show that organizations are spending significant amounts of effort seeking the one truth yet earning little or often negative return on investment (ROI) for that effort. This is because the project teams get hung up trying to get the data perfect instead of focusing on delivering high-quality working software which provides value to its stakeholders.


The Implications

Taken in moderation, seeking the one truth can provide value, but when you take it to the extreme several negative side effects occur:

  1. Your organization's competitive position can be eroded by enforcing a consistent viewpoint. The fact is that various portions of your organization have different ways of working, different priorities, and different constraints. There may not be one single shared truth, and even if there were, it's going to change over time anyway. A great example of this is the series of HSBC advertisements which show two similar pictures with a different word below each one (Figure 1 is a picture that I took in a hallway in London's Heathrow airport). Then the ad shows the same two pictures again, but with the words switched. The point is that people see the world differently, and that as a financial institution, they understand that and are flexible enough to act accordingly.
  2. The development team abandons the effort. Modern development is evolutionary, not serial, in nature. The rest of IT doesn't have the time to wait for the data professionals to get to the "one truth" before continuing with actual development. If getting to the one truth impacts the project timeline then the development team will often choose to continue on without the data professionals. Sometimes the data modeling effort is abandoned, but more often it will continue in parallel only to see its results ignored by the development team which can no longer use the information provided.
  3. Consensus, not the actual truth, sets in. When people cannot agree to the "one truth" they very often instead agree to disagree and settle on a definition that really doesn't get the job done but at least it doesn't offend anyone too much. 
  4. The gap between the data group and the rest of IT widens yet again. Extensive data modeling efforts such as this will often appear to be little more than yet another political power grab by the data group. In combination with the challenges resulting from the cultural impedance mismatch, this merely drives another wedge between the data group and the people whom they're supposed to be supporting.

Figure 1. Questioning the "One Truth" philosophy.


The Solution

"One truth" can be a nice vision to work toward in theory, but in practice you'll likely only be able to narrow it down to several reasonably similar truths. It may be important to recognize that there are several truths and to identify them, but trying to force a single consistent truth on all parties is futile at best. Don't let it prevent your team from delivering important business value in a timely manner. My advice follows directly from this: take a practical approach and recognize that there is a diminishing rate of return when it comes to modeling, and that you can quickly reach the inflection point where further investment in data modeling reduces the overall value to your organization. Once again, the failure rate of traditional data warehousing efforts speaks for itself.
  1. Recognize that the true goal is to deliver business value, not perfect data. The "one truth above all else" anti-pattern often kicks in when people have lost true sight of the overall goal which is to develop high-quality working systems which meet the changing needs of their stakeholders. Data modeling efforts often take on a life of their own, or perhaps it's really a death march of their own, when the one truth becomes the primary goal.
  2. The "one truth" is a moving target, so embrace change. Requirements change over time, sometimes because of changes to the business and technical environment and often because your stakeholders simply didn't understand what they wanted in the first place. The implication is that at best the one truth is a destination which you are always moving towards but one that you'll never actually reach, therefore trying to get it perfectly right up front isn't realistic. Invest some time doing initial requirements envisioning, but recognize that there are swiftly diminishing returns from modeling.
  3. Never rest. Expect entropy of your "truthful data" because data errors will creep into your source data. An effective database regression testing strategy will of course greatly reduce if not remove this problem, but few organizations seem to have such a strategy in place (or even realize that they need one).
  4. Be flexible defining semantics. Like it or not, there will be a wide range of definitions and uses for the data within your organization and that is perfectly ok. Language is imprecise: although you should strive to clarify the definition of something as much as possible, you'll rarely be able to get a single, perfect answer. The implication is that you should strive to identify the range of acceptable definitions, and hopefully weed out some of the unacceptable ones. Your applications will need to focus on handling the exceptions that are out of bounds when they occur.
  5. Adopt a federated view, not a unified one. Different groups within your organization will have different definitions for data, different ways to work with it, and different priorities. Instead of trying to club them into submission by forcing a single truth upon them, try to enable them by supporting different models for each line of business. I'm not saying that this is easy to do, but I am saying that it's your only viable option in any reasonably complex domain.
  6. Look at the whole picture, not just data. As the first philosophy of the Agile Data method points out, data is only one of many important aspects which you need to consider. Not only do you need to rationalize your data, you also need to rationalize the business logic.
  7. Pick your battles wisely. Focus first on the data where inconsistencies have the greatest impact on your organization. In other words, do an informal risk assessment on a regular basis and prioritize the work just as you would prioritize business requirements on a software development project. Only through prioritization such as this do you have any sort of hope of maximizing stakeholder ROI within your organization.
  8. Adopt an agile/lean approach to data governance. It is possible to have an effective, streamlined approach to data governance which enables development teams to work with and produce high-quality data assets.
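
To make points 4 and 5 above a little more concrete, here is a minimal sketch of what a federated set of definitions might look like in code. It is only an illustration under stated assumptions, not a prescribed design: the class names (Definition, FederatedGlossary), the example term "customer", and the lines of business shown are all hypothetical. The idea is simply that each line of business registers its own acceptable definition of a term, and a request that falls outside the registered range is handled as an explicit exception rather than being forced into a single shared meaning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Definition:
    """One line of business's working definition of a business term (hypothetical)."""
    line_of_business: str
    text: str


@dataclass
class FederatedGlossary:
    """Holds a range of acceptable definitions per term, keyed by line of business."""
    terms: dict[str, dict[str, Definition]] = field(default_factory=dict)

    def register(self, term: str, definition: Definition) -> None:
        # Each line of business keeps its own definition; nothing is overwritten
        # across groups, only within the same group.
        self.terms.setdefault(term, {})[definition.line_of_business] = definition

    def acceptable_definitions(self, term: str) -> list[Definition]:
        # The "range of acceptable definitions" for a term, in a stable order.
        by_group = self.terms.get(term, {})
        return sorted(by_group.values(), key=lambda d: d.line_of_business)

    def lookup(self, term: str, line_of_business: str) -> Definition:
        # Out-of-bounds requests are surfaced as exceptions for the caller to handle.
        try:
            return self.terms[term][line_of_business]
        except KeyError:
            raise LookupError(
                f"No accepted definition of {term!r} for {line_of_business!r}"
            ) from None


if __name__ == "__main__":
    glossary = FederatedGlossary()
    glossary.register(
        "customer", Definition("retail", "Anyone holding at least one open account")
    )
    glossary.register(
        "customer", Definition("billing", "Any party that has been sent an invoice")
    )

    for definition in glossary.acceptable_definitions("customer"):
        print(definition.line_of_business, "->", definition.text)

    try:
        glossary.lookup("customer", "marketing")
    except LookupError as error:
        print("Needs follow-up:", error)
```

The design choice worth noting is that the glossary never tries to merge the two definitions of "customer"; it records both and leaves the unregistered case as a visible exception, which can then feed the kind of prioritization described in point 7.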

Acknowledgements

I'd like to thank Curt Sampson and Dawn Wolthuis for their feedback regarding this article.