Agile Data

Data Quality Strategies: Assessing the Options


 

The Impact of Poor Data Quality

Results from data quality survey

Whence Data Quality?


Data Quality Strategies

The following strategies can have an impact on data quality:

  1. Agile architecture envisioning. Initial, high-level modeling performed at the beginning of a project (or programme, or enterprise architecture effort) to identify a viable technical direction for the effort. The goal is to do just enough modeling to establish the technical vision/strategy, not to create extensive models or detailed documentation. It is typically performed in parallel with agile requirements envisioning.
  2. Agile data modeling. The act of exploring data-oriented structures in an evolutionary (iterative and incremental) and highly collaborative manner. Your data assets should be modeled, via an Agile Model Driven Development (AMDD) approach, along with all other aspects of what you are developing.
  3. Agile enterprise architecture.  An evolutionary and highly collaborative approach to modeling, documenting, communicating, and governing architecture for an organization.
  4. Agile master data management. An evolutionary and highly collaborative approach to Master Data Management (MDM).
  5. Continuous database integration. Continuous integration is a development practice in which developers integrate their work frequently, at least daily, with each integration verified by an automated build. The build includes regression testing and possibly static analysis of the code. Continuous database integration is the act of performing continuous integration on your database assets. Database builds may include creating the database schema from scratch, something that you would only do for development and test databases, as well as database regression testing and possibly static analysis of the database contents.
  6. Database refactoring. A process by which an existing database schema is evolved in a safe and effective manner through the application of database refactorings. A database refactoring is a simple change to a database schema, such as renaming a table or splitting a column, that improves its design while retaining both its behavioral and informational semantics (in a practical sense). There are more than 60 proven database refactorings.
  7. Database regression testing. Database testing is the validation of functionality implemented within, and the data values contained within, a database. Database regression testing is database testing done in a regular manner throughout the system development lifecycle, often as part of your continuous integration efforts.
  8. Enterprise data modeling (EDM). Data modeling performed at a cross-system, enterprise/organization-wide level.
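  9. Extract transform load (ETL). The process of extracting data from source systems, transforming it (for example by cleansing, standardizing, and reformatting it), and loading it into a target such as a data warehouse.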
  10. Lean data governance. Adopt a lean approach to data governance. Traditional command-and-control approaches to data governance appear to work very poorly in practice. The 2006 DDJ survey into the current state of data management practices found that 66% of development teams choose to "work around" their organization's data group, and that when they do so, 75% of the time it is because they find the data group too difficult to work with, too slow to respond, or not valuable enough to justify the effort of working with them (see Figure 1 below). This is clearly problematic. It is possible, however, to take a lean/agile approach to data governance; see Agile/Lean Data Governance Best Practices.
  11. Logical data modeling (LDM). LDMs are used to explore the concepts of your problem domain and the relationships between them. This could be done for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on projects taking an object-oriented (OO) or Agile approach, although they often are on traditional projects.
  12. Master data management (MDM). The primary goals of Master Data Management (MDM) are to promote a shared foundation of common data definitions within your organization, to reduce data inconsistency within your organization, and to improve overall return on your IT investment. When done effectively, MDM is an important supporting activity for service-oriented architecture (SOA) at the enterprise level, for enterprise architecture in general, for business intelligence (BI) efforts, and for software development projects in general.
  13. Model reviews. A model review, also called a model walkthrough or a model inspection, is a validation technique in which your modeling efforts are examined critically by a group of your peers. The basic idea is that a group of qualified people, often both technical staff and project stakeholders, get together in a room to evaluate a model or document.
  14. Non-solo development. An approach to working where two or more people actively collaborate to fulfill a task. Agile examples include the practices of pair programming and modeling with others.
  15. Physical data modeling (PDM). PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to be useful on both Agile and traditional projects.
  16. Traditional architecture modeling.
  17. Traditional data governance.
  18. Traditional enterprise architecture.
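To make continuous database integration a little more concrete, here is a minimal Python sketch that rebuilds a development/test database from scratch on every build by applying an ordered list of schema scripts to an in-memory SQLite database, then running a quick smoke check. The `MIGRATIONS` list, the table names, and the `build_database` and `smoke_test` functions are hypothetical; a real build would typically pull the scripts from version control and run a full database regression test suite.

```python
import sqlite3

# Hypothetical, ordered schema scripts; in practice these would live in
# version control alongside the application code.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE customer_order ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total REAL NOT NULL CHECK (total >= 0))",
    "CREATE INDEX idx_order_customer ON customer_order(customer_id)",
]


def build_database(migrations):
    """Create a fresh database from scratch by applying every script in order.

    Only appropriate for development and test databases, never production.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    for version, script in enumerate(migrations, start=1):
        conn.execute(script)
        # Record how far the schema has been built, using SQLite's user_version.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn


def smoke_test(conn):
    """A tiny stand-in for the regression tests an automated build would run."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    assert {"customer", "customer_order"} <= tables
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    assert version == len(MIGRATIONS)


if __name__ == "__main__":
    db = build_database(MIGRATIONS)
    smoke_test(db)
    print("database build succeeded")
```

Because the schema is rebuilt from nothing each time, a broken or out-of-order script fails the build immediately rather than surfacing later in a shared database.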
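As an illustration of a single database refactoring, the following sketch applies a split-column change in SQLite from Python: a hypothetical `customer.name` column is split into `first_name` and `last_name`, the new columns are populated from the old one, and the original column is deliberately kept so that existing applications keep working during a transition period. The table, the columns, and the split-on-first-space rule are assumptions for illustration only; real names rarely split this cleanly.

```python
import sqlite3


def split_name_column(conn):
    """Split customer.name into first_name + last_name.

    The old column is left in place during a transition period, so
    applications that still read `name` keep working.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")
    # Populate the new columns; a name without a space goes entirely into first_name.
    conn.execute("""
        UPDATE customer SET
            first_name = CASE WHEN instr(name, ' ') = 0 THEN name
                              ELSE substr(name, 1, instr(name, ' ') - 1) END,
            last_name  = CASE WHEN instr(name, ' ') = 0 THEN NULL
                              ELSE substr(name, instr(name, ' ') + 1) END
    """)
    conn.commit()


def semantics_preserved(conn):
    """Check that every old value can be reconstructed from the new columns."""
    mismatches = conn.execute("""
        SELECT COUNT(*) FROM customer
        WHERE name <> first_name || COALESCE(' ' || last_name, '')
    """).fetchone()[0]
    return mismatches == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customer (name) VALUES (?)",
                 [("Ada Lovelace",), ("Grace Hopper",), ("Cher",)])
conn.commit()

split_name_column(conn)
print(conn.execute("SELECT first_name, last_name FROM customer ORDER BY id").fetchall())
# [('Ada', 'Lovelace'), ('Grace', 'Hopper'), ('Cher', None)]
print(semantics_preserved(conn))
# True
```

The `semantics_preserved` check is the kind of test that gives some confidence the change improved the design without altering what the data means; the old column would only be dropped once all applications have moved to the new ones.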
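Database regression testing, as defined in the list above, covers both the functionality implemented within a database and the data values it contains. The sketch below uses Python's `unittest` and an in-memory SQLite database to show both kinds of test. The schema, the `create_schema` helper, and the "emails are stored in lowercase" rule are hypothetical examples, not a prescribed test suite.

```python
import sqlite3
import unittest


def create_schema(conn):
    """Hypothetical schema under test; a real suite would build it via the CI scripts."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%')
        );
        CREATE TABLE customer_order (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            total       REAL NOT NULL CHECK (total >= 0)
        );
    """)


class CustomerDataTests(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps the tests independent.
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)
        self.conn.execute("INSERT INTO customer (id, email) VALUES (1, 'ada@example.com')")

    def tearDown(self):
        self.conn.close()

    # Functionality tests: the database itself should reject bad data.
    def test_rejects_negative_order_total(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer_order (customer_id, total) VALUES (1, -5)")

    def test_rejects_orphaned_order(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer_order (customer_id, total) VALUES (99, 10)")

    def test_rejects_duplicate_email(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer (id, email) VALUES (2, 'ada@example.com')")

    # Data-value test: a hypothetical rule that the schema does not enforce.
    def test_emails_are_lowercase(self):
        bad = self.conn.execute(
            "SELECT COUNT(*) FROM customer WHERE email <> lower(email)").fetchone()[0]
        self.assertEqual(bad, 0)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerDataTests)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Running a suite like this on every build is what turns ad hoc database testing into regression testing as part of continuous integration.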
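The extract, transform, and load steps of ETL can be sketched in a few lines of Python. In this minimal example the source is an inline CSV string, the transform step trims and standardizes values and rejects rows that break two hypothetical quality rules (missing email, duplicate key), and the load step writes the clean rows into SQLite. All names, columns, and rules here are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical extract from a source system, shown inline so the sketch is self-contained.
SOURCE_CSV = """customer_id,email,country
1, Ada@Example.com ,uk
2,grace@example.com,US
3,,us
2,grace@example.com,US
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse and standardize, separating rejects from good rows."""
    clean, rejects, seen = [], [], set()
    for row in rows:
        email = row["email"].strip().lower()
        country = row["country"].strip().upper()
        key = int(row["customer_id"])
        if not email:
            rejects.append((row, "missing email"))
        elif key in seen:
            rejects.append((row, "duplicate customer_id"))
        else:
            seen.add(key)
            clean.append((key, email, country))
    return clean, rejects


def load(conn, rows):
    """Load: write the cleansed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer "
                 "(id INTEGER PRIMARY KEY, email TEXT NOT NULL, country TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejects = transform(extract(SOURCE_CSV))
load(conn, clean)
print(conn.execute("SELECT * FROM customer ORDER BY id").fetchall())
# [(1, 'ada@example.com', 'UK'), (2, 'grace@example.com', 'US')]
print([reason for _, reason in rejects])
# ['missing email', 'duplicate customer_id']
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to feed data quality problems back to the owners of the source system.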
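One common way MDM reduces data inconsistency is to consolidate duplicate records from several systems into a single "golden" record. The sketch below assumes a hypothetical matching rule (records with the same normalized email describe the same customer) and a hypothetical survivorship rule (for each attribute, the most recently updated non-empty value wins); the `SourceRecord` type and the system names are likewise invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    system: str      # hypothetical name of the system the record came from
    email: str
    name: str
    phone: str
    updated: date


def match_key(record):
    """Hypothetical matching rule: the same normalized email means the same customer."""
    return record.email.strip().lower()


def build_golden_records(records):
    """Group matching records, then apply a simple survivorship rule:
    for each attribute, the most recently updated non-empty value wins."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)

    golden = {}
    for key, recs in groups.items():
        newest_first = sorted(recs, key=lambda r: r.updated, reverse=True)
        golden[key] = {
            field: next((getattr(r, field) for r in newest_first if getattr(r, field)), None)
            for field in ("name", "phone")
        }
    return golden


records = [
    SourceRecord("crm", "Ada@Example.com", "Ada Lovelace", "", date(2023, 5, 1)),
    SourceRecord("billing", "ada@example.com", "A. Lovelace", "555-0100", date(2022, 1, 15)),
    SourceRecord("crm", "grace@example.com", "Grace Hopper", "555-0199", date(2021, 7, 4)),
]
print(build_golden_records(records))
```

Real MDM tools use far richer matching (fuzzy names, addresses) and configurable survivorship, but the two-step structure of match, then merge, is the same.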

Figure 1. Reasons why development teams go around data groups.


Risk Factors Surrounding Data Quality Strategies

When comparing data quality strategies, look at the complete IT picture, not just the narrow scope promoted by the traditional data management community. There are three categories of risk: financial, complexity, and cultural. The cultural risks are particularly important because they will often determine the ultimate success of a data quality program. Sadly, they are often ignored.

The following risk factors are important when comparing data quality strategies:

  1. Payback period.
  2. Cost. Both the initial investment and the ongoing cost.
  3. Organizational complexity.
  4. Technical complexity.
  5. Cultural alignment with development teams.
  6. Cultural alignment with data professionals.

Comparing the Data Quality Strategies


Figure 2. Comparing data quality strategies.


Table 1. Rating the data quality strategies.

Data Quality Strategy               | Financial Risks | Complexity Risks | Cultural Risks | Assessment
------------------------------------+-----------------+------------------+----------------+-----------
Agile architecture envisioning      |                 |                  |                |
Agile data modeling                 |                 |                  |                |
Agile enterprise architecture       |                 |                  |                |
Agile master data management        |                 |                  |                |
Continuous database integration     |                 |                  |                |
Database refactoring                |                 |                  |                |
Database regression testing         |                 |                  |                |
Enterprise data modeling (EDM)      |                 |                  |                |
Extract transform load (ETL)        |                 |                  |                |
Lean data governance                |                 |                  |                |
Logical data modeling (LDM)         |                 |                  |                |
Master data management (MDM)        |                 |                  |                |
Model reviews                       |                 |                  |                |
Non-solo development                |                 |                  |                |
Physical data modeling (PDM)        |                 |                  |                |
Traditional architecture modeling   |                 |                  |                |
Traditional data governance         |                 |                  |                |
Traditional enterprise architecture |                 |                  |                |

A New Vision for Data Quality