Agile Data

Database Refactoring Smells

www.agiledata.org: Techniques for Successful Evolutionary/Agile Database Development

Scott W. Ambler
Refactoring Databases In Refactoring: Improving the Design of Existing Code, Martin Fowler introduced the concept of a “code smell” which is a  common category of problem in your code which indicates the need to refactor it.  Common code smells include switch statements, long methods, duplicated code, and feature envy.  Similarly, there are common database schema smells which indicating the potential need to refactor it.  These smells include:  

 

  1. Multi-purpose column.  If a column is being used for several purposes it is very likely that extra code exists to ensure that the source data is being used the “right way”, often by checking the values of one or more other columns.  An example is a column used to store either someone’s birth date if they’re a customer or their start date if they’re an employee.  Worse yet, you are very likely constrained in the functionality that you can now support, for example, how would you store the birth date of an employee?
  2. Multi-purpose table.  Similarly, when a table is being used to store several types of entities there is likely a design flaw.  An example would be a generic Customer table that is used to store information about both people and corporations.  The problem with this approach is that data structures for people and corporations are different – people have a first, middle, and last name for example whereas a corporation simply has a legal name.  A generic Customer table would have columns which are NULL for some kinds of customers but not others.
  3. Redundant data.  Redundant data is one of many serious problems in operational databases because when data is stored in several places the opportunity for inconsistency occurs.  For example, it is quite common to discover that customer information is stored in many different places within your organization, in fact many companies are unable to put together an accurate list of who their customers actually are.  The problem is that in one table John Smith lives at 123 Main Street and in another table at 456 Elm Street.  In this case this is actually one person who used to live at 123 Main Street but who moved last year, unfortunately John didn’t submit two change of address forms to your company, one for each application which new about him.
  4. Tables with many columns.  When a table has many columns it is indicative that the table lacks cohesion, which it’s trying to store data from several entities.  Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, …).  You likely need to normalize this structure by adding Address and PhoneNumber tables.
  5. Tables with many rows.  Large tables are indicative of performance problems, for example it’s very time consuming to search a table with millions of rows.  You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table.  Both strategies reduces the size of the table, potentially improving performance.
  6. "Smart” columns.  A “smart column” is one in which different positions within the data represent different concepts.  For example, if the first four digits of the client ID indicate the client’s home branch, then client ID is a smart column because you can parse it to discover more granular information (e.g. home branch ID).  Another example includes a text column used to store XML data structures; clearly you can parse the XML data structure for smaller data fields.  Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements.
  7. Fear of change.  If you’re afraid to change your database schema because you’re afraid to break something, for example the fifty applications which access it, then that’s the surest sign that you need to refactor your schema.  Fear of change is a very good indication that you have a serious technical risk on your hands, one that will only get worse over time.  My advice is to embrace change.

It is important to understand that just because something smells it doesn’t mean that it’s bad – limburger cheese smells even when it’s perfectly fine.  However, when milk smells bad you know that you have a problem.  If something smells, look at it, think about it, and refactor it if it makes sense.

 

Source

Refactoring Databases

This book describes, in detail, how to refactor a database schema to improve its design. The first section of the book overviews the fundamentals evolutionary database techniques in general and of database refactoring in detail.  More importantly it presents strategies for implementing and deploying database refactorings, in the context of both "simple" single application databases and in "complex" multi-application databases.  The second section, the majority of the book, is a database refactoring reference catalog.  It describes over 60 database refactorings, presenting data models overviewing each refactoring and the code to implement it.

 

 

References and Suggested Online Readings

Agile Database Techniques This book describes the philosophies and skills required for developers and database administrators to work together effectively on project teams following evolutionary software processes such as Extreme Programming (XP), the Rational Unified Process (RUP), the Agile Unified Process (AUP), Feature Driven Development (FDD), Dynamic System Development Method (DSDM), or The Enterprise Unified Process (EUP).  In March 2004 it won a Jolt Productivity award.
The Object Primer 3rd Edition: Agile Model Driven Development (AMDD) with UML 2 This book presents a full-lifecycle, agile model driven development (AMDD) approach to software development.  It is one of the few books which covers both object-oriented and data-oriented development in a comprehensive and coherent manner.  Techniques the book covers include Agile Modeling (AM), Full Lifecycle Object-Oriented Testing (FLOOT), over 30 modeling techniques, agile database techniques, refactoring, and test driven development (TDD).  If you want to gain the skills required to build mission-critical applications in an agile manner, this is the book for you.
   
 

 

 

Let Me Help

I actively work with clients around the world to improve their information technology (IT) practices as both a mentor/coach and trainer.  A full description of what I do, and how to contact me, can be found here

 


Copyright © 2002-2006 Scott W. Ambler

Last updated: February 15, 2006
This site owned by
Ambysoft Inc.
| About This SiteMailing List | Site Map | Contact Me | Suggested Books |