Agile Data

Database Refactoring Smells

Follow @scottwambler on Twitter!

In Refactoring: Improving the Design of Existing Code, Martin Fowler introduced the concept of a "code smell" which is a common category of problem in your code which indicates the need to refactor it. Common code smells include switch statements, long methods, duplicated code, and feature envy. Similarly, there are common database schema smells which indicating the potential need to refactor it. These smells include:
  1. Multi-purpose column. If a column is being used for several purposes it is very likely that extra code exists to ensure that the source data is being used the "right way", often by checking the values of one or more other columns. An example is a column used to store either someone's birth date if they're a customer or their start date if they're an employee. Worse yet, you are very likely constrained in the functionality that you can now support, for example, how would you store the birth date of an employee?
  2. Multi-purpose table. Similarly, when a table is being used to store several types of entities there is likely a design flaw. An example would be a generic Customer table that is used to store information about both people and corporations. The problem with this approach is that data structures for people and corporations are different – people have a first, middle, and last name for example whereas a corporation simply has a legal name. A generic Customer table would have columns which are NULL for some kinds of customers but not others.
  3. Redundant data. Redundant data is one of many serious problems in operational databases because when data is stored in several places the opportunity for inconsistency occurs. For example, it is quite common to discover that customer information is stored in many different places within your organization, in fact many companies are unable to put together an accurate list of who their customers actually are. The problem is that in one table John Smith lives at 123 Main Street and in another table at 456 Elm Street. In this case this is actually one person who used to live at 123 Main Street but who moved last year, unfortunately John didn't submit two change of address forms to your company, one for each application which new about him.
  4. Tables with many columns. When a table has many columns it is indicative that the table lacks cohesion, which it's trying to store data from several entities. Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, …). You likely need to normalize this structure by adding Address and PhoneNumber tables.
  5. Tables with many rows. Large tables are indicative of performance problems, for example it's very time consuming to search a table with millions of rows. You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table. Both strategies reduces the size of the table, potentially improving performance.
  6. "Smart" columns. A "smart column" is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client's home branch, then client ID is a smart column because you can parse it to discover more granular information (e.g. home branch ID). Another example includes a text column used to store XML data structures; clearly you can parse the XML data structure for smaller data fields. Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements.
  7. Fear of change. If you're afraid to change your database schema because you're afraid to break something, for example the fifty applications which access it, then that's the surest sign that you need to refactor your schema. Fear of change is a very good indication that you have a serious technical risk on your hands, one that will only get worse over time. My advice is to embrace change.

It is important to understand that just because something smells it doesn't mean that it's bad – limburger cheese smells even when it's perfectly fine.  However, when milk smells bad you know that you have a problem. If something smells, look at it, think about it, and refactor it if it makes sense.


Source

Refactoring Databases This book describes, in detail, how to refactor a database schema to improve its design. The first section of the book overviews the fundamentals evolutionary database techniques in general and of database refactoring in detail. More importantly it presents strategies for implementing and deploying database refactorings, in the context of both "simple" single application databases and in "complex" multi-application databases. The second section, the majority of the book, is a database refactoring reference catalog. It describes over 60 database refactorings, presenting data models overviewing each refactoring and the code to implement it.