 |
Database Refactoring Smells
www.agiledata.org:
Techniques for Successful Evolutionary/Agile Database Development |
 |
|
|
 |
In
Refactoring: Improving the Design of Existing Code, Martin Fowler introduced the concept of a “code smell” which
is a common category of problem in your code which indicates the need to
refactor it. Common code smells include switch statements, long methods,
duplicated code, and feature envy. Similarly, there are common database schema
smells which indicating the potential need to
refactor
it. These smells include: |
|
- Multi-purpose column. If a column is being used for several
purposes it is very likely that extra code exists to ensure that the source
data is being used the “right way”, often by checking the values of one or
more other columns. An example is a column used to store either someone’s
birth date if they’re a customer or their start date if they’re an
employee. Worse yet, you are very likely constrained in the functionality
that you can now support, for example, how would you store the birth date of
an employee?
- Multi-purpose table. Similarly, when a table is being used to
store several types of entities there is likely a design flaw. An example
would be a generic Customer table that is used to store information
about both people and corporations. The problem with this approach is that
data structures for people and corporations are different – people have a
first, middle, and last name for example whereas a corporation simply has a
legal name. A generic Customer table would have columns which are
NULL for some kinds of customers but not others.
- Redundant data. Redundant data is one of many
serious
problems in operational databases because when data is stored in several
places the opportunity for inconsistency occurs. For example, it is quite
common to discover that customer information is stored in many different
places within your organization, in fact many companies are unable to put
together an accurate list of who their customers actually are. The problem
is that in one table John Smith lives at 123 Main Street and in another
table at 456 Elm Street. In this case this is actually one person who used
to live at 123 Main Street but who moved last year, unfortunately John
didn’t submit two change of address forms to your company, one for each
application which new about him.
- Tables with many columns. When a table has many columns it is
indicative that the table lacks cohesion, which it’s trying to store data
from several entities. Perhaps your Customer table contains columns
to store three different addresses (shipping, billing, seasonal) or several
phone numbers (home, work, cell, …). You likely need to
normalize this structure by adding Address and PhoneNumber
tables.
- Tables with many rows. Large tables are indicative of
performance problems, for example it’s very time consuming to search a table
with millions of rows. You may want to split the table vertically by moving
some columns into another table, or split it horizontally by moving some
rows into another table. Both strategies reduces the size of the table,
potentially improving performance.
- "Smart” columns. A “smart column” is one in which different
positions within the data represent different concepts. For example, if the
first four digits of the client ID indicate the client’s home branch, then
client ID is a smart column because you can parse it to discover more
granular information (e.g. home branch ID). Another example includes a text
column used to store XML data structures; clearly you can parse the XML data
structure for smaller data fields. Smart columns often need to be
reorganized into their constituent data fields at some point so that the
database can easily deal with them as separate elements.
- Fear of change. If you’re afraid to change your database schema
because you’re afraid to break something, for example the fifty applications
which access it, then that’s the surest sign that you need to refactor your
schema. Fear of change is a very good indication that you have a serious
technical risk on your hands, one that will only get worse over time. My
advice is to
embrace
change.
It is important to understand that just because something smells it doesn’t
mean that it’s bad – limburger cheese smells even when it’s perfectly fine.
However, when milk smells bad you know that you have a problem. If something
smells, look at it, think about it, and refactor it if it makes sense.
 |
This book describes, in detail, how to
refactor a database schema
to improve its design. The first section of the book overviews the fundamentals evolutionary database techniques in
general and of database refactoring in detail. More importantly it
presents strategies for implementing and deploying database refactorings, in
the context of both "simple" single application databases and in "complex"
multi-application databases. The second section, the majority of the
book, is a
database refactoring reference catalog. It describes over 60 database refactorings, presenting
data models overviewing each refactoring and the code to implement it.
|
References and Suggested Online Readings
 |
This book describes the philosophies and skills required for
developers and database administrators to work together effectively on
project teams following evolutionary software processes such as Extreme
Programming (XP), the
Rational Unified Process (RUP), the
Agile Unified
Process (AUP), Feature Driven
Development (FDD), Dynamic System Development
Method (DSDM), or The Enterprise Unified Process (EUP). In March 2004
it won a Jolt Productivity award. |
 |
This book presents a full-lifecycle,
agile model driven
development (AMDD) approach to software development. It is one of the
few books which covers both object-oriented and data-oriented development in
a comprehensive and coherent manner. Techniques the book covers
include Agile Modeling (AM),
Full Lifecycle Object-Oriented Testing (FLOOT),
over 30 modeling techniques,
agile database techniques,
refactoring, and
test driven development (TDD). If you want to gain the skills required to
build mission-critical applications in an agile manner, this is the book for
you. |
|
|
|
|
|
I actively work with clients around the world to
improve their information technology (IT) practices as
both a mentor/coach and trainer. A full
description of what I do, and how to contact me, can be
found here.