Relational database management systems (RDBMSs)
often persist mission-critical data which is updated by
many applications and potentially thousands if not millions of end users. Furthermore, they implement important
functionality in the form of database methods (stored procedures, stored functions, and/or triggers) and
database objects (e.g. Java or C# instances). The best way to ensure the continuing quality of these assets, at
least from a technical point of view, you should have a full regression test suite which you can run on a
regular basis.
In this article I argue for a fully automated, continuous regression testing based approach to database
testing. Just as agile software developers take this approach to their application code, see
Agile Testing and Quality Strategies, we should also do the same for our databases.
Table of Contents
-
Why test an RDBMS?
-
What should we test?
-
When should we test?
-
How should we test?
-
Who should test?
-
Introducing database testing into your organization
-
Database testing and data inspection
- Effective practices
- Automated Database Testing In Context
- Related Resources
1. Why Test an RDBMS?
There are several reasons why you need to develop a comprehensive testing strategy for your RDBMS:
-
Data is an important corporate asset. Doesn't it make sense to invest the effort required to
validate the quality of data via effective testing? My July 2006
survey into the current state of data management indicates that 96% of respondents believe that data
is a corporate asset. Yet of them only 40% had a database test suite in place to validate the data and
of those without a test suite only 32% had even discussed the concept.
-
Mission-critical business functionality is implemented in RDBMSs. In the
survey, 63.7% of respondents indicated that their organizations did this, but of those only 46% had
regression tests in place to validate the logic. Shouldn't we be doing better?
-
Current approaches aren't sufficient. The current state of the art in many organizations is for
data professionals to control changes to the database schemas, for developers to visually inspect the
database during construction, and to perform some form of formal testing during the test phase at the
end of the lifecycle. Unfortunately, none of these approaches prove effective. Application developers
will often go around their organization's data management group because they find them too difficult to
work with, too slow in the way they work, or sometimes they don't even know they should be working
together. The end result is that the teams don't follow the desired data quality procedures and as a
result quality suffers. Although visual inspection
of query results is a good start it is little more than a debugging technique in practice that will help
you to find problems but not prevent them. Testing late in the lifecycle is better than nothing, but as
Barry Boehm noted in the early 80s it's
incredibly expensive to fix any defects
you find at that point.
-
Testing provides the concrete feedback required to identify defects. How do you know how good the
quality of your source data actually is without an effective test suite which you can run whenever you
need to?
-
Support for evolutionary development. Many
evolutionary development techniques, in particular
database refactoring, are predicated
upon the idea that it must be possible to determine if something in the database has been broken when a
change has been made. The easiest way to do that is to simply run your regression test suite.
Uncomfortable Question: Isn't it time that we stopped talking about data quality and actually started doing something about it?
Here's a few interesting questions to ask someone who isn't convinced that you need to test the DB:
- If you're implementing code in the DB in the form of stored procedures, triggers, ... shouldn't you test
that code to the same level that you test your app code?
- Think of all the data quality problems you've run into over the years. Wouldn't it have been nice if
someone had originally tested and discovered those problems before you did?
- Wouldn't it be nice to have a test suite to run so that you could determine how (and if) the DB actually
works?
I think that one of the reasons that we don't hear much about database testing is because it is a relatively
new idea within the data community. Many traditional data professionals seem to think that testing is something
that other people do, particularly test/quality assurance professionals, do. This reflects a penchant for
over-specialization and a
serial approach towards development by traditionalists,
two ideas which have also been shown to be questionable organizational approaches at best.
Figure 1 indicates what you should consider testing when it comes to relational
databases. The diagram is drawn from the point of view of a single database, the dashed lines indicate
threat boundaries, indicating that you
need to consider threats both within the database (clear box testing) and at the interface to the database
(black box testing). Table 1
lists the issues which you should consider testing for both internally within the database and at the interface
to it. For details, read the article
What To Test in an RDBMS.
Figure 1. What to test in a database.

Table 1. What to test in an RDBMS.
Black-Box Testing at the Interface |
White/Clear-BoxTesting Internally Within the Database |
- O/R mappings (including the meta data)
- Incoming data values
- Outgoing data values (from queries, stored functions, views ...)
|
- Scaffolding code (e.g. triggers or updateable views) which support refactorings
- Typical unit tests for your stored procedures, functions, and triggers
- Existence tests for database schema elements (tables, procedures, ...)
- View definitions
-
Referential integrity (RI)rules
- Default values for a column
- Data invariants for a single column
- Data invariants involving several columns
|
Agile software developers take a test-first approach to development where they write a test before you write
just enough production code to fulfill that test. The steps of test first development (TFD) are overviewed in
the UML activity diagram of
Figure 2. The first step is to quickly add a test, basically just enough code to fail.
Next you run your tests, often the complete test suite although for sake of speed you may decide to run only a
subset, to ensure that the new test does in fact fail. You then update your functional code to make it pass the
new tests. The fourth step is to run your tests again. If they fail you need to update your functional code and
retest. Once the tests pass the next step is to start over.
Figure 2. The Process of Test First Development (TFD).

Test-driven development (TDD) is an
evolutionary approach to development
which combines test-first development and
refactoring. When an agile software developer goes to implement a new feature, the first question they ask
themselves is "Is this the best design possible which enables me to add this feature?" If the answer is yes,
then they do the work to add the feature. If the answer is no then they refactor the design to make it the best
possible then they continue with a TFD approach. This strategy is applicable to developing both your application
code and your database schema, two things that you would work on in parallel.
When you first start following a TDD approach to development you quickly discover that to make it successful
you need to automate as much of the process as possible? Do you really want to manually run the same build
script(s) and the same testing script(s) over and over again? Of course not. So, agile developers have created
OSS tools such as
ANT,
Maven, and Cruise Control
(to name a few) which enable them to automate these tasks. More importantly, it enables them to automate
their database testing script into the build procedure itself.
Agile developers realize that testing is so important to their success
that it is something they do every day, not just at the end of the lifecycle, following
agile testing strategies. They test as often and early as possible, and better yet they test first. As
you can see with the
agile system development lifecycle (SDLC) of
Figure 3 testing is in fact something that occurs throughout the lifecycle, not just
during the deployment phase. Furthermore, many agile software developers realize that you can test more than
just your code, you can in fact validate every work product created on a software development initiative if you choose to.
Figure 3. The Agile (Project) Lifecycle (click to learn more).

Although you want to keep your database testing efforts as simple as possible, at first you will discover
that you have a fair bit of both learning and set up to do. In this section I discuss the need for various
database sandboxes in which people will test: in short, if you want to do
database testing then you're going to need test databases (sandboxes) to work in. I then overview how to
write a database testand more importantly describe setup strategies for database tests. Finally, I overview several database testing tools which you may want to consider.
A common
strategy on agile teams is to ensure that developers have their own "sandboxes" to work in. A sandbox is basically a
technical environment whose scope is well defined and respected.
Figure 4 depicts the various types of sandboxes which your team may choose to work in. In
each sandbox you'll have a copy of the database. In the development sandbox you'll experiment, implement new
functionality, and refactor existing functionality, validate your changes through testing, and then eventually
you'll promote your work once you're happy with it to the team integration sandbox. In this sandbox you will
rebuild your system and then run all the tests to ensure you haven't broken anything (if so, then back to the
development sandbox). Occasionally, at least once an iteration/cycle, you'll deploy your work to the level (demo
and pre-production testing), and rerun your test suite (including database tests) each time that you do so to
ensure that your changes integrate with the changes made by other developers. Every so often (perhaps once every
six to twelve months) into production. The primary advantage of sandboxes are that they help to reduce the risk
of technical errors adversely affecting a larger group of people than is absolutely necessary at the time.
Figure 4. Sandboxes within your technical environment (click to expand).
There's no magic when it comes to writing a database test, you write them just like you would any other type
of test. Database tests are typically a three-step process:
- Setup the test. You need to put your database into a known state before running tests against it.
There are
several strategies for doing so.
- Run the test. Using a database regression testing tool, run your database
tests just like you would run your application tests.
- Check the results. You'll need to be able to do "table dumps" to obtain the current values in the
database so that you can compare them against the results which you expected.
The article
What To Test in an RDBMS goes into greater detail.
To successfully test your database you must first know the exact state of the database, and the best way to do
that is to simply put the database in a known state before running your test suite. There are two common
strategies for doing this:
-
Fresh start. A common practice is to rebuild the database, including both creation of the schema
as well as loading of initial test data, for every major test run (e.g. testing that you do in your team integration or pre-production test sandboxes).
-
Data reinitialization. For testing in developer sandboxes, something that you should do every
time you rebuild the system, you may want to forgo dropping and rebuilding the database in favor of
simply reinitializing the source data. You can do this either by erasing all existing data and then
inserting the initial data vales back into the database, or you can simply run updates to reset the data
values. The first approach is less risky and may even be faster for large amounts of data.
An important part of writing database tests is the creation of test data. You have several strategies for
doing so:
-
Have source test data. You can maintain an external definition of the test data, perhaps in
flat files, XML files, or a secondary set of tables. This data would be loaded in from the external
source as needed.
-
Test data creation scripts. You develop and maintain scripts, perhaps using data manipulation
language (DML) SQL code or simply application source code (e.g. Java or C#), which does the
necessary deletions, insertions, and/or updates required to create the test data.
-
Self-contained test cases. Each individual test case puts the database into a known state
required for the test.
These approaches to creating test data can be used alone or in combination. A significant advantage of
writing creation scripts and self-contained test cases is that it is much more likely that the developers of
that code will place it under configuration management (CM) control
. Although it is possible to put test data itself under CM control, worst case you generate an export
file that you check in, this isn't a common practice amongst traditional data professionals and therefore
may not occur as frequently as required. My advice is to put all valuable assets, including test data, under
CM control.
Where does test data come from? For unit testing, I prefer to create sample data with known values. This
way I can predict the actual results for the tests that I do write and I know I have the appropriate data
values for those tests. For other forms of testing -- particularly load/stress, system integration, and
function testing, I will use live data so as to better simulate real-world conditions.
Beware Coupling: One danger with database regression testing, and with regression testing in general, is coupling
between tests. If you put the database into a known state, then run several tests against that known
state before resetting it, then those tests are potentially coupled to one another. Coupling between
tests occurs when one test counts on another one to successfully run so as to put the database into
a known state for it. Self-contained test cases do not suffer from this problem, although may be
potentially slower as a result due to the need for additional initialization steps. One way to
ensure that database tests are self contained is to write them as transactions.
4.4 What Testing Tools Are Available?
I believe that
there are several critical features which you need to successfully test RDBMSs. First, as Figure 1
implies you need two categories of database testing tools, one for interface tests and one for internal
database tests. Second, these testing tools should support the language that you're developing in. For
example, for internal database testing if you're a Microsoft SQL Server developer, your T-SQL procedures
should likely be tested using some form of T-SQL framework. Similarly, Oracle DBAs should have a
PL-SQL-based unit testing framework. Third, you need tools which help you to put your database into a
known state, which implies the need not only for test data generation but also for managing that data
(like other critical development assets, test data should be under
configuration management control).
To make a long story short, although we're starting to see a glimmer of hope when it comes to database
testing tools, as you can see in
Table 2, but we still have a long way to go. Luckily there are some good tools being developed by the
open source software (OSS) community and there are some commercial tools available as well. Having said
that, IMHO there is still significant opportunity for tool vendors to improve their database testing
offerings.
Table 2. Some database testing tools.
Category |
Description |
Examples |
Data Privacy Tools |
Data privacy, or more generally information privacy, is a critical issue for many organizations. Many
organizations must safeguard data by law due to regulatory compliance concerns. |
|
Testing tools for load testing |
Tools simulate high usage loads on your database, enabling you to determine whether your system's
architecture will stand up to your true production needs. |
|
Test Data Generator |
Developers need test data against which to validate their systems. Test data generators can be
particularly useful when you need large amounts of data, perhaps for stress and load testing.
|
|
Test Data Management |
Your test data needs to be managed. It should be defined, either manually or automatically (or both),
and then maintained under version control. You need to define expected results of tests and then
automatically compare that with the actual results. You may even want to retain the results of previous test
runs (perhaps due to regulatory compliance concerns). |
|
Unit testing tools |
Tools which enable you to regression test your database. |
|
During development cycles, the primary people responsible for doing database testing are
application developers and
Agile data engineers. They will typically pair together, and because they are hopefully taking a TDD-approach to development the implication is that they'll be doing database unit
testing on a continuous basis. During the release cycle your testers, if you have any, will be responsible for
the final system testing efforts and therefore they will also be doing database testing.
The role of your
data management (DM) group, or IT
management if your organization has no DM group, should be to support your database testing efforts. They should
promote the concept that database testing is important, should help people get the requisite training that they
require, and should help obtain
database testing tools for your organization. As you have seen, database testing is
something that is done continuously by the people on development teams, it isn't something that is done by
another group (except of course for system testing efforts). In short, the DM group needs to support database
testing efforts and then get out of the way of the people who are actually doing the work.
6. Introducing Database Regression Testing into Your Organization
Database testing is new to many people, and as a result you are likely to face several challenges:
- Insufficient testing skills. This problem can be overcome through training, through pairing with
someone with good testing skills (pairing a data engineer without testing skills and a tester without
data engineering skills still works), or simply through trial and error. The important thing is that you
recognize that you need to pick up these skills.
- Insufficient unit tests for existing databases. Few organizations have yet to adopt the practice
of database testing, so it is likely that you will not have a sufficient test suite for your existing
database(s). Although this is unfortunate, there is no better time than the present to start writing
your test suite.
- Insufficient database testing tools. As I said earlier, we still have a way to go with respect to
tools.
- Reticent DM groups. My experience is that some data management (DM) groups may see the
introduction of database regression testing, and agile techniques such as test-first
development (TFD) and
refactoring, as a threat. Or, as my July 2006 "state of data management"
survey shows, a large percentage of organizations are not only not doing any database testing at all
they haven't even discussed it. For many in the data management community the idea of doing database
testing is rather new and it's simply going to take a while for them to think it through. I'm not so
sure that you should wait to do such obvious
process improvement.
In general, I highly suggest that you read my article
Adopting Evolutionary/Agile Database Techniquesand consider buying the book
Fearless Change
which describes a pattern language for successfully implementing change within organizations.
A common quality technique is to use data inspection tools to examine existing data within a database. You
might use something as simple as a SQL-based query tool such as DB Inspect
to select a subset of the data within a database to visually inspect the results. For example, you may
choose to view the unique values in a column to determine what values are stored in it, or compare the row count
of a table with the count of the resulting rows from joining the table with another one. If the two counts are
the same then you don't have an RI problem across the join.
As Richard Dallaway
points out, the problem with data inspection is that it is often done manually and on an irregular basis. When
you make changes later, sometimes months or years later, you need to redo your inspection efforts. This is
costly, time consuming, and error prone.
Data inspection is more of a debugging technique than it is a testing technique. It is clearly an important
technique, but it's not something that will greatly contribute to your efforts to ensure data quality within
your organization.
I'd like to share a few database testing "best
practices" with you:
-
Use an in-memory database for regression testing. You can dramatically speed up your database
tests by running them, or at least portions of them, against an in-memory database such as
HSQLDB. The challenge with this approach is that because database
methods are implemented differently across database vendors that any method tests will still need to run
against the actual database server.
-
Start fresh each major test run. To ensure a clean database, a common strategy is that at the
beginning of each test run you drop the database, then rebuild it from scratch taking into account all
database refactorings and
transformations to that point, then reload the test data, and then run your tests. Of course, you
wouldn't do this to your production database. ;-)
-
Take a continuous approach to regression testing. I can't say this enough, a
TDD approach to development is an incredibly
effective way to work.
-
Train people in testing. Many developers and date engineers have not been trained in testing
skills, and they almost certainly haven't been trained in database testing skills. Invest in your
people, and give them the
training and education they need to
do their jobs.
-
Pair novices with people that have database testing experience. One of the easiest ways to gain
database testing skills is to pair program with someone who already has them.
9. Automated Database Testing in Context
The following table summarizes the trade-offs associated with automated database testing and provides advice
for when (not) to adopt it.
Advantages
|
- Enables inclusion of data sources in automated testing strategy
- Enables automatic enforcement of data standards and conventions
- Support executable specification strategy
|
Disadvantages
|
- Requires people with test thinking and testing skills.
- Requires an understanding that data sources are enterprise assets AND must be treated as such.
- Traditional data groups are often unwilling at first to include testing as one of their
responsibilities.
- Lack of automated tests for existing legacy data sources requires investment to develop them
|
When to Adopt This Practice
|
Any data source that is considered an enterprise asset should have an automated regression test suite.
Otherwise it isn't really an asset, is it?
|
10. Related Resources