Agile Data

Encapsulating Database Access: An Agile "Best" Practice

www.agiledata.org: Techniques for Successful Evolutionary/Agile Database Development

Scott W. Ambler
Agile Database Techniques Encapsulation is a design issue that deals with how functionality is compartmentalized within a system: you should not have to know how something is implemented to be able to use it.  The implication of encapsulation is that you can build anything anyway you want, and then you can later change the implementation and it will not affect other components within the system as long as the interface to that component did not change.  People often say that encapsulation is the act of painting the box black – you are defining how something is going to be done, but you are not telling the rest of the world how you’re going to do it.  For example, consider your bank.  How do they keep track of your account information, on a mainframe, a mini, or a PC?  What database do they use?  What operating system?  It doesn’t matter to you because the bank has encapsulated the way in which they perform account services.  You just walk up to a teller and perform whatever transactions you wish.  

 

In this article I describe the concept of a database encapsulation layer, various encapsulation architectures, and implementation strategies for database encapsulation.  This article is written following the assumption that you are working with relational databases, although if this isn’t the case much of the advice in this article is still appropriate with slight modifications.

 

Table of Contents

  1. Database Encapsulation Layers

  2. Encapsulation Layer Architecture

  3. Implementation Strategies  

  4. Marshalling

  5. Error Handling

  6. References and Recommended Reading

 

1. Database Encapsulation Layers

Figure 1 depicts a typical architectural layering strategy for building business applications which includes a database encapsulation layer (also known as a persistence layer or simply a data layer).  A database encapsulation layer hides the implementation details of your database(s), including their physical schemas, from your business code.  In effect this layer provides your business objects with persistence services – the ability to read data from, write data to, and delete data from – data sources.  Ideally your business objects should know nothing about how they are persisted, it just happens.  Database encapsulation layers aren’t magic and they aren’t academic theories; database encapsulation layers are commonly used practice by both large and small applications as well as in both simple and complex applications.  Database encapsulation layers are an important technique that every agile software developer should be aware of and be prepared to use.

 

Figure 1.  A typical architectural layering strategy for an application.

 

An effective database encapsulation layer will provide several benefits:

There are potentially several disadvantages of database encapsulation layers:

 

2. Encapsulation Layer Architecture

Figure 2 depicts the simplest architecture for encapsulating access to a relational database – a single application working with a single database.  In this situation there is the greatest potential for flexibility as your team should be able to choose the implementation strategy, such as data access objects or a persistence framework, that best fits your situation.  Furthermore you should be in a position to evolve both your object schema and your database schema as you implement new requirements.

 

Figure 2. Single application, single database architecture.

 

A far more realistic situation to be in is Figure 3 that depicts a multiple application, single database architecture.  This architecture is common in organizations that have a centralized legacy database with which all applications work.  Another realistic situation is shown in Figure 4 where there are multiple applications working with multiple databases.  In this case you are likely accessing your "preferred" database(s) as well as one or more legacy data sources.

 

Figure 3. Multi-application, single database architecture.

One interesting observation about both of these diagrams is that some applications may not take advantage of the encapsulation layer(s) and instead directly access data.  There are several reasons for this:

The point is that some applications will be able to take advantage of your encapsulation layer(s) and some won’t.  There are still benefits because you are reducing coupling and therefore reducing your development costs and maintenance burden.

 

Figure 4. Multi-application, multi-database architecture.

Figure 4 makes it clear that some applications already have an encapsulation layer in place.  If this is the case you might want to consider reusing the existing approach instead of developing your own.  By having a single encapsulation layer that all applications use to access all data sources (where appropriate) you potentially reduce the effort it takes to evolve your database schemas via database refactoring because there is only one encapsulation layer to update.  If you’ve purchased the encapsulation layer there is an opportunity to reduce overall licensing fees because you only need to deal with one vendor.  The potential disadvantage is that the team responsible for maintaining the encapsulation layer could become a bottleneck if they are unable or unwilling to work in an agile manner.

 

3. Implementation Strategies For Database Encapsulation Layers

Regardless of whether you intend to purchase, build, or download an a database encapsulation layer it is critical for both Agile DBAs and application developers to understand the various implementation strategies. There are four basic strategies – brute force, data access objects, persistence frameworks, and services – that you should consider using. 

 

3.1 Brute Force

The basic strategy with the brute force approach is that business objects access data sources directly, typically submitting Structure Query Language (SQL) or Object Query Language (OQL) code to the database.  In Java applications this will be done via the Java Database Connectivity (JDBC) class library and via Open Database Connectivity (ODBC) application programming interface (API) in Microsoft-based applications.  Note that Microsoft has newer approaches, such as the ActiveX Data Object (ADO) and Microsoft Data Access Component (MDAC) libraries, which encapsulate and extend ODBC.  Other environments, such as Ruby, have their own native APIs which often take advantage of existing ODBC or JDBC database drivers.

The brute force approach isn’t really a database encapsulation strategy, it is what you do when you don’t have a database encapsulation layer.  However, it is a valid option for database access.  Furthermore it is likely the most common approach because it is simple and provides programmers with complete control over how their business objects interact with the database.  Because of its simplicity this is a very good approach to take at the beginning of a project when your database access requirements are fairly straightforward.  As your database access needs become more complex data access objects or persistence frameworks are likely better options.

 

3.2 Data Access Objects (DAOs)

Data access objects (DAOs) encapsulate the database access logic required of business objects.  The typical approach is for there to be one data access object for each business object, for example the Customer class would have a Customer_Data class.  The Customer_Data class implements the SQL/OQL/… code required to access the database, similar to the brute force approach.  The main advantage of data access objects over the brute force approach is that your business classes are no longer directly coupled to the database, instead the data access classes are.  It is quite common to simply develop your own data access objects, although you may also choose to follow industry-standard approaches such as Java Data Object (JDO) and ActiveX Data Object (ADO). Note that ADO is arguably more of an implementation platform on which to build DAOs within the Microsoft environment, and that it is clearly not as sophisticated than the newer JDO.

 

3.3 Persistence Frameworks

A persistence framework, often referred to as a persistence layer, fully encapsulates database access from your business objects.  Instead of writing code to implement the logic required to access the database you instead define meta data that represents the mappings.  So, if the Customer class maps to the T_Customer table part of the meta data would represent this mapping.  Meta data representing the mappings of all business objects, as well as the associations between them would also need to exist.  Based on this meta data the persistence framework would generate the database access code it requires to persist the business objects.  Depending on the framework, several of which are listed in Table 1 and even more at www.ambysoft.com/essays/persistenceLayer.html, this code is either generated dynamically at run time or it can be generated statically in the form of data access objects which are then compiled into the application.  The first approach provides greater flexibility whereas the second provides greater performance.

 

Table 1. Example products.

Product

Platform

Description

Castor

Java

OSS persistence framework with support for XML, DAX, DSML, caching, two-phase commit, OQL to SQL mappings, ability to create base mapping, support for EJB containers, and the ability to create an XML schema. 

CocoBase Enterprise O/R

Java

Commercial persistence framework that create maps from objects to tables, tables to objects, existing objects with existing tables, existing object models using transparent persistence, and UML / XMI object models.  Generates Java code a variety of approaches data objects, EJB CMP/BMP entity beans, EJB session beans, JSPs, and servlets.  

Deklarit

.NET

Commercial development environment that allows you to describe business objects and rules in a declarative way, with no programming, and it generates the database schema and the ADO.NET strongly typed DataSets and DataAdapters necessary to support them. 

Hibernate Java OSS persistence framework which generates code dynamically at system startup time.  Supports an ODMG 3 interface as well as a custom API.  

JC Persistent Framework

Visual Basic 6.0

OSS persistence framework that manages transactions in a transparent manner to most relational databases.  

Osage Persistence Plus XML

Java

OSS persistence framework that features JDBC-based object-relational mapping that allows experienced Java developers to quickly implement database access in their applications. It generates SQL for retreiving, saving, and deleting objects. Supports object relationships and can automatically generate keys.

Pragmatier Visual Basic 6, .NET Commercial persistence framework/code generator that generates the code for DAO objects with full CRUD + filter/sort capabilities on the MS platforms. Includes development environment that enables you to map data access objects that you create to your existing database or you can let the framework create the database for you (or a mix thereof). Support for distributed transactions and object caching, XML serialization, and traversable data model. 

Versant Enjin

Java

Commercial persistence framework that provides transparent persistence for Java objects within the application server and web server tiers. enJin stores objects transactionally in the middle tier and distributes these objects on demand to local caches in the application and web servers.

Webware for Python

Python

OSS suite of software components for developing object-oriented, web-based applications including an object to relational mapper.

 

Persistence frameworks will have a variety of features.  Simple ones will support basic create, read, update, delete (CRUD) functionality for objects as well as basic transaction and concurrency control.  Advanced features include robust error handling, database connection pooling, caching, XML support, schema and mapping generation capabilities, and support for industry standard technology such as EJB.   

Taking the persistence framework approach the job of an Agile DBA becomes a little more complex but a lot less onerous.  You will be expected to install, if necessary, the persistence framework.  You will also need to work with the administration facility to define and maintain the mapping meta data.  In the case of explicitly controlled persistence frameworks application programmers will need mentoring in the use of the framework, often a very simple task.

 

3.4 Services

For the sake of discussion a service is an operation offered by a computing entity that can be invoked by other computing entities.  At the time of this writing the most popular architectural strategy is web services (McGovern et. al. 2003), however, as you see in Table 2, it is only one of several common strategies available to you:

Services are typically used to encapsulate access to legacy functionality and data, and there is a clear preference within the industry to build new applications following a web services-based architecture to facilitate reuse via system integration.

One way to identify services is overviewed here.  As you can see, it indicates the potential need for business services such as check to see if student exists, verify that a person is eligible to enroll, add applicant to database, and calculate enrollment fees.  To support these services you would need to access the database.  Whether you want to do that directly through DAOs, or through a layer of "pure data services", whatever that implies, is up to you.  Arguably the first and third business service that I listed are mostly data services, at least on the surface.

 

4. Marshalling

Let’s start with some terminology.  Marshalling is the conversion of an object into a data structure such as an XML document or a data set.  Unmarshalling is the conversion of data to objects.  However, it is common to refer to both types of conversion simply as marshalling.  Don’t worry about it. 

Because (un)marshalling occurs at boundary points within your system you need to consider validating the data.  You want to ensure that your data is in a valid state, that individual values conform to business rules (e.g. someone’s age is less than 150) and to referential integrity rules.  The Joy of Legacy Data described many common data quality problems that you may want to try to detect.  There are several data validation issues that you need to consider:

 

5. Error Handling

An important feature of a database encapsulation layer is its ability to handle database errors accordingly.  Whenever the encapsulation layer interacts with a data source there is a potential that an error can occur.  Common types of database-oriented errors include:

The list described above is nowhere near complete, you only have to look at the list of error codes in the manuals for your database to see this, but it is a good start.  The point is that errors happen and you need to be prepared to act on them.  A good encapsulation layer should be able to:

 

6. References and Suggested Online Readings

Agile Database Techniques This book describes the philosophies and skills required for developers and database administrators to work together effectively on project teams following evolutionary software processes such as Extreme Programming (XP), the Rational Unified Process (RUP), the Agile Unified Process (AUP), Feature Driven Development (FDD), Dynamic System Development Method (DSDM), or The Enterprise Unified Process (EUP).  In March 2004 it won a Jolt Productivity award.
Refactoring Databases

This book describes, in detail, how to refactor a database schema to improve its design. The first section of the book overviews the fundamentals evolutionary database techniques in general and of database refactoring in detail.  More importantly it presents strategies for implementing and deploying database refactorings, in the context of both "simple" single application databases and in "complex" multi-application databases.  The second section, the majority of the book, is a database refactoring reference catalog.  It describes over 60 database refactorings, presenting data models overviewing each refactoring and the code to implement it.

 

The Object Primer 3rd Edition: Agile Model Driven Development (AMDD) with UML 2 This book presents a full-lifecycle, agile model driven development (AMDD) approach to software development.  It is one of the few books which covers both object-oriented and data-oriented development in a comprehensive and coherent manner.  Techniques the book covers include Agile Modeling (AM), Full Lifecycle Object-Oriented Testing (FLOOT), over 30 modeling techniques, agile database techniques, refactoring, and test driven development (TDD).  If you want to gain the skills required to build mission-critical applications in an agile manner, this is the book for you.
 

7. Let Me Help

I actively work with clients around the world to improve their information technology (IT) practices as both a mentor/coach and trainer.  A full description of what I do, and how to contact me, can be found here

 


Copyright © 2003-2006 Scott W. Ambler

Last updated: April 8, 2006
This site owned by
Ambysoft Inc.

|
About This SiteMailing List | Site Map | Contact Me | Suggested Books |