Agile Data

Best Practices for Retrieving Objects from Relational Databases

Follow @scottwambler on Twitter!

A common programming task is to retrieve one or more objects, the data for which is stored in a relational database, into memory.  Perhaps you need to display a list of people that work in a department.  Perhaps you need to enable your users to define search criteria used to list available inventory items. Perhaps you need to implement a report. Although these sound like easy tasks there are various architectures, implementation best practices, and ways to represent the results that you need to be aware of. 


1. Architectural Strategies for Retrieving Objects from Relational Databases

For the sake of convenience the term “find strategy” will refer to your implementation strategy for finding the data which objects in relational databases. The deciding factor in choosing a find strategy is the level of database encapsulation that you wish to have. In Encapsulating Database Access you saw that there was four basic approaches for implementing database access – brute force, data access objects (DAOs), persistence frameworks, and services.  Similarly there are different find strategies that you may choose from:

  1. Brute force. With the “brute force” find strategy you simply embed database access code, such as Structured Query Language (SQL) statements or Enterprise JavaBean (EJB) Query Language (EJB QL), in your business objects.  This is the simplest of the three basic approaches and the one taken the most often. It unfortunately suffers from a high level of coupling between your object and database schemas.
  2. Query objects (Brant & Yoder 2000). This is the data access object (DAO) strategy for retrieving objects.  Instead of embedding SQL code in your business objects you instead encapsulate it in separate classes.  A simple approach would implement a single public operation that accepted the criteria and returned a collection of zero or more objects representing the result set. A more sophisticated approach would enable you to work with the query result in a number of different ways, such as collections of objects, as XML documents, or as simple data sets.  The various ways that query results can be represented are discussed later.
  3. Metadata driven. A metadata-driven approach is the most sophisticated strategy available to you and is typically implemented as part of a persistence framework. The basic idea is that you want to decouple your object schema from your data schema and the only way to do this is to describe the mappings between them in meta data instead of in hard coded SQL.  Instead of defining a SQL SELECT statement that specifies the search in terms of database columns your application must instead define the search in terms of the object attributes. Figure 1 overviews how this strategy would work (based on my Design of a Robust Persistence Layer developed in the mid-1990s). The business object submits the meta data for a query, perhaps represented as an XML document or as a full-fledged object, to a query processor. This meta data would represent concepts such as return all customers whose name looks like ‘Sc* A*’, return the account with account number 1701-1234, and return all employees whose hire date is between January 1 1987 and June 14 1995 that work in the marketing department. The query processor passes the query to a query builder that uses the mapping meta data to build a SELECT statement which can then be submitted to the database.  The results come back from the database and are converted into the appropriate representation (such as an XML structure, a collection of objects, and so on). The representation is then returned to the business object.

Figure 1. Overview of the meta data driven strategy.

The primary advantage of this approach is that it enables you to keep your object schema and data schema decoupled from one another.  As long as the query meta data reflects the structure of the current object schema and the mapping meta data is current then you do not need to embed SQL within your object schema to find objects stored in relational databases.

2. Best Practices for Retrieving Objects from Relational Databases

Agile Database Techniques Deciding on, and then implementing, your find strategy is just the first step.  You also need to resolve basic issues such as how to handle errors, how many objects you expect to come back as the result of a find query, when to bring the result across the network, and how to accept search criteria from users. In this section I describe several "best practices" that I have found useful over the years to address these issues.
  1. Use the native error handling strategy.  Things don’t always go right and therefore you need to handle error conditions properly. Languages offer two basic facilities for indicating error: exceptions and return codes.  Languages such as Java and C# support the ability to throw exceptions from operations. 
  2. Expect "logic" errors. You will need a strategy for handling logic errors. When many users can access the database simultaneously, the norm for most applications, logic errors will occur.  These logic errors often represent potential referential integrity problems. 
  3. Always return a collection. A retrieval/find operation should always return a collection, such as a vector or array, as the result.  This is a good strategy because it’s a simple, consistent approach; You can easily determine the size of a collection, simplifying logic error detection because you can determine if there are no objects as the result of your query or several objects when you only expected one; and in languages that don’t support exceptions you can simply use the first element in the collection as the location for the error code/message/object. A more sophisticated approach is to develop a FindResult class that includes a collection containing the result set and the error (if any). This class would have the ability to iterate over the collection, to answer basic queries such as isEmpty() and isSingleResult().   
  4. Use proxies and lazy initialization for search lists. Lazy initialization is an object-level technique for improving system performance via an object proxy.  An object proxy contains just enough information to identify the object within the system, very likely the primary key attributes, and enough information for users to identify the object.  The columns that are displayed on a search result screen often determine the latter information.  The basic idea is that instead of bringing all of the data for every object in the result set you only bring across the identifying information. This information is displayed to the user, who then selects one of the proxies from the list to work with it. The system then retrieves all of the information for the selected business object and enables the user to work with that object.
  5. Use lazy reads for high-overhead attributes. Lazy reading is an attribute-level technique for improving system performance.  The basic idea that the attribute’s value is read from the database, or calculated as the case may be, the first time it is needed instead of setting the value when the object is first retrieved into memory.  Lazy read is a good option when an object’s attribute is high-overhead, perhaps it is very large and would be slow to transmit over the network or it requires intensive calculations to compute, and when it is rarely accessed.
  6. Program for people. When you’re building a search screen your users need some way to indicate their search criteria.  The important thing to remember is that your users very likely aren’t computer professionals, they might be struggling with basic computer literacy, and may not even be comfortable using computers.  Have you ever met someone, told them that you were a software architect for a large and impressive firm, and the only question they have for you is how much memory they should get when they buy a home computer next month?  This person is very likely one of your more advanced users, and they clearly don’t understand what it is that you do for a living. The point is that your search facility will need to be user friendly.  It should follow accepted user interface standards, which your organization should have in place and if they don’t then your team should follow industry standards. 

3. Representing Retrieval Results

Although this may sound like blasphemy to object purists, you don’t always need nor want objects as the result of a search.  The fact is that there are several ways that the results of a find can be represented, as you see in Table 2 which describes the various ways to represent customers.  You don’t need to support all of these representations in your application but you will likely find that you need several.

Table 2. Various ways to represent the results of a find.

Approach Description
Business objects The result set is marshaled into a collection of Customer objects.
Comma Separated Value (CSV) file The result set is marshaled into a text file, with one row in the file for each customer.  Commas separate the column values (e.g. Scott,William,Ambler).
Data structure The result set is marshaled into a collection of data structures. Each customer data structure is typically just a collection of data values.
Data transfer objects

The result set is marshaled into a collect of objects that just contain the data and the getters and setters to access the data.  These objects are serializable.  See Marinescu (2002) and Fowler et. al. (2003) for detailed discussions.
Dataset The result set from the database, as it is returned by your database access library (e.g. JDBC or ADO.NET).
Flat File The result set is marshaled into a text file, with one row in the file for each customer.  The data values are written into known positions (e.g. the first name is written into positions 21 through 40).
Proxies The result set is marshaled into a collection of proxy objects that contain just enough information for both the system and your users to identify the object.
Serialized business objects The result set is marshaled into a collection of business objects. This collection is in turn converted into a single binary large object (BLOB), or another similar format, which can be transmitted across the network as a single entity and then converted back into the original collection of objects by the receiver.
XML document The result set is converted into a single XML document, which will contain zero or more customer structures.

4. Acknowledgements

I'd like to thank Josh Collier for his feedback regarding this article.