Agile Data

Relational Databases 101: Looking at the Whole Picture

Follow @scottwambler on Twitter!

For the sake of our discussion a relational database is a persistent storage mechanism that enables you to both store data and optionally implement functionality. The goal of this article is to overview relational database management system (RDBMS) technology and to explore the practical issues applicable to its use in modern organizations; the goal is not to discuss relational theory. RDBMSs are used to store the information required by applications built using procedural technologies such as COBOL or FORTRAN, object technologies such as Java and C#, and component-based technologies such as Visual Basic. Because RDBMSs are the dominant persistent storage technology it is critical that all IT professionals understand at least the basics of RDBMSs, the challenges surrounding the technology, and when it is appropriate to use them.

Table of Contents

  1. RDBMS Technology
  2. Coupling: Your Greatest Enemy
  3. Additional Challenges With RDBMSs
  4. Encapsulation: Your Greatest Ally
  5. Beyond RDBMSs: You Actually Have A Choice
  6. Closing Thoughts

1. RDBMS Technology

Let's start by defining some common terminology. A database management system (DBMS) is the software which controls the storage, retrieval, deletion, security, and integrity of data within a database. An RDBMS is a DBMS which manages a relational database. A relational database stores data in tables. Tables are organized into columns, and each column stores one type of data (integer, real number, character strings, date, …).  The data for a single “instance” of a table is stored as a row.  For example, the Customer table would have columns such as CustomerNumber, FirstName, and Surname, and a row within that table would be something like {1701, “James”, “Kirk”}. 

Tables typically have keys, one or more columns that uniquely identify a row within the table, in the case of the Customer table the key would be CustomerNumber.  To improve access time to a data table you define an index on the table.  An index provides a quick way to look up data based on one or more columns in the table, just like the index of a book enables you to find specific information quickly. 


1.1 Simple Features of RDBMSs

The most common use of RDBMSs is to implement simple CRUD – Create, Read, Update, and Delete – functionality.  For example an application could create a new order and insert it into your database. It could read an existing order, work with the data, and then update the database with the new information. It could also choose to delete an existing order, perhaps because the customer has cancelled it.  The vast majority of your interaction with an RDB will likely be to implement basic CRUD functionality.

The easiest way to manipulate a relational database is to submit Structured Query Language (SQL) statements to it.  Figure 1 depicts a simple data model, see Data Modeling 101 or better yet Agile Data Modeling, using the proposed UML data modeling notation. To create a row in the Seminar table you would issue an INSERT statement, an example of which is shown in Code Figure 1.  Similarly, Code Figure 2 presents and example of how to read a row by issuing a SELECT statement. Code Figure 3 shows how to update an existing row via an UPDATE statement and Code Figure 4 how to delete a row with a DELETE statement.  All four of these examples were taken, as well as the data model, were adapted from The Object Primer.  A very good resource for learning SQL is SQL Queries for Mere Mortals by Micheal J. Hernandez and John L. Viescas.

    

Code Figure 1. SQL statement to insert a row into the "Seminar" table.

INSERT INTO Seminar

       (SEMINAR_ID, COURSE_ID, OVERSEER_ID, SEMINAR_NUMBER)

       VALUES (74656, 1234, ‘THX0001138’, 2)

 

Code Figure 2. SQL statement to retrieve a row from the "Seminar" table.

SELECT * FROM Seminar

       WHERE SEMINAR_ID = 1701

 

Code Figure 3. SQL statement to update a row in a table in the "Seminar" table.

UPDATE Seminar

       SET OVERSEER_ID = ‘NCC0001701’, SEMINAR_NUMBER = 3

       WHERE SEMINAR_ID = 1701

 

Code Figure 4. SQL statement to delete data from the "Seminar" table.

DELETE FROM Seminar

WHERE SEMINAR_ID > 1701

       AND OVERSEER_ID = ‘THX0001138’

 

Figure 1. A simple UML data model.

 

1.2 Advanced Use of RDBMSs

There are several “advanced features” of RDBMSs that developers learn once they’ve familiarized themselves with basic CRUD functionality. Each of these features is so important, and often so complex, that they require their own articles to cover them properly. So for now I will introduce you to the concepts and then link to these other articles for the details. These features include:

  1. Object storage.  To store an object in a relational database you need to flatten it – create a data representation of the object – because relational databases only store data. To retrieve the object you would read the data from the database and then create the object, often referred to as restoring the object, based on that data. Although storing objects in a relational database sounds like a simple thing to achieve, practice shows that it isn’t. This is due to the object-relational impedance mismatch, the fact that relational database technology and object technology are based on different underlying theories, a topic discussed in The Object-Relational (O/R) Impedance Mismatch.  To store objects successfully in relational databases you need to learn how to map your object schema to your relational database schema.
  2. Implementing behavior within the database.  Behavior is implemented in a relational database via stored procedures and/or stored functions that can be invoked internally within the database and often by external applications.  Stored functions and procedures are operations that run within an RDBMS, the difference being what the operation can return and whether it can be invoked in a query. The differences aren’t important for our purposes so the term “stored procedure” will be used to refer to both stored functions and stored procedures.  In the past stored procedures were written in a proprietary language, such as Oracle’s PL/SQL, although now Java is quickly becoming the language of choice for database programming.  A stored procedure typically runs some SQL code, massages the data, and then hands back a response in the form of zero or more records, or a response code, or as a database error message.  Effective use of stored procedures is discussed in detail in Implementation Strategies for Persisting Objects in RDBs.
  3. Concurrency control.  Consider an airline reservation system.  There is a flight with one seat left on it, and two people are trying to reserve that seat at the same time. Both people check the flight status and are told that a seat is still available.  Both enter their payment information and click the reservation button at the same time. What should happen? If the system is working properly only one person should be given a seat and the other should be told that there is no longer one available.  Concurrency control is what makes this happen.  Concurrency control must be implemented throughout your object source code and within your database.
  4. Transaction control.  A transaction is a collection of actions on your database – such as the saving of, retrieval of, or deletion of data – which form a work unit.  A flat transactions is an “all-or-nothing” approach where all the actions must either succeed or be rolled back (canceled).  A nested transaction is an approach where some of the actions are transactions in their own right. These sub-transactions are committed once successful and are not rolled back if the larger transaction fails. Transactions may also be short-lived, running in thousandths of a second, or long-lived, taking hours, days, weeks, or even months to complete.  Transaction control is a critical concept for all developers to understand.
  5. Enforcing referential integrity. Referential integrity (RI) is the assurance that a reference from one entity to another entity is valid. For example, if a customer references an address, then that address must exist. If the address is deleted then all references to it must also be removed, otherwise your system must not allow the deletion. Contrary to popular belief, RI isn’t just a database issue, it’s an issue for your entire system. A customer is implemented as an object within a Java application and as one or more records in your database – addresses are also implemented as objects and as rows.  To delete an address, you must remove the address object from memory, any direct or indirect references to it (an indirect reference to an address would include a customer object knowing the value of the AddressID, the primary key of the address in the database), the address row(s) from your database, and any references to it (via foreign keys) in your database. To complicate matters, if you have a farm of application servers that address object could exist simultaneously on several machines.  Furthermore, if you have other applications accessing your database then it is possible that they too have representations of the address in their memory as well. Worse yet, if the address is stored in several places (e.g. different databases) you should also consider taking this into account. All developers should understand basic strategies for implementing referential integrity.
Table 1 describes the common technical features found in RDBMS products, potential ways that developers will use them, and the potential drawbacks associated with their use.  

Table 1. Common RDBMS Technical Features.

Feature

Potential Usage

Potential Drawbacks

Database cursors – A database cursor is effectively a handle to the results of a SQL query, enabling you to move forward and backward through the result set one or more records at a time.

  • Accessing large results sets in smaller portions enables your application to display initial results earlier, increasing response time.
  • Performance is improved when a portion of a result set is required because less data is transmitted across the network.
  • Application developers need to understand that the underlying data can change between the times that data records are accessed via the cursor: previously retrieved records may have been deleted, records may have been inserted into previously retrieved portions of the result set, or previously retrieved records may have been modified.
  • Not all cursors are created equal.  Some cursors only allow forward scrolling.
  • Cursors are a resource drain on the database because they are memory intensive.

Java – Most database vendors support a Java VM within the database.

  • Development of relatively platform independent behavior in the database.
  • Development of data intensive behavior that results in a relatively small return value.
  • Encapsulation of database access to support security access control to information.
  • Implementation of shared behavior required by many applications.
  • Different version of VMs between application server and database server increases complexity of development.
  • Behavior implemented in the database can easily become a bottleneck.

Triggers – A trigger is a procedure that is run either before or after an action (such as a create, update, or delete) is performed on a row in a database table.

  • Enforce referential integrity within your database.  These types of triggers can often be automatically generated by your data modeling or database administration tool.

  • Often a lowest common denominator for implementing referential integrity constraints.

  • Perform hand-crafted audit logging.

  • Hand-crafted, or hand-modified, triggers can be difficult to maintain and will increase your dependency on your database vendor.

  • Triggers are typically implemented in a proprietary language requiring an extra skillset on your team.

  • Because triggers are automatically invoked they can be very dangerous (such as “uncontrolled” cascading deletions resulting from chained delete triggers).

  • Behavior implemented in the database can easily become a bottleneck if your database doesn’t scale well.



2. Coupling: Your Greatest Enemy

Coupling is a measure of the degree of dependence between two items – the more highly coupled two things are the greater the chance that a change in one will require a change in another.  Coupling is the “root of all evil” when it comes to software development, and the more things that your database schema is coupled to the harder it is to maintain and to evolve.  Relational database schemas can be coupled to:

  1. Your application source code. When you change your database schema you must also change the source code within your application that accesses the changed portion of the schema.   Figure 2 depicts the best-case scenario – when it is only your application code that is coupled to your database schema. This situation is traditionally referred to as a stovepipe. These situations do exist and are often referred to as stand-alone applications, stovepipe systems, or greenfield projects.  Count yourself lucky if this is your situation because it is very rare in practice.
  2. Other application source codeFigure 3 depicts the worst-case scenario for relational databases – a wide variety of software systems are coupled to your database schema, a situation that is quite common with existing production databases.  It is quite common to find that in addition to the application that your team is currently working on that other applications, some of which you know about and some of which you don’t, are also coupled to your database.  Perhaps an online system reads and writes to your database.  Perhaps a manager has written a spreadsheet, unbeknownst to you, that reads data from your database that she uses to summarize information critical to her job. 
  3. Data load source code.  Data loads from other sources, such as government provided tax tables or your own test data, are often coupled to your database schema.
  4. Data extract source code.  There may be data extraction scripts or programs that read data from your database, perhaps to produce an XML data file or simply so your data can be loaded into another database.
  5. Persistence frameworks/layers.  A persistence framework encapsulates the logic for mapping application classes to persistent storage sources such as your database.  When you refactor your database schema you will need to update the meta data, or the source code as the case may be, which describes the mappings. 
  6. Itself. Coupling exists within your database. A single column is coupled to any stored procedure that references it, other tables that use the column as a foreign key, any view that references the column, and so on. A simple change could result in several changes throughout your database. 
  7. Data migration scripts.  Changes to your database schema will require changes to your data migration scripts.
  8. Test code.  Testing code includes any source code that puts your database into a known state, that performs transactions that affect your database, and that reads the results from the database to compare it against expected results.  Clearly this code may need to be updated to reflect any database schema changes that you make. 
  9. Documentation.  Some of the most important documentation that you are likely to keep pertains to your physical database schema, including but not limited to physical data models and descriptive meta data.  When your database schema changes the documentation describing will also need to change accordingly. Although Agile Modeling implores you to Update Only When It Hurts, that your documentation doesn’t have to be perfectly in synch with your schema at all times, the reality is that you will need to update your docs at some point.
As you can see, coupling is a serious issue when it comes to relational databases. To make matters worse the concept of coupling is virtually ignored within database theory circles. Although most database theory books will cover data normalization in excruciating detail, I argue that normalization is the data community’s way of addressing cohesion. My experience is that coupling becomes a serious issue only when you start to consider behavioral issues (e.g. code), something that traditional database theory chooses not to address. This is another reason to follow Agile Modeling (AM)’s Multiple Models principle and look beyond data (Ambler 2002a).

Figure 2. The best-case scenario.

Figure 3. The worst-case scenario.



3. Additional Challenges With RDBMSs

Coupling isn’t the only challenge that you face with RDBMSs, although it is clearly an important one.  Other issues that you will face include:

  1. Performance issues are difficult to predict.  When you are working with a shared database, such as the situation implied in Figure 3, you may find that the performance characteristics of your database are hard to predict because each application accesses the database in its own unique way.  For example, perhaps one legacy application updates information pertaining to items for sale sporadically throughout the month, enabling a human operator to add new items or update existing ones, activity that doesn’t really affect your application’s performance in a meaningful way. However, this same application also performs batch loads of items available for sale via other companies that you have partnered with, items that you want to carry on your web site as soon as they are available.  These batch loads can take several minutes, during which period the Item table is under heavy load and thus your online application is potentially affected.
  2. Data integrity is difficult to ensure with shared databases. Because no single application has control over the data it is very difficult to be sure that all applications are operating under the same business principles.  For example, your application may consider an order as fulfilled once it has been shipped and a payment has been received.  The original legacy application that is still in use by your customer support representatives to take orders over the phone may consider an order as fulfilled once it has been shipped, the payment received, and the payment deposited into your bank account.  A slight difference in the way that a fundamental business rule has been implemented may have serious business implications for any application that accesses the shared databases. Less subtly, imagine what would happen if your online order taking application calculates the total for an order and stores it in the order table, whereas the legacy application calculates the subtotals only for order items but does not total the order. When the order fulfillment application sees an order with no total it calculates the total, and appropriate taxes, whereas if a total already exists it uses the existing figure. If a customer makes an order online and then calls back a few hours later and has one of your customer service representatives modify their existing order, perhaps to add several items to it, the order total is no longer current because it has not been updated properly.  Referential integrity issues such as this are covered in detail in the Implementing Referential Integrity article.
  3. Operational databases require different design strategies than reporting databases.  The schemas of operational databases reflect the operational needs of the applications that access them, often resulting in a reasonably normalized schema with some portions of it denormalized for performance reasons. Reporting databases, on the other hand, are typically highly denormalized with significant data redundancy within them to support a wide range of reporting needs.  
Every technology has its strengths and weaknesses, and RDBMS technology is not an exception to this rule.  Luckily there are ways that you can mitigate some of these challenges, and encapsulation is an important technique to do so.


4. Encapsulation: Your Greatest Ally

Encapsulation is a design issue that deals with how functionality is compartmentalized within a system. You should not have to know how something is implemented to be able to use it. The implication of encapsulation is that you can build anything anyway you want, and then you can later change the implementation and it will not affect other components within the system (as long as the interface to that component did not change). People often say that encapsulation is the act of painting the box black – you are defining how something is going to be done, but you are not telling the rest of the world how you’re going to do it.  For example, consider your bank. How do they keep track of your account information, on a mainframe, a mini, or a PC?  What database do they use? What operating system? It doesn’t matter to you because the bank has encapsulated the way in which they perform account services. You just walk up to a teller and do whatever transactions you wish.

By encapsulating access to a database, perhaps through something as simple as data access objects or something as complex as a persistence framework, you can reduce the coupling that your database is involved with.  The Implementation Strategies for Persisting Objects in RDBs chapter compares and contrasts various encapsulation strategies that you have available to you. For now assume that it is possible to hide the details of your database schema from the majority of the developers within your organization while at the same time giving them access to your database. Some people, often just the Agile DBA(s) responsible for supporting the database, will need to understand and work with the underlying schema to maintain and evolve the encapsulation strategy. 

One advantage of encapsulating access to your database is that it enables application programmers to focus on the business problem itself.  Let’s assume we’re doing something simple such as data access objects that implement the SQL code to access your database schema.  The application programmers will work with these data access classes, not the database. This enables your Agile DBA to evolve the database schema as they need to, perhaps via database refactorings, and all they need to worry about is keeping the data access classes up to date.  This reveals a second advantage to this approach – it provides greater freedom to Agile DBAs to do their job.

Figure 4 depicts the concept of encapsulating access to your database, showing how the best case scenario of Figure 2 and the worst case scenario of Figure 3 would likely change. In the best-case scenario your business source code would interact with the data access objects that in turn would interact with the database. The primary advantage would be that all of the data-related code would be in one place, making it easier to modify whenever database schema changes occurred or to support performance-related changes. It’s interesting to note that the business code that your application programmers are writing would still be coupled to the data access objects.  Therefore they’d need to change their code whenever the interface of a data access object changed. You’ll never get away from coupling. However, from the point of view of the application programmer this is a much easier change to detect and act on – with the database encapsulation strategy in place the application programmers are only dealing with program source code (e.g. Java) and not program source code plus SQL code.

 

Figure 4. The scenarios revisited.

Things aren’t quite so ideal for the worst-case scenario. Although it is possible that all applications could take advantage of your encapsulation strategy the reality is that only a subset will be able to.  Platform incompatibility can be an issue – perhaps your data access objects are written in Java but some legacy applications are written using technologies that can’t easily access Java.  Perhaps you’ve chosen not to rework some of your legacy applications simply to use the database encapsulation strategy. Perhaps some applications already have an encapsulation strategy in place (if so, you might want to consider reusing the existing strategy instead of building your own). Perhaps you want to use technologies, such as a bulk load facility, that require direct access to the database schema. The point to be made is that a portion of your organization’s application will be able to take advantage of your encapsulation strategy and some won’t.  There is still a benefit to doing this, you are reducing coupling and therefore reducing your development costs and maintenance burden, the problem is that it isn’t a fully realized benefit.

Another advantage of encapsulating access to a database is that it gives you a common place, in addition to the database itself, to implement data-oriented business rules. 


5. Beyond RDBMSs: You Actually Have a Choice

Because there are some clear problems with relational database technology you may decide to use another technology.  Yes, RDBMSs are the most commonly used type of persistence mechanism but they are not the only option available to you.  These choices include:

  1. Object/relational databases. Object/relational databases (ORDBs), or more properly object/relational database management systems (ORDBMSs), add new object storage capabilities to RDBMSs. ORDBs, add new facilities to integrate management of traditional fielded data, complex objects such as time-series and geo-spatial data and diverse binary media such as audio, video, images, and (sometimes) Java applets.  ORDBMSs basically add to RDBMSs features such as defined data types, for example you could define a data type called SurfaceAddress that has all of the attributes and behaviors of an address, as well as the ability to navigate objects in addition to an RDBMS’s ability to join tables. By implementing objects within the database, an ORDBMS can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects. ORDBs support the robust transaction and data-management features of RDBMSs while at the same time offer a limited form of the flexibility of object-oriented databases. Because of ORDBMSs relational foundation, database administrators work with familiar tabular structures and data definition languages (DDLs) and programmers access them via familiar approaches such as SQL3, JDBC, and proprietary call interfaces.
  2. Object databases.  Object databases (ODBs), also known as object-oriented databases (OODBs) or object-oriented database management systems (OODBMSs), nearly seamlessly add database/persistence functionality to object programming languages. In other words, full-fledged objects are implemented in the database.  They bring much more than persistent storage of programming language objects. ODBs extend the semantics of Java to provide full-featured database programming capability, via new class libraries specific to the ODB vendor, while retaining native language compatibility. A major benefit of this approach is the unification of the application and database development into a seamless model. As a result, applications require less code, use more natural persistence modeling, and code bases are easier to maintain. Object-oriented developers can write complete database applications with a modest amount of additional effort without the need to marshal their objects into flatten data structures for storage, as a result forgoing the marshalling overhead inherent with other persistence mechanism technologies (such as RDBs). This one-to-one mapping of application objects to database objects provides higher performance management of objects and enables better management of the complex interrelationships between objects.
  3. Native XML databases. Native XML databases store information as XML documents following one of two approaches: First, a native XML database will either store a modified form of the entire XML document in the file system, perhaps in a compressed or pre-parsed binary form. Second, a native XML database may opt to map the structure of the document to the database, for example mapping the Document Object Model (DOM) to internal structures such as Elements, Attributes, and Text – exactly what is mapped depends on the database.  The most important difference between these approaches, from the point of view of an application developer, is the way they are accessed: with the first approach the only interface to the data is XML and related technologies, such as XPath (a language design specifically for addressing parts of an XML document, visit www.w3.org for details) or the DOM whereas with the second approach the database should be accessible via standard technologies such as JDBC.  The important thing to understand about native XML databases is that they work with the internal structures of the XML documents – they don’t just store them as a binary large object (BLOB) in the database.
  4. Flat files.  Flat files, such as .TXT or. CSV (comma separated value) files, are commonly used to store data. A single file can be used to store one type of data structure, such as customer information or sales transaction information, or through a coding and formatting strategy the structures of several types of data structures.  One approach to flat file organization is either to have data values separated by a pre-defined character, such as a comma, or tag, such as </FirstName> in an XML document. Another common approach is to delimit data values by size – the first 20 characters of the row represent the first name of the customer, the next 20 characters represent the surname, and so on.
  5. Hierarchical databases. Hierarchical databases link data structures together like a family tree such that each record type has only one owner, for example an order is owned by only one customer. Hierarchical structures were widely used in the first mainframe database management systems, and are still a very common source of data in many large organizations. Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility because it wouldn’t easily support data access outside the original design of the data structure.  For example, in the customer-order schema you could only access an order through a customer, you couldn’t easily find all the orders that included the sale of a widget because the schema isn’t designed to all that.
  6. Prevalence layer.  Klaus Wuestefeld defines prevalence as “transparent persistence, fault-tolerance and load-balancing of the execution of the business logic of an information system through the use of state snapshots as well as command and query queuing or logging”. A prevalence layer is effectively a simple persistence framework that serializes objects and writes them to log files. From the point of view of developers all objects are cached in memory and the persistence of the objects is truly treated as a background task that the developers don’t need to worry about.

Table 2 presents a comparison of the various types of persistence mechanism and provides references to vendors where applicable. Table 3 presents suggestions for when you might use each type of technology.  Large organizations will find that they are using several types of persistence mechanism and will even install the products of several different vendors. Not only do you have a choice you might be forced to work with a wide range of databases whether you want to or not.

Table 2. Comparing Types of Persistence Mechanisms.

Mechanism

Advantages

Disadvantages

Common Products

Flat Files

  • Supports simple approach to persistence
  • Good solution for smaller systems
  • Most development languages have built-in support for file streams
  • No licensing costs
  • Ad-hoc access difficult

N/A

Hierarchical Databases

  • Supports transaction-oriented applications
  • Not in common use for development of new applications

IBM’s IMS (www.ibm.com)

Object Databases

  • “Pure” approach to persisting objects
  • Existing vendors have survived the market shakeout and are likely here to stay
  • Excellent option for an application-specific database (e.g. the best-case scenario of Figure 7.2) when object-technology is used
  • Uniformity of approach towards application and data storage.
  • Facilitates refactoring because everything is an object
  • Not well accepted in the market place
  • No single dominant vendor
  • Standards defined, such as Object Query Language (OQL), are still evolving

Computer Associate’s Jasmine (www.ca.com)

Versant Developer Suite (www.versant.com)

Object Design’s ObjectStore (www.objectdesign.com)

Objectivity/DB (www.objectivity.com)

Poet (www.poet.com)

Object/Relational Databases

  • Relational vendors are slowly adopting object-relational features
  • Less of an impedance mismatch with objects
  • Not well accepted in the market place
  • No single dominant vendor
  • Emerging standards, such as SQL3, are not yet widely adopted
  • Small experience base

Cloudscape (www.cloudscape.com)

Cincom UniSQL (www.cincom.com)

Relational vendors listed above

Prevalence Layer
  • Transparent persistence of objects
  • Performance
  • Simplicity
  • Emerging technology
Prevayler (www.prevayler.org)

Relational Databases

  • Mature technology
  • Dominate the persistence mechanism market
  • Several well-established vendors
  • Standards, such as Structured Query Language (SQL) and JDBC well defined and accepted
  • Significant experience base of developers

Oracle (www.oracle.com)

Sybase (www.sybase.com)

IBM DB2 (www.ibm.com)

Microsoft SqlServer (www.microsoft.com)

Informix (www.informix.com)

XML Databases

  • Native support for persisting XML data structures (not just as a blob)
  • For XML intensive applications it removes the need for marshalling between XML structures and the database structure
  • Emerging technology
  • Standards, e.g. the XML equivalent of SQL, not yet in place for XML data access
  • Not well suited for transactional systems

dbXML (www.dbxml.org)

OpenLink Software’s Virtuoso (www.openlinksw.com)

 Software AG’s Tamino (www.softwareag.com)

X-Hive/DB (www.x-hive.com)


Table 3. Potential Applications for Data Storage Mechanisms.

Mechanism

Potential Application

Flat Files

  • Simple applications, particularly those with a “read all the information, manipulate it for awhile, and save it to disk” paradigm such as a word processor or spreadsheet where a relational database would be gross overkill
  • Persistence of configuration information
  • Sharing of information with other systems
  • Audit logging/reporting

Hierarchical databases

  • Transaction-oriented applications
  • Common source of legacy data
Object/Relational Databases
  • Complex, highly inter-related data structures (e.g. CAD/CAM parts inventory)
  • Complex and low-volume transactions (e.g. CAD/CAM, GIS applications)
  • Simple, high-volume transactions (e.g. point of sales)
  • Single application, or single application family, software products  

Object Databases

  • Complex, highly inter-related data structures (e.g. CAD/CAM parts inventory)
  • Complex and low-volume transactions (e.g. CAD/CAM, GIS applications)
  • Simple, high-volume transactions (e.g. point of sales)
  • Single application, or single application family, software products
Prevalence Layer
  • Complex object structures
  • Single application, or single application family, software products

Relational Databases

  • High-volume applications
  • Transaction-oriented applications
  • Simple-to-intermediate complexity of data
  • Data intensive applications
  • Shared, operational database
  • Reporting database

XML Databases

  • Ideally suited for XML-intensive applications such as enterprise integration portals or online reporting facilities

6. Closing Thoughts

RDBMSs technology isn’t perfect, no technology is, but it's the one in most common use so we need to learn how to work with it effectively.  The reason why I have spent so much effort discussing the drawbacks of this technology is that it is important that you understand what it is that you’re working with.  Many writers will focus on the benefits of RDBMSs, and there are clearly many benefits, but ignore the drawbacks.  Other writers will focus on academic issues such as the concept that there is no “true relational database” that fulfills all of E.F. Codd’s original twelve rules, not to mention the more finely defined features of his later writings. That’s an interesting issue to discuss over beer but I prefer to focus on the practical issues that developers face day to day when working with this technology. Refactoring Databases
Coupling is a serious issue for all IT professionals, including both application developers and Agile DBAs.  Encapsulating access to your database can help to alleviate the problems of coupling but it is only a partial solution.  It is also important to recognize that relational databases are only one of several choices that you have available to you to persist your data.  Non-relational approaches are viable solutions for some situations and should be given appropriate consideration.  Having said this my assumption throughout the rest of this book is that you will be working with relational databases to persist your data.