Agile Data

Why Data Models Shouldn't Drive Object Models (And Vice Versa)

www.agiledata.org: Techniques for Successful Evolutionary/Agile Database Development

Scott W. Ambler
A common problem that I run into again and again is the idea that a data model should drive the development of your objects.  This idea comes in two flavors: that your physical data schema should drive the development of your objects, and that a conceptual/logical data model should be (almost) completely developed up front before you begin to design your objects.  Both of these views are inappropriate for non-agile projects and clearly wrong for agile projects.  Let’s explore this issue in more depth.

 

Why do people want to base their object models on existing data schemas?  First, there is very likely a desire to reuse the existing thinking that went into the current schema.  I’m a firm believer in reusing things, but I prefer to reuse the right things.  There is an impedance mismatch between the object and relational paradigms, and this mismatch leads object and data practitioners to different designs.  You also saw in Object Orientation 101 that object developers apply different design techniques and concepts than those that data modelers apply, as described in Data Modeling 101.  Second, the database owner seeks to maintain or even enhance their political standing within your organization by forcing you to base your application on their existing design.  Third, the people asking you to take this approach may not understand the implications of this decision, or that there are better ways to proceed.

Why is basing your object model on an existing data schema a bad idea?  First, your legacy database design likely has some significant problems.  In practice, I look at existing physical data models to get an idea of what is currently going on, and to get a feel for the technical constraints that I’ll have to work with, but I won’t unnaturally constrain my application with a bad data design.  Second, even if the existing database design is very good, there can be significant differences in the way that you map objects to relational databases.  Consider Figure 1, which depicts three object schemas, all of which can be correctly mapped to the data schema on the right.  Now pretend you have the data schema as your starting point.  Which of the three object schemas would you generate from it?  Likely the top one, which may in fact be correct for your situation, but then again one of the other two schemas could have been a better choice.  Yes, all of the models in Figure 1 could be improved, but I needed a simple example that showed how different object schemas can map to the same data schema.

 

Figure 1. Several class structures that correctly map to the same table.
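
Since the figure itself is not reproduced here, the following is only a minimal sketch of the idea, not the schemas from Figure 1. It assumes a hypothetical customer table with four columns; every class and column name (`FlatCustomer`, `Address`, `PostalCode`, `postal_code`, and so on) is invented for illustration. Three different class structures each load from, and save back to, the same row.

```python
from dataclasses import dataclass

# One row of a hypothetical CUSTOMER table; the column names are assumptions.
ROW = {"name": "Ada Lopez", "street": "1 Main St", "city": "Toronto", "postal_code": "M5V 2T6"}


@dataclass
class FlatCustomer:
    """Object schema 1: a single class with one attribute per column."""
    name: str
    street: str
    city: str
    postal_code: str

    @classmethod
    def from_row(cls, row):
        return cls(row["name"], row["street"], row["city"], row["postal_code"])

    def to_row(self):
        return {"name": self.name, "street": self.street,
                "city": self.city, "postal_code": self.postal_code}


@dataclass
class Address:
    street: str
    city: str
    postal_code: str


@dataclass
class CustomerWithAddress:
    """Object schema 2: the address columns become their own class."""
    name: str
    address: Address

    @classmethod
    def from_row(cls, row):
        return cls(row["name"], Address(row["street"], row["city"], row["postal_code"]))

    def to_row(self):
        a = self.address
        return {"name": self.name, "street": a.street,
                "city": a.city, "postal_code": a.postal_code}


@dataclass
class PostalCode:
    value: str


@dataclass
class DetailedAddress:
    street: str
    city: str
    postal_code: PostalCode


@dataclass
class CustomerWithPostalCode:
    """Object schema 3: the postal code is promoted to a class as well."""
    name: str
    address: DetailedAddress

    @classmethod
    def from_row(cls, row):
        address = DetailedAddress(row["street"], row["city"], PostalCode(row["postal_code"]))
        return cls(row["name"], address)

    def to_row(self):
        a = self.address
        return {"name": self.name, "street": a.street,
                "city": a.city, "postal_code": a.postal_code.value}


# Every schema reads the same row and writes back an identical one.
for schema in (FlatCustomer, CustomerWithAddress, CustomerWithPostalCode):
    assert schema.from_row(ROW).to_row() == ROW
```

Starting from the table alone, a generator would most plausibly produce something like the flat class, yet nothing in the row says whether an address or postal code deserves behavior of its own. That decision comes from the application's requirements, not from the data schema.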

Why do people want to create (nearly) complete data models early in the project?  There are several reasons:

  1. Existing culture.  This is the way it’s always been done, this is the way that they like, therefore this is the way that they’re going to continue to work. 

  2. Over specialization.  Data modeling might be the only thing they know, or at least it’s what they prefer to specialize in.  When all you have is a hammer, not only does every problem look like a nail but nails are clearly the most important problem that needs to be addressed right now. 

  3. Serial mindset.  Many IT professionals have little or no experience taking an iterative and incremental approach to development, let alone taking it one step further to take an evolutionary/emergent approach. 

  4. People assume that the cost of change is high.  This is completely true when you’re following a non-agile approach, but with modern techniques such as database refactoring and Agile Modeling the cost of change becomes much lower because these techniques support change. 

  5. Lack of teamwork.  Existing processes dictate that the data group will go off and develop the database while the application programmers go off and build the application.  This may have worked for COBOL project teams but it doesn’t work for agile software development teams – there is one team that works together, not several teams that work in isolation.

  6. They don't understand the true costs.  Many people are unaware that a serial approach to development results in significant wastage by the time the application is finally delivered.

Why is basing your object model on a conceptual or logical data model a bad idea?  Actually, it’s not such a bad idea, as long as you’re taking an iterative and incremental approach.  The real problem is the big design up front (BDUF) approach that many data professionals seem to prefer.  It is possible to take an evolutionary approach when conceptual modeling, but you have to choose to work this way.  Flexibility in your approach is critical to success.  However, there are much better options.  Although the object role modeling (Halpin 2001) notation is very good, I have found Class Responsibility Collaborator (CRC) cards to be a very useful technique for domain modeling with my project stakeholders.  Similarly, although logical data models can be quite useful, I personally find UML class models much more expressive due to their ability to depict behavior as well as data.  Although David Hay argues in his excellent book Requirements Analysis that you should not use UML class diagrams for domain or analysis modeling, my experience is that you can do so quite easily if you choose to (Hay also holds this view, although he leans towards data models whereas I lean towards UML-based models).  However, I have to concede his point that many object modelers struggle with analysis, but in the end that’s a separate issue.

So, should you blindly base your data schema on your object schema?  No!  You need a much more robust approach.  Figure 2 shows the three data schemas that would result from applying each of the three inheritance mapping strategies.  As you can see, mapping multiple inheritance is fairly straightforward; there aren’t any surprises in Figure 2.  The point is that it is possible for a single object schema to correctly map to several data schemas.

Figure 2. Mapping multiple inheritance.
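
Figure 2 is not reproduced here, so the following is only a rough sketch of what three such schemas could look like. It assumes the three strategies are one table per hierarchy, one table per concrete class, and one table per class, and it uses an invented hierarchy in which a hypothetical `Dragon` class inherits from both `Bird` and `Lizard`. All table and column names are illustrative assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical object schema: Dragon inherits from both Bird and Lizard.
# Each strategy below maps that one object schema to a different data schema.
STRATEGIES = {
    "one table per hierarchy": [
        "CREATE TABLE creature (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "wing_span REAL, scale_count INTEGER, fire_range REAL)",
    ],
    "one table per concrete class": [
        "CREATE TABLE bird (id INTEGER PRIMARY KEY, wing_span REAL)",
        "CREATE TABLE lizard (id INTEGER PRIMARY KEY, scale_count INTEGER)",
        "CREATE TABLE dragon (id INTEGER PRIMARY KEY, wing_span REAL, "
        "scale_count INTEGER, fire_range REAL)",
    ],
    "one table per class": [
        "CREATE TABLE bird (id INTEGER PRIMARY KEY, wing_span REAL)",
        "CREATE TABLE lizard (id INTEGER PRIMARY KEY, scale_count INTEGER)",
        "CREATE TABLE dragon (id INTEGER PRIMARY KEY, "
        "bird_id INTEGER NOT NULL REFERENCES bird(id), "
        "lizard_id INTEGER NOT NULL REFERENCES lizard(id), fire_range REAL)",
    ],
}


def schema_for(strategy):
    """Build the schema in a throwaway in-memory database and describe it
    as a mapping of table name to list of column names."""
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in STRATEGIES[strategy]:
            conn.execute(ddl)
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {name: [col[1] for col in conn.execute(f"PRAGMA table_info({name})")]
                for name in names}
    finally:
        conn.close()


for strategy in STRATEGIES:
    print(strategy, schema_for(strategy))
```

All three are legitimate targets for the same classes. Which one fits depends on factors such as query patterns and how often the hierarchy changes, which is why the object schema alone cannot dictate the data schema.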

 

You saw in Figure 1 that it is possible for several object schemas to map to a single data schema, and in Figure 2 for a single object schema to map to several data schemas.  There is a skill to successfully mapping objects to relational databases: you can’t simply create one model, press the “magic CASE tool button”, and come up with the right answer every time.

My advice is to:

The real question isn't "what model should drive the effort?"; it should be "how can we work together effectively?"  It is time to end the "religious battles" once and for all, a very good first step in overcoming the cultural impedance mismatch within the IT industry.

 

References and Suggested Online Readings

Agile Database Techniques
This book describes the philosophies and skills required for developers and database administrators to work together effectively on project teams following evolutionary software processes such as Extreme Programming (XP), the Rational Unified Process (RUP), the Agile Unified Process (AUP), Feature Driven Development (FDD), Dynamic Systems Development Method (DSDM), or The Enterprise Unified Process (EUP).  In March 2004 it won a Jolt Productivity award.

Refactoring Databases
This book describes, in detail, how to refactor a database schema to improve its design.  The first section of the book overviews the fundamentals of evolutionary database techniques in general and of database refactoring in detail.  More importantly, it presents strategies for implementing and deploying database refactorings, in the context of both "simple" single-application databases and "complex" multi-application databases.  The second section, the majority of the book, is a database refactoring reference catalog.  It describes over 60 database refactorings, presenting data models overviewing each refactoring and the code to implement it.

The Object Primer 3rd Edition: Agile Model Driven Development (AMDD) with UML 2
This book presents a full-lifecycle, agile model driven development (AMDD) approach to software development.  It is one of the few books which covers both object-oriented and data-oriented development in a comprehensive and coherent manner.  Techniques the book covers include Agile Modeling (AM), Full Lifecycle Object-Oriented Testing (FLOOT), over 30 modeling techniques, agile database techniques, refactoring, and test driven development (TDD).  If you want to gain the skills required to build mission-critical applications in an agile manner, this is the book for you.
 

 

 


 


Copyright © 2003-2006  Scott W. Ambler

Last updated: April 7, 2006
This site owned by
Ambysoft Inc.
