Agile Data

Why Data Models Shouldn't Drive Object Models (And Vice Versa)


A common problem that I run into again and again is the idea that a data model should drive the development of your objects. This idea comes in two flavors: that your physical data schema should drive the development of your objects, and that a conceptual/logical data model should be (almost) completely developed up front before you begin to design your objects. Both of these views are inappropriate for non-agile projects and clearly wrong for agile projects. Let's explore this issue in more depth.

Why do people want to base their object models on existing data schemas? First, there is very likely a desire to reuse the existing thinking that went into the current schema. I'm a firm believer in reusing things, but I prefer to reuse the right things. There is an impedance mismatch between the object and relational paradigms, and this mismatch leads object and data practitioners to different designs. You also saw in Object Orientation 101 that object developers apply different design techniques and concepts than those described in Data Modeling 101 that data modelers apply. Second, the database owner may seek to maintain or even enhance their political standing within your organization by forcing you to base your application on their existing design. Third, the people asking you to take this approach may not understand the implications of this decision, or that there are better ways to proceed.

Why is basing your object model on an existing data schema a bad idea? First, your legacy database design likely has some significant problems. In practice, I look at existing physical data models to get an idea of what is currently going on, and to get a feel for the technical constraints that I'll have to work with, but I won't unnaturally constrain my application with a bad data design. Second, even if the existing database design is very good, there can be significant differences in the way that you map objects to relational databases. Consider Figure 1, which depicts three object schemas, all of which can be correctly mapped to the data schema on the right. Now pretend you have the data schema as your starting point. Which of the three object schemas would you generate from it? Likely the top one, which may in fact be correct for your situation, but then again one of the other two schemas could have been a better choice. Yes, all of the models in Figure 1 could be improved, but I needed a simple example that showed how different object schemas can map to the same data schema.

Figure 1. Several class structures that correctly map to the same table.
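To make the point of Figure 1 concrete, here is a minimal sketch in Python. The `customer` table, its columns, and all class and function names are hypothetical and chosen only for illustration; they are not taken from Figure 1. The sketch shows one table being loaded into three different object schemas, each of which is a legitimate mapping of the same row.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical table; the actual columns in Figure 1 are not reproduced here.
DDL = """
CREATE TABLE customer (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    street TEXT,
    city   TEXT,
    phone  TEXT
)
"""

# Object schema A: one flat class that mirrors the table column for column.
@dataclass
class FlatCustomer:
    id: int
    name: str
    street: str
    city: str
    phone: str

# Object schema B: the address is factored out into its own class.
@dataclass
class Address:
    street: str
    city: str

@dataclass
class CustomerWithAddress:
    id: int
    name: str
    address: Address
    phone: str

# Object schema C: address and contact details are both separate classes.
@dataclass
class ContactInfo:
    phone: str

@dataclass
class CustomerWithParts:
    id: int
    name: str
    address: Address
    contact: ContactInfo

def fetch_row(conn, customer_id):
    """Read one customer row as a plain tuple."""
    cur = conn.execute(
        "SELECT id, name, street, city, phone FROM customer WHERE id = ?",
        (customer_id,),
    )
    return cur.fetchone()

def load_flat(row):
    return FlatCustomer(*row)

def load_with_address(row):
    cid, name, street, city, phone = row
    return CustomerWithAddress(cid, name, Address(street, city), phone)

def load_with_parts(row):
    cid, name, street, city, phone = row
    return CustomerWithParts(cid, name, Address(street, city), ContactInfo(phone))

def demo():
    """Load the same row into all three object schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.execute(
        "INSERT INTO customer VALUES (1, 'Ada', '1 Main St', 'Toronto', '555-0100')"
    )
    row = fetch_row(conn, 1)
    conn.close()
    return load_flat(row), load_with_address(row), load_with_parts(row)
```

Starting from the table alone, a generator would most plausibly produce something like `FlatCustomer`; nothing in the table tells you whether `Address` or `ContactInfo` deserve to be classes in their own right. That decision comes from your requirements, not from the data schema.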

Why do people want to create (nearly) complete data models early in the project? There are several reasons:

  1. Existing culture.  This is the way it's always been done, this is the way that they like, therefore this is the way that they're going to continue to work. 

  2. Over specialization. Data modeling might be the only thing they know, or at least it's what they prefer to specialize in.  When all you have is a hammer, not only does every problem look like a nail but nails are clearly the most important problem that needs to be addressed right now. 

  3. Serial mindset.  Many IT professionals have little or no experience taking an iterative and incremental approach to development, let alone taking it one step further to take an evolutionary/emergent approach. 

  4. People assume that the cost of change is high.  This is completely true when you're following a non-agile approach, but with modern techniques such as database refactoring and Agile Modeling the cost of change becomes much lower because these techniques support change. 

  5. Lack of teamwork. Existing processes dictate that the data group will go off and develop the database while the application programmers go off and build the application.  This may have worked for COBOL project teams but it doesn't work for agile software development teams – there is one team that works together, not several teams that work in isolation.

  6. They don't understand the true costs. Many people are unaware that a serial approach to development results in significant wastage by the time the application is finally delivered.

Why is basing your object model on a conceptual or logical data model a bad idea? Actually, it's not such a bad idea, as long as you're taking an iterative and incremental approach. The real problem is the big design up front (BDUF) approach that many data professionals seem to prefer. It is possible to take an evolutionary approach when conceptual modeling, but you have to choose to work this way. Flexibility in your approach is critical to success. However, there are much better options. Although the object role modeling (Halpin 2001) notation is very good, I have found Class Responsibility Collaborator (CRC) cards to be a very useful technique for domain modeling with my project stakeholders. Similarly, although logical data models can be quite useful, I personally find UML class models much more expressive due to their ability to depict behavior as well as data. Although David Hay argues in his excellent book Requirements Analysis that you should not use UML class diagrams for domain or analysis modeling, my experience is that you can do so quite easily if you choose to (Hay also holds this view, although he leans towards data models whereas I lean towards UML-based models). However, I have to concede his point that many object modelers struggle with analysis, but in the end that's a separate issue.

So, should you blindly base your data schema on your object schema? No! You need a much more robust approach. Figure 2 shows the three data schemas that would result from applying each of the three inheritance mapping strategies. As you can see, mapping multiple inheritance is fairly straightforward; there aren't any surprises in Figure 2. The point is that it is possible for a single object schema to correctly map to several data schemas.

Figure 2. Mapping multiple inheritance.
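The following Python sketch illustrates the same point in code. It assumes the three strategies are the commonly described ones (one table for the whole hierarchy, one table per concrete class, and one table per class), and it uses a hypothetical single-inheritance hierarchy (`Person`, `Customer`, `Employee`) rather than the multiple-inheritance classes of Figure 2. All names are illustrative, and each function returns a simple mapping of table name to column names rather than real DDL.

```python
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical object schema, not the classes shown in Figure 2.
@dataclass
class Person:
    name: str

@dataclass
class Customer(Person):
    preferences: str

@dataclass
class Employee(Person):
    salary: float

def all_columns(cls):
    """Every attribute of cls, inherited ones first."""
    return [f.name for f in fields(cls)]

def own_columns(cls):
    """Only the attributes that cls itself introduces."""
    inherited = {
        f.name
        for base in cls.__bases__
        if is_dataclass(base)
        for f in fields(base)
    }
    return [f.name for f in fields(cls) if f.name not in inherited]

def single_table(root, classes):
    """Strategy 1: one table for the hierarchy, with a type discriminator."""
    columns = ["id", "type"]
    for cls in classes:
        for name in all_columns(cls):
            if name not in columns:
                columns.append(name)
    return {root.__name__: columns}

def table_per_concrete_class(concrete_classes):
    """Strategy 2: each concrete class gets a table with all of its attributes."""
    return {cls.__name__: ["id"] + all_columns(cls) for cls in concrete_classes}

def table_per_class(classes):
    """Strategy 3: each class gets a table holding only its own attributes.

    In a real schema the subclass table's id would also act as a foreign
    key to the parent table's row.
    """
    return {cls.__name__: ["id"] + own_columns(cls) for cls in classes}
```

Calling the three functions on the same three classes yields three different sets of tables, which is the situation Figure 2 describes: the object schema alone does not determine the data schema, so someone still has to choose a mapping based on factors such as query patterns and performance.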

 

You saw in Figure 1 that it is possible for several object schemas to map to a single data schema, and in Figure 2 for a single object schema to map to several data schemas. There is a skill to successfully mapping objects to relational databases: you can't simply create one model, press the “magic CASE tool button”, and come up with the right answer every time.

My advice is to:

  • Recognize that existing legacy databases are a technical constraint. They aren't carved in stone; they can be refactored over time.  You can even map to an imperfect schema and survive the experience.
  • Take an iterative and incremental (evolutionary) approach to development, including modeling. Agile developers iterate back and forth between tasks such as data modeling, object modeling, refactoring, mapping, implementing, and performance tuning. Your requirements should drive your object schema, your object schema should drive your data schema and source code, and performance challenges and platform (code & db) features should motivate evolutionary design changes to your object schema. There is still a logical order to doing things.  On a project using object technology and relational databases together, a good strategy is to do analysis/domain/conceptual modeling before design object modeling, which in turn leads to physical data design modeling, then mapping the two models, then refactoring in conjunction with performance tuning. However, it's important to understand that this order can seem to disappear in the heat of development. Furthermore, you will likely do some initial, high-level requirements and architecture envisioning, although this isn't the big modeling up front (BMUF) that traditionalists are used to.
  • Adopt Agile Modeling's principle of Multiple Models.  No one model, certainly not a data model nor a UML class diagram, is sufficient for real world development. A modeler that only knows how to work with one type of model is just like a carpenter that only has a hammer in their tool kit – challenged at best, a significant danger to your project at worst. Although class models look at a much wider picture than data models, because they take behavior into account as well as data, they still aren't sufficient by themselves.
  • Work together as a single team. It's not them and us, it's simply us. The idea that the data team will go off and create the data model for the team, or that they must bless your data model before development can proceed, is not agile, nor is it effective for non-agile efforts. The best thing that I can say about this type of approach is that it is incompetent; the worst is that it is purposely done to support dysfunctional political goals such as justifying the existence of a political faction or even forcing a project team to fail.
  • Be prepared to follow different approaches to modeling. See the essay Different Projects Require Different Strategies.
  • Choose to succeed.  Many people feel unempowered, often because they are.  If you want things within your organization to improve you're going to have to start by improving them yourself.  Start sharing new, agile ideas with other developers.  They have a choice, but they have to decide for themselves to improve their situation. Sometimes you may even decide to seek employment elsewhere.

The real question isn't "what model should drive the effort?"; it should be "how can we work together effectively?" It is time to end the "religious battles" once and for all, a very good first step in overcoming the cultural impedance mismatch within the IT industry.