A common problem that I run into again and again is the idea that
a data model should drive the development of your objects.
This idea comes in two flavors: that your physical data schema should drive
the development of your objects, and that a conceptual/logical data model should
be (almost) completely developed up front before you begin to design your
objects. Both of these views are
inappropriate for non-agile projects and clearly wrong for agile projects.
Let's explore this issue in more depth.
Why do people want to base their object models on existing data
schemas? First, there is very
likely a desire to reuse the existing thinking that went into the current
schema. I'm a firm believer in
reusing things, but I prefer to reuse the right things.
There is an impedance
mismatch between the object and relational paradigms, and this mismatch
leads object and data practitioners to different designs.
You also saw in Object
Orientation 101 that object developers apply different design techniques and
concepts than those that data modelers apply, which are described in Data
Modeling 101.
Second, the database owner seeks to maintain or even enhance their
political standing within your organization by forcing you to base your
application on their existing design. Third,
the people asking you to take this approach may not understand the implications
of this decision, or that there are better ways to proceed.
Why is basing your object model on an existing data schema a bad idea?
First, your legacy
database design likely has some significant problems. In practice, I look at existing physical data models to get an idea of what is
currently going on, and to get a feel for the technical constraints that I'll
have to work with, but I won't unnaturally constrain my application with a bad
data design. Second, even if the existing database design is very good
there can be significant differences in the way that you map objects to
relational databases. Consider Figure 1,
which depicts three object schemas, all of which can be correctly mapped
to the data schema on the right. Now
pretend you have the data schema as your starting point.
Which of the three object schemas would you generate from it?
Likely the top one, which may in fact be correct for your situation, but
then again maybe one of the other two schemas could have been better choices.
Yes, all of the models in Figure
1 could be improved, but I needed a simple example that showed different
object schemas mapping to the same data schema.
Figure 1. Several class
structures that correctly map to the same table.
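To make the point of Figure 1 concrete, here is a minimal sketch in Python. The customer table and the three class designs are hypothetical stand-ins that I made up for illustration; they are not the ones drawn in the figure. Each design is shaped differently, yet all of them write rows to the same table.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical table, assumed for illustration only (not the one in Figure 1).
DDL = ("CREATE TABLE customer ("
       "id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)")


# Design A: one class that mirrors the table column for column.
@dataclass
class FlatCustomer:
    id: int
    name: str
    street: str
    city: str

    def to_row(self):
        return (self.id, self.name, self.street, self.city)


# Design B: the address is factored out into its own value object.
@dataclass
class Address:
    street: str
    city: str


@dataclass
class CustomerWithAddress:
    id: int
    name: str
    address: Address

    def to_row(self):
        return (self.id, self.name, self.address.street, self.address.city)


# Design C: a generic Party superclass with Customer as one subclass.
@dataclass
class Party:
    id: int
    name: str


@dataclass
class CustomerParty(Party):
    address: Address

    def to_row(self):
        return (self.id, self.name, self.address.street, self.address.city)


def save(conn, obj):
    # Every design flattens itself into the same four columns.
    conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", obj.to_row())


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    save(conn, FlatCustomer(1, "Ann", "1 Main St", "Toronto"))
    save(conn, CustomerWithAddress(2, "Bob", Address("2 Elm St", "Ottawa")))
    save(conn, CustomerParty(3, "Cy", Address("3 Oak St", "Halifax")))
    return conn.execute("SELECT * FROM customer ORDER BY id").fetchall()
```

Calling demo() returns three rows with identical structure, so nothing in the table tells you which of the three designs produced a given row. That is exactly why the table alone is a poor guide to the object design.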
Why do people want to create (nearly) complete data models early
in the project? There are several reasons:
This is the way it's always been done, this is the way that they like,
therefore this is the way that they're going to continue to work.
Overspecialization. Data modeling might be the only thing they know, or at least
it's what they prefer to specialize in.
When all you have is a hammer, not only does every problem look like a nail but nails
are clearly the most important problem that needs to be addressed right now.
This reflects a serial mindset.
Many IT professionals have little or no experience taking an iterative and
incremental approach to development, let alone taking it one step further to
take an evolutionary/emergent approach.
People assume that the cost of change is high.
This is completely true when you're following a non-agile approach, but
with modern techniques such as
refactoring and Agile
Modeling the cost of change becomes much lower.
Lack of teamwork. Existing processes dictate that the data group will go off and develop the database while the
application programmers go off and build the application.
This may have worked for COBOL project teams but it doesn't work for
agile software development teams – there is one team that works together, not
several teams that work in isolation.
They don't understand the true costs. Many people are
unaware that a serial approach to development results in
wastage by the time the application is finally delivered.
Why is basing your object model on a conceptual or logical data
model a bad idea? Actually, it's
not such a bad idea, as long as you're taking an iterative and incremental
approach. The real problem is the big design up front (BDUF) approach that many
data professionals seem to prefer. It is possible to take an evolutionary approach when
conceptual modeling, but you have to choose to work this way. Flexibility
in your approach is critical to success. However,
there are much better options. Although
the object role modeling (Halpin
2001) notation is very good, I have found
Class Responsibility Collaborator (CRC) cards to be a very useful technique for domain modeling with my
stakeholders. Similarly, although
logical data models can be quite useful I personally find UML class models much
more expressive due to their ability to depict behavior as well as data. Although David Hay argues in his excellent book
Analysis that you should not use UML class diagrams for domain or analysis
modeling, my experience is that you can do so quite easily if you choose to (Hay also holds this view, although he leans towards
data models whereas I lean towards UML-based models).
However, I have to concede his point that many object modelers struggle
with analysis, but in the end that's a separate issue.
So, should you blindly base your data schema
on your object schema? No!
You need a much more robust approach.
Figure 2 shows
the three data schemas that would result from applying each of the three
inheritance mapping strategies. As
you can see, mapping multiple inheritance is fairly straightforward; there aren't
any surprises in Figure 2.
The point is that it is possible for a single object schema to correctly
map to several data schemas.
Figure 2. Mapping multiple inheritance.
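The same point can be sketched in code. The example below assumes the three strategies are the commonly used ones (one table for the whole hierarchy, one table per concrete class, and one table per class) and applies them to a hypothetical hierarchy of my own: an abstract Person with subclasses Customer and Employee. For brevity it uses single rather than multiple inheritance, so it is simpler than Figure 2, and every table and column name in it is an assumption.

```python
import sqlite3

# Hypothetical hierarchy, assumed for illustration:
#   Person(name)  [abstract]
#     Customer(credit_limit)
#     Employee(salary)

STRATEGIES = {
    # 1. One table for the whole hierarchy, with a type discriminator column.
    "table_per_hierarchy": [
        """CREATE TABLE person (
               id INTEGER PRIMARY KEY,
               person_type TEXT NOT NULL,
               name TEXT NOT NULL,
               credit_limit REAL,
               salary REAL)""",
    ],
    # 2. One table per concrete class; inherited columns are repeated.
    "table_per_concrete_class": [
        """CREATE TABLE customer (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               credit_limit REAL)""",
        """CREATE TABLE employee (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               salary REAL)""",
    ],
    # 3. One table per class; subclass tables share the parent's key.
    "table_per_class": [
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        """CREATE TABLE customer (
               id INTEGER PRIMARY KEY REFERENCES person(id),
               credit_limit REAL)""",
        """CREATE TABLE employee (
               id INTEGER PRIMARY KEY REFERENCES person(id),
               salary REAL)""",
    ],
}


def tables_for(strategy):
    """Create the schema for one strategy and return its table names."""
    conn = sqlite3.connect(":memory:")
    for ddl in STRATEGIES[strategy]:
        conn.execute(ddl)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

One object schema yields one, two, or three tables depending on the strategy you pick. Which is best depends on factors such as how often you query across the hierarchy and how many nullable columns you can tolerate, which is why a tool cannot make the choice for you.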
You saw in Figure
1 that it is possible for several object schemas to map to a single data schema,
and in Figure 2 for a single
object schema to map to several data schemas.
There is a skill to successfully mapping objects to relational databases:
you can't simply create one model, press the “magic CASE tool button”, and
come up with the right answer every time.
My advice is to:
Recognize that existing legacy databases are a
technical constraint. They
aren't carved in stone; they can be
refactored over time.
You can even map to an imperfect schema and survive the experience.
Take an iterative and incremental (evolutionary) approach to
development, including modeling. Agile
developers iterate back and forth between tasks such as data modeling,
object modeling, refactoring, mapping, implementing, and performance tuning.
Your requirements should drive your object schema, your object schema should
drive your data schema and source code, and performance challenges and
platform (code & db) features should motivate evolutionary design
changes to your object schema. There is still a logical order to doing things.
On a project using object technology and relational databases
together a good strategy is to do analysis/domain/conceptual modeling before
design object modeling, which in turn leads to physical data design
modeling, then mapping the two models, then refactoring in conjunction with
performance tuning. However,
it's important to understand that this order can seem to disappear in the
heat of development. Furthermore, you will likely do some initial,
high-level requirements and
architecture envisioning although this isn't the
big modeling up front
(BMUF) that traditionalists are used to.
Follow Agile Modeling's principle of Multiple Models.
No one model, certainly not a data model nor a UML class diagram, is
sufficient for real world development. A
modeler that only knows how to work with one type of model is just like a
carpenter that only has a hammer in their tool kit – challenged at best, a
significant danger to your project at worst. Although class models look at a much wider picture than
data models, because they take behavior into account as well as data, they
still aren't sufficient by themselves.
Work together as a single team. It's not them and us, it's simply us. The idea that the data team will go off and create the data
model for the team, or that they must bless your data model before
development can proceed, is not agile, nor is it effective for non-agile
efforts. The best thing that I
can say about this type of approach is that it is incompetent; the worst
thing is that it is purposely done to support dysfunctional political goals
such as justifying the existence of a political faction or even to force a
project team to fail.
Be prepared to follow different approaches to
modeling. See the essay
Projects Require Different Strategies.
Choose to succeed.
Many people feel unempowered, often because they are.
If you want things within your organization to improve you're going
to have to start by improving them yourself.
Start sharing new, agile ideas with other developers.
They have a choice, but they have to decide for themselves to improve
their situation. Sometimes you
may even decide to seek employment elsewhere.
The real question isn't "what model should drive the effort"; it
should be "how can we work together effectively". It is time to end the
"religious battles" once and for all, a very good first step in
overcoming the impedance mismatch within the IT industry.