As you can see at this site I've written a fair bit about
leading-edge practices surrounding the development
and evolution of
relational database management systems (RDBMSs). Granted, I've also
strayed to other technologies such as
XML
and
enterprise issues, but for the most part the focus
has been on RDBMSs. Be that as it may, one thing that I haven't done
is write about relational theory at all. The reason for this is
two-fold: first, my focus is on practical matters and second, relational
theory really doesn't seem to have much to offer to database
practitioners.
In the 1980s I earned a degree in
Computer Science at the
University of Toronto. I was lucky enough to
realize that data-related topics were important and chose to take as
many classes as I could in the subject (they were all electives if
memory serves). The most memorable was a fourth year course on
RDBMSs where the focus was relational theory. The course was
memorable not because of the content, which was solid, but because none
of it proved relevant on the job. In addition to a few problem
sets where the focus was on writing proofs using predicate calculus and
set theory, the main assignment was the development of a database engine
which could process simple CRUD queries. Sadly, at the end of the
course, I had the skill to build an RDBMS engine to process SQL, but
didn't know a thing about applying SQL in practice.
Over the years I worked on a variety of systems, almost all of which had
RDBMSs
on the back end. I did this at banks, insurance companies, government
agencies, telecommunications firms, and retail firms. I've worked with a
variety of technologies and with a range of people with different experiences.
In all cases, although relational theory was sometimes mentioned in conversation
(more on this in a minute), it never proved directly relevant in practice.
Indirectly I very likely did benefit from learning about selection, projection,
union, relations, relation variables (relvars), tuples, and all those other good things. These
days, I'm occasionally asked what I think about relational theory and where it
fits into practice. My answer is that it's
important in a few niche situations,
and it does seem to provide a foundation upon which you can build practical
skills, but invariably it seems to me that the person asking about it really isn't
interested in practice at all and is more likely looking for an excuse not
to improve their skillset.
From what I've seen over the years, relational theory is important when:
- You're developing a relational database engine. I haven't
built a commercial database engine myself, but I'm willing to go out on a
limb and guess that the people working on such things at IBM, Oracle, and
Microsoft (to name a few) are interested in relational theory.
However, considering how much has been written about the fact that
RDBMS
vendors are not remaining true to the relational model, and
regularly go beyond it, my guess is that although they may be interested
in relational theory they're only acting on it when it makes practical
sense. More importantly, when you see the features which have been
added to mainstream RDBMSs, such as the
Java VM
in Oracle and the CLR in Microsoft SQL Server, it's clear that the vendors
are moving away from basing their products on relational theory. As
Dawn Wolthuis has said, this
may seem controversial to the theorists, but more and more it's the state of
the industry.
- You're a computer science academic. Academics like to focus on
theory, and many have literally made a career out of it, and have even
managed to sell a few books on the subject. Good for them, but that's
not practice.
- That's all you know, or at least that's what you prefer to focus on.
Many people within IT are overly specialized, a reflection of the
Tayloristic theories inflicted on IT in the 1960s and 1970s, and as a result
they have an unjustified belief in the importance of their specialty.
This is a completely natural thing to happen, albeit a dysfunctional one.
Luckily there is a clear and growing trend in the IT industry away from
specialists towards
generalizing specialists, so the already niche specialty of relational
theorists will surely shrink over time.
- You're focused on past glories. Relational theory has had
important impacts on the IT industry, in particular the
SQL language and RDBMSs are at least partially based upon it, but that was way back in the 1970s.
It's also had an impact on
data modeling
practices, including introducing the concepts of
data
normalization and functional dependencies, which are clearly valuable.
But, what has come of relational theory lately? Furthermore, with data
being only one of many aspects (e.g. functionality, security, hardware,
network, user interface, ...) facing IT professionals today, relational
theory at best seems applicable only to a very narrow sliver of software
development and therefore doesn't appear to supply much of a basis for new
advances (and sure enough, it hasn't).
- You can't find anything better to talk about over a couple of beers.
How sad.
Every so often someone asks me how techniques such as
database
refactoring,
agile data
modeling, or
database regression testing relate to relational theory. My answer is
always the same: my focus is on practice, not theory. As I indicate above, relational theory has provided the
foundation for some important practices, but I can't recall a time
when I saw a database practitioner stop to work out the relational algebra
behind whatever it was they were working on at the time. They just did the
work.
The theorists like to claim that the reason why there are
so many problems with existing database designs is that practitioners don't
understand relational theory, and in some ways they have a valid point.
A lot of good database developers that I know understand the theory, whether or
not they've received any formal training in it, but they
also understand far more than just that. Unfortunately, the theorists struggle to make their ideas attractive to
practitioners, writing books which are either inaccessible to them or simply
too far divorced from the realities of software development, and in the end
they exacerbate the problem they are trying to address. Worse yet, the
theorists seem to focus on modeling databases from scratch; they rarely seem to
have advice for those of us who are dealing with
existing legacy
data sources and the mission-critical systems using them. It typically
isn't an option to start from scratch and rebuild them, so where are the
techniques to help us address the problems which we actually face? They
don't seem to be coming from the theory guys (NOTE: If you know of any writings
from the theory folks among us which do address legacy concerns, I'd appreciate
it if you could send me
an email). Techniques such as
database
refactoring and
database regression testing aren't coming from the theory folks; they're
coming from those of us in the trenches who are trying to find ways to get the
job done.
Why Even Mention Relational Theory in Conversation?
So why do people ask about relational theory? From what I've seen,
there are several reasons:
- That's all they know, or at least that's what they focus on.
They're one-trick ponies, and they're desperate to convince you that the one
trick that they know is impressive.
- They're looking for an excuse not to change. This is
probably the most common reason: the person fears the many changes which
they're seeing in the IT community and they're desperately trying to avoid
such changes. They're often looking to justify their unwillingness to
change by claiming that the promoter of a new idea doesn't understand
relational theory (regardless of whether relational theory is even
applicable or whether the person actually does understand relational theory) or that a concept without a supporting
mathematical proof couldn't be any good. The book
Fearless Change is a great resource, as is
Becoming Agile.
- They're a zealot. Unfortunately, we have them among us and
there's rarely anything that we can do to help them. They have their
way of doing things and they're really not interested in hearing about
anything else. Concepts such as applying the
right
software process/method for your situation, or
applying the right model for your situation, are the antithesis of their
"one size fits all" theories. Worse yet, they're often in
complete denial that their approaches aren't widely adopted in
practice, and seem to think that it's only a matter of time until
they are. If practitioners didn't bother to learn relational theory at
its height of popularity, it's doubtful that many will bother now.
- They think that it drives development efforts. Some people
have a nasty habit of making sweeping statements about the importance and
applicability of relational and/or mathematical theory, statements which
often make sense only to people who are narrowly focused on data-oriented
activities. How many times have you heard claims that a
solid
grounding in relational theory will result in great database designs, or
will ensure data integrity? Shouldn't we worry about great overall
designs which look at the
entire
picture, not just data? Wouldn't a good
testing
strategy do more to help ensure quality, particularly when the
traditional approach certainly seems to have resulted in some questionable
database designs over the years? When you start to look at the bigger
picture and you accept the fact that there is far more to development than
just data, then you quickly realize that relational theory is not as
important as the theorists would like you to believe. Or at best, it's
one of many aspects of theory that you should learn. Call me a
radical, but shouldn't we adopt techniques which work in practice and which
address the actual problems that we face, and worry
a little bit less about mathematical theory?
- They've been misguided by the one-trick ponies, the fearful, and the
zealots among us. These people we can actually help, which is one
of the reasons why I wrote this article. Many people will often listen
to someone, and when they hear what they expect to hear from them, or more
importantly what they want to hear from them, then they pretty much leave it
at that. They may not know that there are other sides to the issue, or
that perhaps these other sides have been misrepresented to them (if
mentioned at all) by these other people.
Why is relational theory an issue for someone who is clearly a practitioner?
I've become concerned because of the damage that I'm seeing done within the IT industry in its
name. As I noted, some people use relational theory as an
excuse not to change, but frankly that's their business and I'm happy to let
them travel along their own path. But other people, in particular college
instructors and book writers, needlessly inflict relational theory on people who
are trying to learn how to become an
effective data professional, or better yet
an effective IT professional. Too much focus on theory can really make
data-oriented development techniques unattractive to practitioners, which is one
of the reasons why I think so many
application
developers seem to have little or no skill in this area.
Coming full circle, what should I have learned in my university database
course? If I were organizing such a course today, the agenda would look
something like this:
- The history of databases and data theory. There's always
value in spending a few hours on foundational concepts, including relational
theory as well as
newer data-oriented theories.
- An overview of data storage technologies. Students should
know the differences between the various data storage options available to
them. This would include the various database management system
approaches (relational, network,
hierarchical, XML, and object, to name a few) as well as file
management strategies. Furthermore there should be a discussion of
the trade-offs between the approaches and advice for when to use each.
An important message should be that RDBMSs and files are the
most common storage mechanisms in use today, and that XML is an important
data transport representation.
- Where data fits into the overall software development process.
This is a message which is sorely missing in many university curriculums and
books (surprisingly, including the vast majority of data books). The
first
philosophy of the Agile Data method says it well: Data is one of many
important aspects of IT. As you can see at
Agile Models Distilled
and
Software Development Phases Examined, data-oriented techniques represent
a small portion of the knowledge which IT professionals require to be
successful. Important knowledge to be sure, but only a small sliver of
the overall picture.
- Data modeling.
Data modeling
is one of many important skills a developer should have. Furthermore,
they should have an understanding of both traditional approaches to data
modeling and
agile/evolutionary approaches to be effective.
- Database development techniques. Students should learn
when, and how, to
implement functionality in relational databases. They should
understand what triggers, stored procedures/functions, and database objects
are and how to develop them. Furthermore, they should understand
relevant application development issues such as
how to
retrieve objects from an RDB,
security
access control,
transaction control, and
concurrency control.
- Database testing. Students should learn how to
test
relational databases. Data is an important corporate asset, and measures
should be taken to ensure its quality. Similarly, mission-critical
functionality is often implemented in databases, and it should be tested too.
- Database refactoring. Just like you should
refactor your code to ensure that it's of the highest quality design at
all times, you should do the same for your database schema. Modern
developers work in an
evolutionary, if not
agile manner, and so must people doing database work.
Database
refactoring enables them to do exactly that.
- Working with legacy data sources. An understanding of the
challenges presented by
legacy data
sources, and how to overcome them, is critical knowledge. Legacy
data sources are a fact of life: you might be able to
refactor
them over time, but the reality is that you'll need to learn to live with
them and to deal with the data quality, design, and architectural challenges
which they suffer from.
- Object/relational development techniques. A common
strategy in organizations today is to build applications using a combination
of
object and
relational technologies. Students should understand the
technical
impedance mismatch between the two technologies and understand the
fundamentals of
O/R mapping.
- Reporting strategies. The course should include a discussion of
the various strategies for
implementing
reports, including discussion of data marts and
data warehouses.
- Data management within the enterprise. Although it will
likely be difficult for students to grasp due to lack of real world
experience, they should be given an appreciation for the need to
take enterprise issues into account when developing systems. This
includes having an appreciation for the importance of
enterprise architecture and
administration. Students should also learn about the
cultural impedance mismatch that they are likely to face in some
organizations.
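To make two of the agenda items above concrete, database refactoring and database testing, here is a minimal sketch in Python against an in-memory SQLite database. The customer table, its fname and first_name columns, and all of the function names are hypothetical, invented purely for illustration. Treat it as one way the idea might look in practice under those assumptions, not as a prescribed technique.

```python
import sqlite3


def create_legacy_schema(conn):
    """Create a hypothetical legacy table with a poorly named column."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
    conn.executemany(
        "INSERT INTO customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.commit()


def refactor_rename_column(conn):
    """Sketch of a 'rename column' refactoring done in small steps.

    The new column is added and populated while the old one stays in
    place, so existing applications keep working during a transition
    period. Dropping the old column would be a later, separate step.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")
    conn.commit()


def check_refactoring(conn):
    """Regression checks: count the rows, and count rows where the
    old and new columns disagree (IS NOT also treats NULLs sensibly)."""
    (row_count,) = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
    (mismatches,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE first_name IS NOT fname"
    ).fetchone()
    return row_count, mismatches


def run_demo():
    """Build the legacy schema, refactor it, and return the check results."""
    conn = sqlite3.connect(":memory:")
    try:
        create_legacy_schema(conn)
        refactor_rename_column(conn)
        return check_refactoring(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    rows, mismatches = run_demo()
    print(f"rows={rows} mismatches={mismatches}")
```

The point of the sketch is the pairing: each small schema change comes with a check that can be rerun at any time, which is what lets database work proceed in an evolutionary manner.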
In short, relational theory does have its place in modern database practice;
it's just that this place is several orders of magnitude smaller than what the
theorists among us would have us think. They're welcome to grind that
axe if it makes them happy, but they shouldn't be surprised that the rest of us
aren't paying much attention to them. I also invite the theorists to get
their hands dirty and gain some practical experience on a modern software
development project (e.g. a
RUP or
an agile (XP,
AUP,
FDD, ...) project) and
see what actually happens in the real world.
Remember the adage:
In theory, practice and theory are one and the same.
In practice, they're not.
Acknowledgements
I'd like to thank Curt Monash,
Curt Sampson, and
Dawn Wolthuis for their
feedback regarding this article.