Agile Data

Evolutionary Software Development: How Data Activities Fit In


Should you follow the same process for building an n-tiered web application as you would for a data warehouse? Should you follow the same process for building an online version of your customer ordering system that you successfully followed ten years ago when you built the existing system that your internal customer service representatives use today? The answer to both questions is no. An n-tiered application requires a different set of primary artifacts than a data warehouse: different technologies are best modeled and built using different techniques. The requirements for an online customer ordering system are less clear than those of the internal system you built years ago, as you may have noticed from the wide variety of e-commerce strategies in the past few years. The implication is that the near-serial process that you followed years ago, a process that is very likely resistant to change, isn't up to the dynamic nature of today's environment.

Table of Contents

  1. The need for methodological flexibility
  2. Beware of data-oriented BDUF
  3. Evolutionary development on an agile project
  4. The "natural order" of things and evolutionary development
  5. Summary

1. The Need for Methodological Flexibility

As an example of the need to be flexible with methodological requirements, imagine this situation: Senior management within your company has decided to adopt the ICONIX methodology (Rosenberg and Scott 1999) as the official software process that all development teams will follow from now on. The ICONIX methodology is based on the idea that you'll iteratively and incrementally identify requirements via use cases, analyze those use cases with robustness diagrams, then design your software using UML sequence diagrams and UML class diagrams. The class diagram is then used to develop your physical database schema and code. ICONIX is well suited for project teams that build business applications using object or component-based technologies.

ICONIX sounds great, doesn't it? Perhaps to your Java developers, but what about the people working on your data warehousing project? A data warehousing project would be better served by a data-oriented approach where you start with a conceptual data model, then work on a logical data model, then finally a physical data model (all in an iterative manner of course). How successful do you think a data warehousing project would be if you forced the team to follow ICONIX? Now let's turn it around: how successful do you think an n-tiered Java project would be if you forced a data-oriented approach on the team? Yet surprisingly enough this is exactly what many organizations do. They desperately want to find a “one size fits all” approach to software development, presumably for consistency and ease of management, but in doing so they put their projects at risk. Just as you need to use the right tool for a job, you need to follow the right process for a software development project.

To succeed at software development you need to be flexible in your choice of software development methodology.  There are several reasons why it is important to do so:

  • Different technologies require different techniques. 
  • Every individual is unique.   
  • Every team is unique. 
  • Your external needs vary. 
  • Project categories vary. 

2. Beware of Data-Oriented BDUF

A common approach within traditional organizations is what I like to call a data-oriented big modeling up front (BMUF) approach.  This strategy is based on two concepts: 

  • Your primary modeling artifacts are conceptual, logical, and physical data models. Data is a critical asset and therefore should be a primary driver of your development efforts.
  • You need to develop and baseline these models early in your project. The idea is that you want to think through the major issues at the beginning of the project. The goals are to prevent any “surprises” later in the project and to enable you to proceed in parallel: the data group can focus on data-oriented activities while the development team builds the application. Many organizations will go so far as to insist on having the physical data model in place before coding starts to provide a point of commonality, the database, between the groups. A change management process is put in place to allow changes to be made to the primary artifacts (the data models).

Unfortunately, many existing data professionals believe that you need to get your data models “mostly right” reasonably early in a project. This misconception is often the result of:

There are several serious problems with a data-oriented BMUF approach to development:

Data-oriented BMUF is a viable way to build software.  But it's certainly not agile and it certainly doesn't reflect the realities of most modern application development efforts.  It might have worked for you twenty years ago, although I doubt it was your best option back then either (I was naively working like this in the 1980s, by the way), but it isn't appropriate now.  It is time to rethink your approach to data-oriented development and adopt evolutionary techniques.


3. Evolutionary Development on an Agile Project

Evolutionary development is an iterative and incremental approach to software development. Instead of creating a comprehensive artifact, such as a requirements specification, that you review and accept before creating a comprehensive design model (and so on), you evolve the critical development artifacts over time in an iterative manner. Instead of building and then delivering your system in a single “big bang” release, you deliver it incrementally over time. Yes, you will likely still need to do some initial requirements and architecture envisioning, but this is at a high level. I can't say this enough: you don't need to do BMUF! In short, evolutionary development is new to many existing data professionals, and many traditional programmers as well.

I have three very important observations to share with you:

  • Modern software processes take an evolutionary approach to development. 
  • Most leading processes are agile.  
  • Data is still important, but then again many other things are too.  

The implication is that if data professionals are to remain relevant, they also need to take an evolutionary approach to development. Is this possible? Absolutely, but you have to choose to work this way. Figure 1 depicts a high-level overview of the relationships between critical development activities. The diagram shows a collection of fully connected activities. It is interesting to note that there is no starting point, nor is there an ending point; instead you iterate back and forth between activities as required. Furthermore, this diagram isn't complete. For example, it doesn't include activities for project management, acceptance testing, or deployment, to name a few. My focus for now is on data-oriented development activities.

 

Figure 1. Evolutionary development on an agile project.


How does the process of Figure 1 work?  Let's work through it a task at a time:

  • Modeling. There are two modeling-oriented activities, object modeling and data modeling, which would naturally be supported by class normalization and data normalization techniques respectively. Neither object modeling nor data modeling is agile by itself; it's how you apply these techniques that counts. Agile Modeling (AM) describes how models can be used to drive your development efforts in an agile manner, something called Agile Model Driven Development (AMDD).
  • Mapping. Because you're using object technology and relational databases (RDBs) together you need to understand how to overcome the impedance mismatch between the two. That's what mapping is all about. Because you are developing your object and data schemas in an evolutionary manner you will clearly need to evolve your mappings over time.  Similarly, difficulties in mapping may motivate changes to either your object or data schemas, perhaps even both at once.
  • Test-driven development (TDD). TDD is an approach where you write a new test, you watch it fail, then you write the little bit of functional code required to ensure that the test passes. And yes, contrary to popular belief, you can and should test relational databases.
  • Refactoring.  A code refactoring is a small improvement to your source code that improves its design without adding new functionality.  A database refactoring is a small improvement to your database schema that doesn't change its functional or informational semantics.  Database refactoring, like code refactoring, enables you to evolve your design over time to help you to meet the new needs of your stakeholders. 
  • Performance tuning. Because modern systems use several technologies, including both object technology and RDBs, developers must be prepared to tune both these technologies and the interactions between them.

The important thing to understand is that agile developers quickly iterate back and forth between these tasks as required. The models, code, tests, and mappings all evolve together. With an evolutionary approach to development your models, including data-oriented ones, are developed over time. There is no “requirements phase” or “design phase”; instead modeling is performed as needed throughout your project in a continuous manner.
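
To make the mapping, TDD, and database refactoring tasks above a little more concrete, here is a minimal sketch in Python using the standard `sqlite3` module as a stand-in for your relational database. Everything named here is an assumption made for illustration: the `customer` table, the `fname` and `first_name` columns, the `Customer` class, and all function names are hypothetical and do not come from any particular project. The sketch shows one test written against the database, a simple row-to-object mapping, and a small database refactoring whose test checks that the stored information is unchanged.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Object-side view of a row in the hypothetical customer table."""
    id: int
    first_name: str


def create_schema(conn):
    """Initial schema: just enough to make the first test pass."""
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)"
    )


def add_customer(conn, first_name):
    """Insert a customer and return its generated id."""
    cur = conn.execute("INSERT INTO customer (fname) VALUES (?)", (first_name,))
    return cur.lastrowid


def load_customers(conn, name_column="fname"):
    """Mapping: turn rows into Customer objects, reading the name from name_column."""
    if name_column not in ("fname", "first_name"):  # guard the interpolated name
        raise ValueError(f"unknown column: {name_column}")
    rows = conn.execute(f"SELECT id, {name_column} FROM customer ORDER BY id")
    return [Customer(id=row[0], first_name=row[1]) for row in rows]


def introduce_first_name_column(conn):
    """Database refactoring: add first_name next to fname and copy the data.

    The old column is kept so code that still reads fname keeps working.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")


def test_round_trip():
    """Written first: it fails until create_schema and the mapping exist."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_customer(conn, "Ada")
    add_customer(conn, "Grace")
    assert [c.first_name for c in load_customers(conn)] == ["Ada", "Grace"]
    conn.close()


def test_refactoring_preserves_data():
    """The refactoring must not change what information the schema holds."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_customer(conn, "Ada")
    before = load_customers(conn, "fname")
    introduce_first_name_column(conn)
    # The same information is visible through both the old and the new column.
    assert load_customers(conn, "first_name") == before
    assert load_customers(conn, "fname") == before
    conn.close()


if __name__ == "__main__":
    test_round_trip()
    test_refactoring_preserves_data()
    print("all database tests passed")
```

Note that the sketch only copies existing data once. In a real transition period you would also need some way to keep the old and new columns in sync until the old one is retired, which is left out here to keep the example short.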


4. The "Natural Order" of Things and Evolutionary Development

On a project using object technology and relational databases together, a good strategy is to do analysis/domain/conceptual modeling before design object modeling, which in turn leads to physical data design modeling, then mapping the two models, then refactoring in conjunction with performance tuning. This is the overall order, yet you still iterate back and forth as needed.

Let's go at it from a slightly different point of view. Figure 2 depicts a high-level process diagram of evolutionary development that makes data-oriented activities a little more explicit. First, notice how the arrows are two-way, implying that you iterate back and forth between activities. Second, as with Figure 1, there is no starting point. Although you may choose to start with your enterprise model, then do some conceptual modeling, then let your conceptual model drive your object and data schemas, this doesn't have to be the case. Depending on the nature of your project you could start with a project-level conceptual model (you may not have an enterprise model) or you may start first with traditional object modeling activities such as use case modeling. It doesn't really matter because agile software developers will iterate to another activity as required. Third, notice how I use the term “enterprise structural modeling” and not “enterprise data modeling”: many organizations are choosing to use UML class models or even UML component models (Herzum and Sims 2000; Atkinson et al. 2002) instead of data models for structural modeling. Fourth, I've combined the notions of conceptual and domain modeling in one as they're often commingled anyway (if they're done at all).

Figure 2. Evolutionary Development.


5. Summary

Evolutionary approaches to software development are not only supported by leading software development processes, they are in fact the norm for agile processes. You also learned that there are some significant problems with the near-serial, BDUF approaches favored by many traditional data professionals. Most importantly, you discovered that it is possible to take an evolutionary approach to data-oriented development activities, using techniques that are described in greater detail in following chapters. The bottom line is that if you want to work with an agile team you need to be prepared to work in an evolutionary manner. It is a choice to work in this way, just as it's a choice to not do so. Agile software developers embrace change and therefore decide to work in an evolutionary manner.