Agile Data

Implementing a Data Warehouse via Vertical Slicing

One of the fundamental concepts of the Disciplined Agile (DA) approach to development is to slice functionality vertically into small, consumable pieces that can potentially be deployed into production quickly. These vertical slices are completely implemented - the analysis, design, programming, and testing are complete - and offer real business value to stakeholders. Although it is fairly clear how to do this when you're building a website or a mobile application, it isn't clear how to do so if you're building a data warehouse (DW) or business intelligence (BI) solution.

The focus of this article is to describe what vertical slicing potentially means for a DW/BI solution. This article is organized into the following topics:

  1. What is vertical slicing in a DW/BI solution?
  2. Why vertical slicing?
  3. Vertical slicing strategies for a DW/BI solution
  4. You can do this too
  5. What new skills will you need?
  6. What happens when you don't slice vertically?
  7. Parting thoughts

1. What is Vertical Slicing in a DW/BI Solution?

A vertical slice is a top-to-bottom, fully implemented and tested piece of functionality that provides some form of business value to an end user. Most important, it should be possible to easily deploy a vertical slice into production upon request. A vertical slice can be very small, such as a single edit field on a screen, the implementation of a business rule or calculation, or the updated layout of a screen. For an agile team all of this implementation work should be accomplished during a single iteration/sprint, typically a one- or two-week period. For teams following a lean delivery lifecycle this timeframe typically shrinks to days and even hours in some cases.

For a DW/BI solution a vertical slice is fully implemented from beginning (the data sources) to end (accessibility by end users in the DW or BI output). This means that you have fully implemented (within a matter of days or even hours):

  • Extraction from the data source(s). This is required only for the data elements that you need for the given vertical slice. For a mature DW you are likely to have most of the data elements already, and maybe even all of them. Worst case is when you need elements from one or more "new" data sources that you have never accessed before. This will require the initial work to gain access to the data source and to analyze the data source (ideally read its supporting documentation) to identify the data elements that you require.
  • Staging of the raw source data. I typically recommend that whenever you access a table in a data source for the first time that you stage the entire table at that point. The implication is that you may already be staging the required data elements even if you've never needed them before this point. When that's not the case you'll need to do the work to stage the required tables. Of course, if your DW architecture doesn't include staging incoming raw data first then this step should be skipped.
  • Transformation/cleansing of the source data. You need to do the work, if any, to transform the incoming source data for just the new data elements that you require for this vertical slice.
  • Loading the data into the DW. Once again, you need to do this for just the new data elements required for this vertical slice.
  • Loading your data marts (DMs). If you have DMs, and this vertical slice needs the data elements in them, then you will need to do this work too.
  • Updating the appropriate BI views/reports where needed. As you'll soon see your slice may simply make some data available in the DW or DMs for ad-hoc reporting.

A common theme running through all of those steps is that you only do the work for the vertical slice that you're currently working on. This is what enables you to get the work done in a matter of days (and even hours once you get good at it) instead of weeks or months.
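
The steps above can be sketched in a few lines of code. What follows is a minimal illustration only, assuming an in-memory SQLite database standing in for the DW and a hard-coded list standing in for the data source. The table, view, and function names (stg_student, dw_student, v_student_names, run_slice, and so on) are hypothetical and are not taken from any real system. The slice implements just one data element, the student name, from source through to a queryable view.

```python
import sqlite3

# Hypothetical source rows (id, name, program); a real slice would read these
# from an operational system rather than from a hard-coded list.
SOURCE_STUDENTS = [
    (1, "  alice  SMITH ", "CS"),
    (2, "Bob Jones", "MATH"),
    (3, None, "CS"),  # dirty row: the name is missing
]


def extract():
    """Extract: read the rows from the (hypothetical) data source."""
    return list(SOURCE_STUDENTS)


def stage(conn, rows):
    """Stage: land the entire source table as-is, without changing it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_student "
        "(id INTEGER, raw_name TEXT, program TEXT)"
    )
    conn.execute("DELETE FROM stg_student")
    conn.executemany("INSERT INTO stg_student VALUES (?, ?, ?)", rows)


def clean_name(raw_name):
    """Transform: cleanse only the element this slice needs (the name)."""
    if raw_name is None or not raw_name.strip():
        return None
    # Collapse extra whitespace and normalize capitalization.
    return " ".join(part.capitalize() for part in raw_name.split())


def load(conn):
    """Load: write just the new data element into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_student "
        "(student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL)"
    )
    staged = conn.execute("SELECT id, raw_name FROM stg_student").fetchall()
    for student_id, raw_name in staged:
        name = clean_name(raw_name)
        if name is not None:  # rows without a usable name are skipped
            conn.execute(
                "INSERT OR REPLACE INTO dw_student VALUES (?, ?)",
                (student_id, name),
            )


def publish_view(conn):
    """BI view: expose the element for reporting or ad-hoc queries."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS v_student_names AS "
        "SELECT student_name FROM dw_student"
    )


def run_slice(conn):
    """Run the whole slice end to end and return what an end user would see."""
    stage(conn, extract())
    load(conn)
    publish_view(conn)
    conn.commit()
    rows = conn.execute(
        "SELECT student_name FROM v_student_names ORDER BY student_name"
    )
    return [row[0] for row in rows]
```

Calling run_slice(sqlite3.connect(":memory:")) runs every step for this one element. Note that the staging step lands the whole source table, including the program column that this slice does not yet use, while the transform and load steps touch only the name.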

2. Why Vertical Slicing?

Vertical slicing is a fundamental agile technique for breaking up a large piece of functionality into something that is easier, safer, and quicker to work with. There are several critical benefits of vertically slicing the implementation of a DW/BI solution:

  1. Reduced feedback cycle. By focusing on delivering small, vertical slices every few days or weeks you have more opportunities to show working functionality to stakeholders and thereby receive concrete feedback that you can act on. This enables your stakeholders to steer your work more effectively. It also motivates your team to test throughout the lifecycle, thereby dramatically reducing the overall cost of fixing any defects that are found.
  2. Increased ability to meet actual stakeholder needs. By taking a flexible, evolutionary approach to developing your DW/BI solution where you regularly seek feedback you end up discovering what your stakeholders actually need in practice. With a traditional approach where you attempt to think everything through up front the best you can possibly do is to build something to specification - this is unfortunately ineffective because people are not good at defining their needs up front and even if they were they would change their minds anyway due to changes in the marketplace.
  3. More competitive. Delivering in small, incremental slices enables your team to react to changing requirements quickly. The ability to deploy these vertical slices easily enables your organization to react quickly to marketplace dynamics, thereby increasing your competitiveness.
  4. Increased quality. Vertical slicing forces data professionals to adopt modern, agile database techniques that have a significantly greater focus on quality than do traditional techniques. Agile DB techniques, such as database refactoring and database regression testing, are clearly focused on data quality.
  5. Lower implementation risk. Working in small vertical slices forces the team to fully integrate and test their solution very early in the lifecycle. If there are integration issues they will be found much earlier in the lifecycle when they are easier and less expensive to address.
  6. Reduced cost of delay. Delivering in vertical slices enables teams to get working functionality into the hands of their stakeholders quickly, reducing overall cost of delay (opportunity cost from a management accounting point of view).

There are several common complaints about working this way, but they rarely seem to hold water in practice. These complaints are:

  1. It takes longer to deliver the overall solution. No, the traditional/serial approach tends to take longer in practice due to less sense of urgency and the likelihood that the team will spend time building functionality that stakeholders don't actually want (because they built to the specification). By building incrementally you deliver smaller, valuable functionality into production sooner thereby reducing cost of delay.
  2. We need to think everything through at the beginning. Yes, it is a good idea to do some up front thinking, which is why disciplined agile techniques such as requirements envisioning and architectural envisioning exist.
  3. It's more expensive in the long run. This is also very rare in practice. Furthermore, the real issue is producing value, not what the expense of doing so is. Agile teams enjoy higher levels of ROI on average than traditional teams because they work in priority order and deliver incrementally (once again, reducing cost of delay).

3. Vertical Slicing Strategies for a DW/BI Solution

There are several strategies that you can choose to employ when vertically slicing the requirements for a DW/BI solution. These strategies are described in the following table. There are example stories for each strategy as well as some advice for when to apply each strategy.

Table 1. Vertical slicing strategies for a DW/BI solution.
Slicing strategy: One new data element from a single data source
Example stories:
  • As a Professor I would like to know the names of my students so that I know who should be there
  • As a Student I would like to know what courses are taught at the university
When to do this: Very early days when you are still building out fundamental infrastructure components. Very common for the first iteration or two of Construction. These slices still add real business value, albeit minimal.

Slicing strategy: One new data element from several sources
Example stories:
  • As a Professor I would like the student list for a seminar that I teach
  • As a Student I would like to know what seminars are being taught this semester
When to do this: Early days during Construction when you are still building out the infrastructure. These slices add some business value, often fleshing out a DW data element to include the full range of data values for it.

Slicing strategy: A change to an existing report
Example stories:
  • As a Professor I would like to know the standard deviation of marks within a seminar that I teach
  • As a Student I would like to know how many spots are still available in a seminar
When to do this: Evolution of existing functionality to support new decision making.

Slicing strategy: A new report
Example stories:
  • As a Professor I would like to know the distribution curve of student marks in a seminar that I teach so I may adjust accordingly
  • As a Registrar I would like to know what Seminars are close to being full
When to do this: Several iterations into Construction when the DW/BI solution has been built up sufficiently.

Slicing strategy: A new reporting view
Example stories:
  • As a Registrar I would like to know what the prerequisites are for a seminar so that I can advise students
  • As a Professor I would like to know the current course load of each student within a seminar that I teach
When to do this: Several iterations into Construction when the DW/BI solution has been built up sufficiently.

Slicing strategy: A new DW/DM table
Example stories:
  • As a Chancellor I would like to track the revenues generated from parking pay meters to identify potential profits to divert to supporting students
  • As a Professor I would like to recommend suggested readings to help people prepare before taking a seminar
When to do this: Several iterations into Construction when the DW/BI solution has been built up sufficiently.
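
To make the "change to an existing report" strategy concrete, here is a small sketch of the story about the standard deviation of marks within a seminar. It assumes a hypothetical existing report function that already returns a count and an average, and the slice adds exactly one new figure to it. The function name, the dictionary keys, and the choice of the population standard deviation (rather than the sample one) are all assumptions made for illustration, since the story itself doesn't specify them.

```python
from statistics import mean, pstdev


def seminar_marks_report(marks):
    """Summarize the marks for one seminar.

    The count and average stand in for the existing report. The std_dev
    entry is the only thing this vertical slice adds.
    """
    if not marks:
        # No marks recorded yet: report the count and leave the rest empty.
        return {"count": 0, "average": None, "std_dev": None}
    return {
        "count": len(marks),
        "average": mean(marks),
        # New in this slice. Population standard deviation is an assumption;
        # stakeholders may prefer the sample version (statistics.stdev).
        "std_dev": pstdev(marks),
    }
```

A slice like this is small enough to demo quickly, which is the point: stakeholders can see the new figure in context and say whether it is what they meant.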

There are several interesting things about the stories in the table:

  1. They are written from the point of view of your stakeholders. They aren't a technical specification. For example, the first story describes how professors want a list of student names but it doesn't say which data source(s) the names come from, what the element names are, … These are design issues, not requirement issues.
  2. They always provide business value. The first story appears to be the beginnings of an attendee list for a seminar. Having something as simple as a list of names does in fact provide a bit of value to professors.
  3. Sometimes that business value isn't (yet) sufficient. It may take several iterations to implement something that your stakeholders want delivered into production, particularly at first. For example, although a list of student names is the beginnings of a class list it might not be enough functionality to justify putting it into production. Perhaps professors also need to know the program that the student is enrolled in, their current year of study, and basic information about the seminar such as the course name, time, and location. The decision as to whether the functionality is sufficient to ship is in the hands of your stakeholders (this is one of the reasons why you want to demo your work on a regular basis).

4. You Can Do This Too

This can be hard to hear sometimes, but you're not special. Others are in fact doing this, and many have been doing so successfully for years. Yes, just like you, they had to deal with:
  • Solving hard problems
  • Legacy data sources that were rarely perfect
  • Legacy data sources that were not under their control, sometimes owned by people difficult to work with, and sometimes not even owned by their organization
  • Stakeholders who change their minds, or ask for fixed budgets, or ask for exact delivery dates, and of course ask for any combinations thereof
  • Teams made up of experienced data professionals whose culture tells them that they need to do detailed modeling up front

5. What New Skills Will You Need?

Vertical slicing is an important agile skill in general. Vertical slicing of a DW/BI solution is supported by other agile data skills, all of which have been proven in practice. These skills include:
  • Agile data modeling - For lightweight initial modeling and evolutionary detailed modeling of your data structures.
  • Agile modeling in general - There's more to modeling than data.
  • Database refactoring - To safely and easily evolve existing databases, including your data warehouse and data marts.
  • Database regression testing - To validate your work in an automated manner.
  • Continuous database integration - To ensure changes are automatically regression tested.
  • Continuous database deployment - To ensure working updates to your database are shared appropriately.
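
As an illustration of database regression testing from the list above, the following sketch uses Python's unittest module against an in-memory SQLite database. It is an example under stated assumptions, not a prescribed approach: build_warehouse is a hypothetical stand-in for your real load process, and the table and column names (dw_student, dw_enrollment, seminar_code) are invented for this example.

```python
import sqlite3
import unittest


def build_warehouse():
    """Hypothetical stand-in for the real DW load; returns a loaded database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dw_student "
        "(student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE dw_enrollment "
        "(student_id INTEGER NOT NULL, seminar_code TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO dw_student VALUES (?, ?)",
        [(1, "Alice Smith"), (2, "Bob Jones")],
    )
    conn.executemany(
        "INSERT INTO dw_enrollment VALUES (?, ?)",
        [(1, "CS101"), (2, "CS101"), (2, "MATH200")],
    )
    return conn


class WarehouseRegressionTest(unittest.TestCase):
    def setUp(self):
        self.conn = build_warehouse()

    def tearDown(self):
        self.conn.close()

    def test_no_blank_student_names(self):
        # Data quality rule: every loaded student has a non-blank name.
        blanks = self.conn.execute(
            "SELECT COUNT(*) FROM dw_student WHERE TRIM(student_name) = ''"
        ).fetchone()[0]
        self.assertEqual(blanks, 0)

    def test_every_enrollment_has_a_student(self):
        # Referential integrity: no enrollment points at a missing student.
        orphans = self.conn.execute(
            "SELECT COUNT(*) FROM dw_enrollment e "
            "LEFT JOIN dw_student s ON s.student_id = e.student_id "
            "WHERE s.student_id IS NULL"
        ).fetchone()[0]
        self.assertEqual(orphans, 0)

    def test_seminar_student_list(self):
        # Behavior check for the story about a seminar's student list.
        rows = self.conn.execute(
            "SELECT s.student_name FROM dw_enrollment e "
            "JOIN dw_student s ON s.student_id = e.student_id "
            "WHERE e.seminar_code = ? ORDER BY s.student_name",
            ("CS101",),
        ).fetchall()
        self.assertEqual([r[0] for r in rows], ["Alice Smith", "Bob Jones"])
```

Saved in a test module, these checks can be run with python -m unittest. Running them automatically on every change is the idea behind continuous database integration, and each new vertical slice would typically add a few more checks of this kind.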

6. What Happens When You Don't Slice Vertically?

Teams that don't know how to slice their work vertically often fall into one of the Mini-Waterfall or Staggered Mini-Waterfall process anti-patterns. Neither of these strategies is agile - let's explore each of them and see why.

Figure 1 below depicts a mini-waterfall approach where a team works through the traditional phases, mostly in order, throughout an iteration/sprint. These iterations are typically longer than usual, often four or more weeks in length, whereas 80% of agile teams have iterations of two weeks or less. Mini-waterfalls are common with teams that are very new to agile and in this case should be seen as a step in the right direction away from the traditional/serial approach towards an agile approach. However, if you're taking a mini-waterfall approach because of one or more of the challenges discussed earlier (see "You Can Do This Too") then what's really happened is that the team is using one of those flimsy excuses for not making the behavioral changes required to be truly agile.

Figure 1. A mini-waterfall.

The Staggered Mini-Waterfall anti-pattern is depicted in Figure 2 below. The basic strategy is that the team is organized into functional silos such as data analysts, data architects/designers, developers, and testers - usually along the lines of what people were comfortable with when taking a traditional approach. The analysts do their "sprint" where they complete the data analysis work for one or more stories. They then hand this off to the designers who do their "design sprint", who hand off to the developers to do their "development sprint", and finally to the testers who do their "testing sprint." Once the analysts hand off their work to the designers they move on to analyze the next batch of requirements (often user stories). Once again, at best this might be a step towards becoming agile but it certainly isn't agile. Many times when I run into a DW/BI team taking this approach it's because the team is composed of people who are overly specialized (remember, agilists strive to become cross-functional generalizing specialists) and often have not bothered to learn modern agile database skills. This is OK if you're just starting out with agile. As we like to say, you go to war with the army that you've got, so if everyone is a specialist then that's how you start out. However, when you invest in your people, and when team members recognize the importance of learning new skills, they can quickly work together to learn those skills from one another.

Figure 2. Staggered mini-waterfalls.

As we show in the article Disciplined Agile Data Warehousing it is in fact possible for DW/BI teams to work in an agile manner. There is absolutely no reason, except as a step in your team's overall learning effort, to follow either a Mini-Waterfall or Staggered Mini-Waterfall approach. You can and should do better.

7. Parting Thoughts

Vertical slicing is an important skill for any agile team, regardless of what they are building. In this article you learned that it is highly desirable to do so for a DW/BI solution and more importantly that the techniques exist to do so. For most people the hardest thing about vertical slicing is to adopt the agile mindset behind working this way, something that can be very tough for experienced data professionals given the cultural impedance mismatch between traditional data professionals and modern agile practitioners.

At Scott Ambler + Associates we help teams to become more effective in the way that they work. We coach, educate and train people in advanced agile and lean skills. We have a wide variety of workshops that we deliver, including one on Disciplined Agile DB/BI skills. We would love to help you on your agile journey.

8. Recommended Resources