Agile Data

Introduction to Class Normalization


In the data world there is a common process called data normalization, by which you organize data in such a way as to reduce and even eliminate data redundancy, effectively increasing the cohesiveness of data entities. Can the techniques of data normalization be applied to object schemas? Yes, but this isn’t an ideal approach because data normalization deals only with data and not with behavior. We need to consider both when normalizing our object schema, so we need to rethink our approach. Class normalization is a process by which you reorganize the structure of your object schema in such a way as to increase the cohesion of classes while minimizing the coupling between them.

Unfortunately, class normalization hasn’t been adopted as widely as I would have hoped. This happened for a couple of reasons, but a big part of the problem was that class normalization was clearly overshadowed by design patterns at the time. Although design patterns, which describe solutions to known problems within a defined context, are very good things, they are a different and complementary approach. An important benefit of class normalization over design patterns is that the concept is familiar to data professionals and thus provides a bridge to help them learn object techniques (at least that’s been my experience). In this article, I discuss:

  1. First Object Normal Form (1ONF)
  2. Second Object Normal Form (2ONF)
  3. Third Object Normal Form (3ONF)
  4. Class Normalization and Other Object Design Techniques

1. First Object Normal Form (1ONF)

A class is in first object normal form (1ONF) when specific behavior required by an attribute that is actually a collection of similar attributes is encapsulated within its own class.  An object schema is in 1ONF when all of its classes are in 1ONF. 

Consider the class Student in Figure 1. You can see that it implements the behavior for adding and dropping students to and from seminars. The attribute seminars is a collection of seminar information, perhaps implemented as an array of arrays, that is used to track what seminars a student is assigned to.  The operation addSeminar() enrolls the student into another seminar whereas dropSeminar() removes them from one.  The operation printSchedule() produces a list of all the seminars the student is enrolled in so that the student can have a printed schedule. The operations setProfessor() and setCourseName() make the appropriate changes to data within the seminars collection. This design is clearly not very cohesive – this single class is implementing functionality that is appropriate to several concepts.


Figure 1. The Student class in 0ONF.
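
As a rough illustration of the design described above, the following Python sketch shows what a Student class in 0ONF might look like. The operation names follow the text (adapted to Python naming). The parameter lists, the use of a list of dictionaries for the seminars collection, and the choice to return schedule lines rather than print them are assumptions made for this example.

```python
class Student:
    """0ONF sketch: one class holds student data and all seminar details."""

    def __init__(self, name):
        self.name = name
        # Repeating group: each entry mixes seminar, course, and professor data.
        self.seminars = []

    def add_seminar(self, seminar_number, course_name, professor):
        self.seminars.append(
            {"number": seminar_number, "course_name": course_name, "professor": professor}
        )

    def drop_seminar(self, seminar_number):
        self.seminars = [s for s in self.seminars if s["number"] != seminar_number]

    def set_professor(self, seminar_number, professor):
        # Seminar-level behavior living inside Student: a sign of low cohesion.
        for s in self.seminars:
            if s["number"] == seminar_number:
                s["professor"] = professor

    def set_course_name(self, seminar_number, course_name):
        for s in self.seminars:
            if s["number"] == seminar_number:
                s["course_name"] = course_name

    def print_schedule(self):
        # Returns the schedule lines instead of printing them (an assumption).
        return [f'{s["number"]}: {s["course_name"]} ({s["professor"]})' for s in self.seminars]
```

Every operation except the constructor reaches into the seminars collection, which is the symptom that 1ONF addresses.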

Figure 2 depicts the object schema in 1ONF.  Seminar was introduced, having both the data and the functionality required to keep track of when and where a seminar is taught, as well as who teaches it and what course it is.  It also implements the functionality needed to add students to the seminar and drop students from the seminar.  By encapsulating this behavior in Seminar we have increased the cohesion of our design – Student now does student kinds of things and Seminar does seminar types of things. In the schema of Figure 1 Student did both. 

Figure 2. The object schema in 1ONF.

It should be clear that 1ONF is simply the object equivalent of data’s first normal form (1NF) – with 1NF you remove repeating groups of data from a data entity and with 1ONF you remove repeating groups of behavior from a class.
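
A minimal Python sketch of the 1ONF idea follows, assuming a simplified Seminar that still stores the course name and professor as plain values. The method names add_student(), drop_student(), describe(), and schedule() are hypothetical and chosen for illustration only.

```python
class Seminar:
    """Holds seminar data together with the behavior that belongs to a seminar."""

    def __init__(self, number, course_name, professor):
        self.number = number
        self.course_name = course_name  # still stored per seminar at this stage
        self.professor = professor
        self.students = []

    def add_student(self, student):
        # Keep both sides of the association in step; ignore duplicate enrollments.
        if student not in self.students:
            self.students.append(student)
            student.seminars.append(self)

    def drop_student(self, student):
        if student in self.students:
            self.students.remove(student)
            student.seminars.remove(self)

    def describe(self):
        return f"{self.number}: {self.course_name} ({self.professor})"


class Student:
    """Now only does student kinds of things."""

    def __init__(self, name):
        self.name = name
        self.seminars = []

    def schedule(self):
        # Delegates seminar formatting to Seminar instead of doing it here.
        return [seminar.describe() for seminar in self.seminars]
```

The repeating group and the operations that manipulated it have moved into Seminar, so Student no longer needs setters for course or professor data.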


2. Second Object Normal Form (2ONF)

A class is in second object normal form (2ONF) when it is in 1ONF and when “shared” behavior that is needed by more than one instance of the class is encapsulated within its own class(es).  An object schema is in 2ONF when all of its classes are in 2ONF. 

Consider Seminar in Figure 2. It implements the behavior of maintaining information about both the course being taught in the seminar and the professor teaching that course. Although this approach would work, it unfortunately doesn’t work very well. When the name of a course changes you’d have to change the course name for every seminar of that course. That’s a lot of work. Figure 3 depicts the object schema in 2ONF. To improve the design of Seminar we have introduced two new classes, Course and Professor, which encapsulate the appropriate behavior needed to implement course objects and professor objects. As before, notice how easy it has been to introduce new functionality to our application. Course now has methods to list the seminars that it is being taught in (needed for scheduling purposes) and to create new seminars, because popular courses often need to have additional seminars added at the last moment to meet student demand. The Professor class now has the ability to produce a teaching schedule so that the real-world person has the information needed to manage his or her time.


Figure 3. The object schema in 2ONF.
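
The 2ONF step can be sketched in Python as shown here. This is an illustration under assumptions: the method names create_seminar(), list_seminars(), and teaching_schedule() are hypothetical stand-ins for the operations described in the text, and the attributes are kept to a minimum.

```python
class Course:
    def __init__(self, name):
        self.name = name
        self.seminars = []

    def create_seminar(self, number, professor):
        # A course can open another seminar, for example to meet student demand.
        seminar = Seminar(number, self, professor)
        self.seminars.append(seminar)
        professor.seminars.append(seminar)
        return seminar

    def list_seminars(self):
        return [seminar.number for seminar in self.seminars]


class Professor:
    def __init__(self, name):
        self.name = name
        self.seminars = []

    def teaching_schedule(self):
        return [f"{s.number}: {s.course.name}" for s in self.seminars]


class Seminar:
    def __init__(self, number, course, professor):
        self.number = number
        self.course = course        # a reference to the shared Course, not a copied name
        self.professor = professor  # a reference to the shared Professor
```

Because each Seminar refers to a single shared Course object, renaming the course is one assignment to `course.name`, and every seminar and teaching schedule reflects the change.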



3. Third Object Normal Form (3ONF)

Although putting the object schema in 2ONF is definitely a step in the right direction we can still improve the design further. A class is in third object normal form (3ONF) when it is in 2ONF and when it encapsulates only one set of cohesive behaviors. An object schema is in 3ONF when all of its classes are in 3ONF. 

In Figure 3 the Student class encapsulates the behavior for both students and addresses. The first step would be to refactor Student into two classes, Student and Address. This would make our design more cohesive and more flexible because there is a very good chance that students aren’t the only things that have addresses. However, this isn’t enough because the Address class still needs to be normalized. There is behavior that is associated only with zip codes, specifically formatting and validation. For example, based on the zip code it should be possible to determine whether or not the city and state of an address are valid. This realization leads to the class diagram presented in Figure 4, which implements addresses as four distinct classes: Address, ZipCode, City, and State. The advantage of this approach is twofold. First, the zip code functionality is implemented in one place, increasing the cohesiveness of our model. Second, by making zip codes, cities, and states their own separate classes we can now easily group addresses based on various criteria for reporting purposes, increasing the flexibility of our application. The main drawback is that to build a single address we have to build it from four distinct objects, increasing the code that we have to write, test, and maintain.

We’re still not done, because the Seminar class of Figure 3 implements “date range” behavior – it has a start date and an end date, and it calculates the difference between the two dates. Because this sort of behavior forms a cohesive whole, and because it is more than likely needed in other places, it makes sense to introduce the class DateRange of Figure 4.

Figure 4. The object schema in 3ONF.
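
The following Python sketch illustrates two of the 3ONF extractions, DateRange and ZipCode. It rests on several assumptions: zip codes are taken to be US five-digit or ZIP+4 strings, the method names days() and is_valid_format() are hypothetical, and City and State are reduced to plain strings for brevity, whereas Figure 4 models them as their own classes.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Cohesive date-range behavior, reusable outside Seminar."""
    start: date
    end: date

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end date precedes start date")

    def days(self):
        # Difference between the two dates, in whole days.
        return (self.end - self.start).days


@dataclass(frozen=True)
class ZipCode:
    """Zip code formatting and validation live in one place."""
    code: str

    def is_valid_format(self):
        # Assumes US five-digit or ZIP+4 format.
        return re.fullmatch(r"\d{5}(-\d{4})?", self.code) is not None


@dataclass
class Address:
    street: str
    city: str    # simplified to a string in this sketch
    state: str   # simplified to a string in this sketch
    zip_code: ZipCode
```

A Seminar would then hold a single DateRange rather than separate start and end dates, and any other class that needs an address or a date range can reuse these types.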


4. How Does Class Normalization Relate to Other Object Design Techniques?

Fundamentally, class normalization is a technique for improving the quality of your object schemas. The exact same thing can be said of the application of common design patterns, such as those defined by the “Gang of Four (GoF)” in Design Patterns (Gamma et al. 1995). Design patterns are known solutions to common problems, examples of which include the Strategy pattern for implementing a collection of related algorithms and the Singleton pattern for implementing a class that only has one instance. The application of common design patterns will often result in a highly normalized object schema, although the overzealous application of design patterns can result in you overbuilding your software unnecessarily. As Agile Modeling (AM) suggests, you should follow the practice Apply Patterns Gently and ease into a design pattern over time.

Another common approach to improving object schemas is refactoring (Fowler 1999), an approach overviewed in Database Refactoring. Refactoring is a disciplined way to restructure code by applying small changes that improve its design, enabling you to evolve your design slowly over time. Class normalization and refactoring fit together quite well – as you’re normalizing your classes you will effectively be applying many known refactorings to your object schema. A fundamental difference between class normalization and refactoring is that class normalization is typically performed on your models whereas refactorings are applied to your source code.

Do you need to understand all three techniques? Yes. It is always beneficial to have several techniques in your intellectual toolkit. What would you think of a carpenter with only one type of saw, one type of hammer, and one type of screwdriver in their toolkit? My guess would be that they wouldn’t be as effective as one with a selection of tools. The same can be said of agile software developers.


5. What Have You Learned?

Although the techniques of class normalization aren’t as popular as refactoring or the application of design patterns, I believe that they are important because they provide a very good bridge between the object and data paradigms. The rules of class normalization describe practices that effective object designers have been following for years, so there is really nothing new in that respect. However, they describe basic object design techniques in a manner that data professionals such as Agile DBAs can readily understand, helping to improve the communication within your project teams.

My hope is that you have discovered that there is a fair bit to OO. I also hope that you recognize that there is some value in at least understanding the fundamentals of OO, and better yet you may even decide to gain more experience in it. Object technology is real, is being used for mission-critical systems, and is here to stay. At a minimum, every IT professional needs to be familiar with it.