Agile Data

Advanced XML? No, Just Realistic XML

www.agiledata.org: Techniques for Successful Evolutionary/Agile Database Development

Scott W. Ambler
This article gives a brief overview of Extensible Markup Language (XML) techniques and technologies and discusses potential issues that project teams face when working with XML.  At the time of this writing XML is a robust and growing technology.  However, in everyone’s zeal to work with these new technologies many people have forgotten some of the data community’s hard-earned lessons.  Although XML has clearly been over-hyped, it still has a very bright future.  The primary goal of this article is therefore to do some level setting with respect to XML.

 

Table of Contents

  1. An XML Primer
  2. XML in Practice
  3. Vocabularies
  4. XML Data Modeling
  5. XML Mapping and Data Binding
  6. Persisting XML in Relational Databases
  7. Persisting XML in XML Databases
  8. XML Development Strategies
  9. Advantages of XML
  10. Challenges with XML  

 

1. An XML Primer

What is XML?  From the data point of view, XML is simply a standardized approach to storing text-based data in a hierarchical manner and to defining meta data about that data.  The data is stored in structures called XML documents and the meta data is contained in document type definitions (DTDs) or the newer XML Schema definitions (XML Schema will likely replace DTDs within the next few years).  From an object-oriented point of view, XML is a data representation, backed by meta data, plus a collection of standardized (or at least in the process of being standardized) technologies.  The critical standards are overviewed in Table 1 and details are posted at www.w3.org.

 

Table 1.  XML standards.

Standard

Description

Extensible Stylesheet Language

XSL enables you to present data in a paginated format.  XSL supports the ability to apply formatting rules to elements (e.g. to display the text Model With Others in an emphasized typeface), to apply formatting rules to pages to add things like headers and footers, and to render XML documents on various display technologies.  XSL is typically used to publish documents, often for printing, whereas XSLT is used to generate markup-oriented presentations such as HTML or VoiceXML.

Extensible Stylesheet Language Transformations

XSLT enables you to transform data from one format to another.  XSLT is often used to rearrange the order of the content within an XML document so that it makes the most sense for display.  XSLT is effectively used to transform data documents into presentation documents, and then a user interface technology such as XSL or a Cascading Style Sheet (CSS) is used to publish or display the data.  It is important to recognize that XSLT suffers from performance issues when compared to traditional programming languages.

XML Linking Language

XLink enables you to link data between elements.  A link can be a simple link that references a single document (similar to a link in an HTML document) or a complex extended link that references multiple target documents.  In other words, simple links implement one-to-one associations between XML documents and extended links implement one-to-many associations.  Combined with XPointer, XLink lets you reference specific portions of other XML documents.

XML Namespaces

Namespaces enable you to use the same XML tag, such as name in the XML document example (Figure 1), in several places within the same or different XML documents.  This prevents name collisions just as packages within Java prevent name collisions between classes (e.g. you could have an Address class in the Customer package and an Address class in the Communication package).  Namespaces are indicated by the xmlns keyword associated with the XML element tag, as you see with the three namespaces assigned to locations in the XML document example.

XML Path Language

XPath enables you to refer to data elements within an XML document.  The XPath statement /locations/offc:office[offc:name="Ambysoft Canada"] refers to the second office listed in the XML document example (Figure 1).  XPath statements are typically passed to operations in order to reference a location within an XML document.

XML Pointer Language

XPointer enables you to specify locations within an XML document, extending XPath to include the notion of ranges and points.  This enables you either to specify elements within a specific node or to cross node boundaries.  XPointer is useful for hypertext applications.

XML Query Language

XQuery enables you to search for data within an XML document; it is the XML equivalent of an SQL SELECT statement.  XQuery uses XPath statements to build complex, multiple-criteria expressions.  XQuery is best used to find multiple XML documents in an XML database.

XML Schema

XML Schema enables you to define the structure and definition rules of an XML document.  DTDs can only be used to define the structure.  XML Schema provides the ability to specify data types to the level of precision that you see in programming languages and simple data modeling CASE tools.  You can specify simple types such as strings or create your own “complex types” (data structures).  You can also specify the cardinality and optionality, what the UML combines into the single concept of multiplicity, for an attribute.  Simple validation rules can be defined as well.  The greatest drawback of XML Schema is its complexity because it has a large feature set.

 

Figure 1. Sample XML Document.

<locations
     xmlns:offc="http://www.ambysoft.com/names/office"
     xmlns:st="http://www.ambysoft.com/names/state"
     xmlns:ctry="http://www.ambysoft.com/names/country">

     <offc:office>
          <offc:name>Ambysoft US</offc:name>
          <st:state>
               <st:name>Alaska</st:name>
               <st:area>Southern Alaska</st:area>
          </st:state>
          <ctry:country>
               <ctry:name>United States of America</ctry:name>
          </ctry:country>
     </offc:office>

     <offc:office>
          <offc:name>Ambysoft Canada</offc:name>
          <st:state>
               <st:name>Ontario</st:name>
               <st:area>Great White North</st:area>
          </st:state>
          <ctry:country>
               <ctry:name>Canada</ctry:name>
          </ctry:country>
     </offc:office>

</locations>
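
To make the namespace and XPath entries of Table 1 concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module.  It embeds a well-formed rendering of the Figure 1 data, looks up the second office with a namespace-qualified path, and shows that the three name elements stay distinct.  The variable names (DOC, NS, canada and so on) are illustrative choices, not part of the original article, and ElementTree implements only a subset of XPath, so the path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Well-formed rendering of the Figure 1 data: every closing tag uses the
# same prefix as its opening tag.
DOC = """\
<locations
    xmlns:offc="http://www.ambysoft.com/names/office"
    xmlns:st="http://www.ambysoft.com/names/state"
    xmlns:ctry="http://www.ambysoft.com/names/country">
  <offc:office>
    <offc:name>Ambysoft US</offc:name>
    <st:state>
      <st:name>Alaska</st:name>
      <st:area>Southern Alaska</st:area>
    </st:state>
    <ctry:country>
      <ctry:name>United States of America</ctry:name>
    </ctry:country>
  </offc:office>
  <offc:office>
    <offc:name>Ambysoft Canada</offc:name>
    <st:state>
      <st:name>Ontario</st:name>
      <st:area>Great White North</st:area>
    </st:state>
    <ctry:country>
      <ctry:name>Canada</ctry:name>
    </ctry:country>
  </offc:office>
</locations>
"""

# Prefix-to-URI map for the queries. The prefixes are local shorthand;
# only the URIs have to match the ones declared in the document.
NS = {
    "offc": "http://www.ambysoft.com/names/office",
    "st": "http://www.ambysoft.com/names/state",
    "ctry": "http://www.ambysoft.com/names/country",
}

root = ET.fromstring(DOC)

# Paths are evaluated relative to the root element, so the leading
# /locations step of the full XPath statement is implied here.
canada = root.find("offc:office[offc:name='Ambysoft Canada']", NS)
province = canada.findtext("st:state/st:name", namespaces=NS)
print(province)  # Ontario

# The three "name" elements carry different namespace URIs, so they are
# three different qualified names rather than one colliding tag.
name_tags = [el.tag for el in canada.iter() if el.tag.endswith("}name")]
print(name_tags)
```

ElementTree reports each tag as {namespace-URI}local-name, which is why the office name, state name, and country name never collide even though all three use the local name "name".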

 

2. XML in Practice

I’d like to start by cutting through the XML hype to describe what I believe are real-world, effective uses of XML.  In order of importance, these include:

The important thing to understand is that XML is being used for practical purposes.  However, the “world changing” uses – such as easy and full integration of legacy systems, domination of e-commerce with the retail market, and the emergence of widely available web services – have not come about.  Nor will they any time soon.  If ever.  My point is that if you remain realistic about XML you’ll discover some interesting uses for it because it is quite useful.

 

3. Vocabularies

A vocabulary goes beyond structure to address the semantics of the data captured within the structure including the pertinent taxonomical and ontological relationships of the data.  Whew, what a mouthful.  Let’s explore this definition a piece at a time.

When we say that we’re defining the semantics of data what we’re really doing is defining its meaning.  For our purposes, to define the semantics of data you need to identify the allowable values for data attributes and the relationships between those values.  Consider the inventory catalog for a grocery chain.  One of the items they carry is ice cream.  According to the industry standard ice cream DTD a type of ice cream is described by two tags: Volume and Flavor.  You look at several existing XML documents and see value pairs of {3, Chocolate}, {2.5, Rocky Road}, and {400, Vanilla}.  400 what?  Litres?  Ounces?  Isn’t Rocky Road a type of chocolate ice cream?  In other words, knowing the structure isn’t sufficient; you also need to know the semantics.

Now let’s assume that we each work for different grocery chains and we’re trying to share ice cream information with one another via XML.  My chain carries chocolate, strawberry, and vanilla ice cream.  Your chain carries Chocolate, Rocky Road, Mocha Fudge, Swiss Fudge, Strawberry Classic, Ultra Strawberry, Royal Vanilla, Exquisite Vanilla, and Tiger Tail.  Although we both sell ice cream, and you sell all the flavors that I do, it’s very difficult for me to process your data because I need to map your flavors to the ones that I understand.  I would need to know that the Rocky Road, Mocha Fudge, and Swiss Fudge are all types of chocolate, that Ultra Strawberry is a type of strawberry, and so on.  The end result would be a taxonomy, or classification, of flavors.
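
A minimal sketch of such a taxonomy, written in Python, might look like the following.  The groupings for Rocky Road, Mocha Fudge, Swiss Fudge, and Ultra Strawberry come from the paragraph above; the other entries are assumptions based on the flavor names, and the names FLAVOR_TAXONOMY and classify are hypothetical.

```python
from typing import Optional

# Hypothetical taxonomy: a partner's flavor name mapped to the base flavor
# that my chain understands.
FLAVOR_TAXONOMY = {
    "Chocolate": "chocolate",
    "Rocky Road": "chocolate",
    "Mocha Fudge": "chocolate",
    "Swiss Fudge": "chocolate",
    "Ultra Strawberry": "strawberry",
    "Strawberry Classic": "strawberry",  # assumed from the name
    "Royal Vanilla": "vanilla",          # assumed from the name
    "Exquisite Vanilla": "vanilla",      # assumed from the name
}


def classify(flavor: str) -> Optional[str]:
    """Return the base flavor, or None when the taxonomy has no entry."""
    return FLAVOR_TAXONOMY.get(flavor)


print(classify("Rocky Road"))  # chocolate
print(classify("Tiger Tail"))  # None
```

Tiger Tail is deliberately left unmapped: the text gives no classification for it, which is exactly the kind of gap that the two chains would have to resolve by agreeing on semantics rather than on structure.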

Then we decide to start selling groceries online.  We quickly realize that our users search on a wide variety of terms.  For example, if someone searches for desserts then ice creams, candies, and fresh fruit should appear in the list.  If someone else searches on frozen goods then ice creams, frozen dinners, and frozen vegetables should appear in the list.  We need to relate the fact that ice cream is both a dessert and a frozen good, among other things.  In other words we need to define an ontology for our product line that relates these concepts together.

Ontology goes beyond taxonomy.  Where a taxonomy addresses classification hierarchies, an ontology represents and communicates knowledge about a topic as well as a set of relationships and properties that hold for the entities included within that topic.
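
The difference can be sketched in a few lines of Python.  Unlike a strict classification tree, the hypothetical IS_A mapping below lets one product relate to several concepts at once, which is what the online search scenario needs.  The product and concept names are taken from the example above; the structure itself is only an illustrative fragment, since a real ontology would also capture properties and other kinds of relationships.

```python
from typing import List

# Hypothetical ontology fragment: each product is related to every concept
# it belongs to, so one product can appear under several concepts.
IS_A = {
    "ice cream": {"dessert", "frozen good"},
    "candy": {"dessert"},
    "fresh fruit": {"dessert"},
    "frozen dinner": {"frozen good"},
    "frozen vegetables": {"frozen good"},
}


def search(concept: str) -> List[str]:
    """Return, in alphabetical order, every product related to the concept."""
    return sorted(
        product for product, concepts in IS_A.items() if concept in concepts
    )


print(search("dessert"))      # ['candy', 'fresh fruit', 'ice cream']
print(search("frozen good"))  # ['frozen dinner', 'frozen vegetables', 'ice cream']
```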

Why is this important?  First, I hope that it’s clear that you need to be worried about more than just the structure of XML documents in order to succeed.  Second, if you can’t agree on the semantics of the data that you’re sharing then integration is little more than a fantasy.  This is one of the reasons why I hold out little hope for XML Metadata Interchange (XMI), the standard approach via which development tools are supposed to share models.  It’s arguable whether XMI defines the proper data structure, it certainly doesn’t define the rich semantics of the data that vendors are supposedly sharing, and even if it did it is very unlikely that the vendors would ever agree on the semantics.  To prove my point, although many tools currently claim to support XMI, to my knowledge there isn’t a single combination where you can model in one tool, export that model to another, update the model, then export it back to the original tool without any loss of information.

 

4. XML Data Modeling

To tell you the truth, I don’t invest a lot of time modeling XML documents.  I prefer to keep my documents small and simple, and as a result I can typically code the DTDs or Schema definitions by hand.  Of course, if someone were to build a really slick XML modeling tool I’d be tempted to change my approach.  Having said this, it is valuable to understand the fundamentals of XML modeling because it’s going to help you even if you’re coding everything by hand.

Figure 2 and Figure 3 depict example XML models, using UML notation. 

 

Figure 2. Modeling a Customer XML document.

 

Figure 3. Two ways to model an Order XML document.

Whenever I’m designing an XML document I like to keep several issues in mind.  First, a good industry standard may already exist that I can reuse.  Second, although size is not an issue with XML elements it is with relational database (RDB) columns.  Therefore, if I intend to shred the XML document into an RDB then I may need to make its attributes more finely grained than XML technology would normally motivate me to.  Third, I find that things usually work out when I follow common data modeling practices.  The rules of data normalization can and should be applied.  Fourth, existing object and RDB schemas are a constraint that I need to consider.

 

5. XML Mapping and Data Binding

When you use objects and XML documents together you need to map your object schema to your XML schema, something often referred to as data binding, just as you need to map your object schema to your relational data schema.  As with relational databases there is an impedance mismatch between objects and XML documents.  As you saw in Figure 2 and Figure 3 XML documents have a single root, Customer and Order respectively, but class models do not.  This is because XML documents represent a hierarchical structure whereas object schemas are usually a network structure.  
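
As a rough illustration of data binding, the Python sketch below maps a small object graph to an XML document with a single Customer root and back again.  The Customer and Address classes, their fields, and the element names are assumptions made for the example; they are not taken from Figure 2.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    name: str
    addresses: List[Address] = field(default_factory=list)


def customer_to_xml(customer: Customer) -> ET.Element:
    """Bind a Customer object to a tree with a single Customer root."""
    root = ET.Element("Customer")
    ET.SubElement(root, "Name").text = customer.name
    for address in customer.addresses:
        node = ET.SubElement(root, "Address")
        ET.SubElement(node, "Street").text = address.street
        ET.SubElement(node, "City").text = address.city
    return root


def customer_from_xml(root: ET.Element) -> Customer:
    """Rebuild the Customer object from its XML representation."""
    return Customer(
        name=root.findtext("Name", default=""),
        addresses=[
            Address(
                street=node.findtext("Street", default=""),
                city=node.findtext("City", default=""),
            )
            for node in root.findall("Address")
        ],
    )


original = Customer("Sally Jones", [Address("1 Main St", "Toronto")])
xml_text = ET.tostring(customer_to_xml(original), encoding="unicode")
restored = customer_from_xml(ET.fromstring(xml_text))
print(xml_text)
print(restored == original)  # True
```

The mismatch shows up as soon as two customers share one Address object: the network of objects has to be flattened into a tree, so the shared address is either duplicated under each Customer root or replaced by some form of reference.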

These tips and techniques work well for me when mapping objects to XML documents:

 

6. Persisting XML in Relational Databases

There are two fundamental strategies for persisting an XML document in a relational database (RDB):

My advice is that if you’re going to use an RDB to store XML documents then you should take the first approach and shred the document.  If you don’t want to incur this overhead then I would advise you not to use an RDB, and instead either store the XML documents as individual files or use an XML database.  In short, if you’re going to use an RDB then use the RDB.
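
Shredding is commonly understood as breaking a document apart so that its data lands in ordinary rows and columns.  The following Python sketch shows the idea with the standard sqlite3 module and an in-memory database.  The Order document, the table names, and the column names are all hypothetical, chosen only to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document used only for this illustration.
ORDER_XML = """\
<Order id="1001">
  <Customer>Sally Jones</Customer>
  <Item sku="A-1" quantity="2"/>
  <Item sku="B-7" quantity="1"/>
</Order>
"""


def shred_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Split one Order document into a parent row and its item rows."""
    order = ET.fromstring(xml_text)
    order_id = int(order.get("id"))
    conn.execute(
        "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
        (order_id, order.findtext("Customer")),
    )
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, quantity) VALUES (?, ?, ?)",
        [
            (order_id, item.get("sku"), int(item.get("quantity")))
            for item in order.findall("Item")
        ],
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE order_items ("
    "order_id INTEGER NOT NULL REFERENCES orders(order_id), "
    "sku TEXT NOT NULL, "
    "quantity INTEGER NOT NULL)"
)
shred_order(conn, ORDER_XML)

# Once shredded, the data can be queried with plain SQL.
total = conn.execute(
    "SELECT SUM(quantity) FROM order_items WHERE order_id = ?", (1001,)
).fetchone()[0]
print(total)  # 3
```

The overhead mentioned above is visible even at this scale: every document has to be parsed and mapped on the way in, and reassembled if the original XML is ever needed again.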

So how do you make shredding work?  The secret is in how you map your XML documents to your relational data schema.  The following heuristics should help guide you:

 

7. Persisting XML in XML Databases

Although I prefer to work with RDBs on the back end, and realistically most organizations use relational databases as the primary means of storage, I do recognize that sometimes an XML database may be a valid option for you.  Important issues you should consider include:

 

8. XML Development Strategies

When working with XML, the following strategies have worked very well for me:

 

9. Advantages of XML

XML has several advantages over previous data sharing and integration technologies such as comma-separated value (CSV) files or Common Object Request Broker Architecture (CORBA) objects:

 

10. Challenges with XML

XML isn’t perfect (nothing is), and as a result it suffers from several challenges.  These challenges are:

 



 


Copyright © 2002-2010 Scott W. Ambler

This site owned by Ambysoft Inc.