Advanced XML? No, Just Realistic XML
AgileData.org: Techniques for Disciplined Agile Database Development
What is XML? From the data point of view XML is simply a standardized approach to storing text-based data in a hierarchical manner and to defining meta data about said data. The data is stored in structures called XML documents and the meta data is contained in document type definitions (DTDs) or the newer XML Schema definitions (XML Schema will likely replace DTDs within the next few years). From an object-oriented point of view XML is a data representation, backed by meta data, plus a collection of standardized (or at least in the process of being standardized) technologies. The critical standards are summarized in Table 1 and details are posted at www.w3c.org.

| Standard | Description |
|---|---|
| Extensible Stylesheet Language | XSL enables you to present data in a paginated format. XSL supports the ability to apply formatting rules to elements (e.g. to display Model With Others as *Model With Others*), to apply formatting rules to pages to add things like headers and footers, and to render XML documents on various display technologies. XSL is typically used to publish documents, often for printing, whereas XSLT is used to generate markup-oriented presentations such as HTML or VoiceXML. |
| Extensible Stylesheet Language Transformations | XSLT enables you to transform data from one format to another. XSLT is often used to rearrange the order of the content within an XML document so that it makes the most sense for display. XSLT is effectively used to transform data documents into presentation documents, and then a user interface technology such as XSL or a Cascading Style Sheet (CSS) is used to publish or display the data. It is important to recognize that XSLT suffers from performance issues when compared to traditional programming languages. |
| XML Linking Language | XLink enables you to link data between elements. A link can be a simple link that references a single document (similar to a link in an HTML document) or a complex extended link that references multiple target documents. In other words, simple links implement one-to-one associations between XML documents and extended links implement one-to-many associations. Combined with XPointer, XLink lets you reference specific portions of other XML documents. |
| XML Namespaces | Namespaces enable you to use the same XML tag, such as name in the XML document example, in several places within the same or different XML documents. This prevents name collisions just as packages within Java prevent name collisions between classes (e.g. you could have an Address class in the Customer package and an Address class in the Communication package). Namespaces are indicated by the xmlns keyword associated with the XML element tag, as you see with the three namespaces assigned to locations in the XML document example. |
| XML Path Language | XPath enables you to refer to data elements within an XML document. The XPath statement /locations/offc:office[offc:name="Ambysoft Canada"] refers to the second office listed in the XML document example. XPath statements are typically passed to operations in order to reference a location within an XML document. |
| XML Pointer Language | XPointer enables you to specify locations within an XML document, extending XPath to include the notion of ranges and points. This enables you either to specify elements within a specific node or to cross node boundaries. XPointer is useful for hypertext applications. |
| XML Query Language | XQuery enables you to search for data within an XML document, the XML equivalent of an SQL SELECT statement. XQuery uses XPath statements to build complex, multiple-criteria expressions. XQuery is best used to find multiple XML documents in an XML database. |
| XML Schema | XML Schema enables you to define the structure and definition rules of an XML document. DTDs can only be used to define the structure. XML Schema provides the ability to specify data types to the level of precision that you see in programming languages and simple data modeling CASE tools. You can specify simple types such as strings or create your own “complex types” (data structures). You can also specify the cardinality and optionality, what the UML combines into the single concept of multiplicity, for an attribute. Simple validation rules can be defined as well. The greatest drawback of XML Schema is its complexity because it has a large feature set. |

Figure 1. Sample XML Document.

    <locations xmlns:offc="http://www.ambysoft.com/names/office"
               xmlns:st="http://www.ambysoft.com/names/state"
               xmlns:ctry="http://www.ambysoft.com/names/country">
      <offc:office>
        <offc:name>Ambysoft US</offc:name>
        <st:state>
          <st:name>Alaska</st:name>
          <st:area>Southern Alaska</st:area>
        </st:state>
        <ctry:country>
          <ctry:name>United States of America</ctry:name>
        </ctry:country>
      </offc:office>
      <offc:office>
        <offc:name>Ambysoft Canada</offc:name>
        <st:state>
          <st:name>Ontario</st:name>
          <st:area>Great White North</st:area>
        </st:state>
        <ctry:country>
          <ctry:name>Canada</ctry:name>
        </ctry:country>
      </offc:office>
    </locations>

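To make the namespace and XPath rows of Table 1 a little more concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It parses a well-formed rendering of the Figure 1 document and looks up the Ambysoft Canada office. The helper name find_office and the NS prefix map are my own illustrative choices, and ElementTree supports only a subset of XPath, so treat this as one possible approach rather than the definitive one.

```python
import xml.etree.ElementTree as ET

# Well-formed rendering of the Figure 1 document.
DOC = """\
<locations xmlns:offc="http://www.ambysoft.com/names/office"
           xmlns:st="http://www.ambysoft.com/names/state"
           xmlns:ctry="http://www.ambysoft.com/names/country">
  <offc:office>
    <offc:name>Ambysoft US</offc:name>
    <st:state><st:name>Alaska</st:name><st:area>Southern Alaska</st:area></st:state>
    <ctry:country><ctry:name>United States of America</ctry:name></ctry:country>
  </offc:office>
  <offc:office>
    <offc:name>Ambysoft Canada</offc:name>
    <st:state><st:name>Ontario</st:name><st:area>Great White North</st:area></st:state>
    <ctry:country><ctry:name>Canada</ctry:name></ctry:country>
  </offc:office>
</locations>
"""

# Prefix-to-URI map used when querying. What matters is the URI; the
# prefixes here only need to be consistent within the query strings.
NS = {
    "offc": "http://www.ambysoft.com/names/office",
    "st": "http://www.ambysoft.com/names/state",
    "ctry": "http://www.ambysoft.com/names/country",
}


def find_office(root, office_name):
    """Return the office element whose offc:name equals office_name, or None.

    Hypothetical helper: it relies on the [child='text'] predicate from
    ElementTree's XPath subset, and does not handle names containing a
    single quote.
    """
    return root.find(f"offc:office[offc:name='{office_name}']", NS)


root = ET.fromstring(DOC)
canada = find_office(root, "Ambysoft Canada")
state = canada.findtext("st:state/st:name", namespaces=NS)
country = canada.findtext("ctry:country/ctry:name", namespaces=NS)
print(state, country)
```

Notice that the three name elements never collide, even though they share a local name, because each lives in a different namespace.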
I’d like to start by cutting through the XML hype to describe what I believe are real-world, effective uses of XML. In order of importance, these include:
1. Data transfer within an “application”.
2. Application integration.
3. Data storage (files).
4. Data storage (databases).
The important thing to understand is that XML is being used for practical purposes. However, the “world changing” uses (such as easy and full integration of legacy systems, domination of e-commerce within the retail market, and the emergence of widely available web services) have not come about. Nor will they any time soon, if ever. My point is that if you remain realistic about XML you’ll discover some interesting uses for it, because it is quite useful.
A vocabulary goes beyond structure to address the semantics of the
data captured within the structure including the pertinent taxonomical and
ontological relationships of the data. Whew, what a mouthful.
Let’s explore this definition a piece at a time.
When we say that we’re defining the semantics of data what
we’re really doing is defining its meaning. For our purposes to define
the semantics of data you need to identify the allowable values for data
attributes and the relationships between those values. Consider the
inventory catalog for a grocery chain. One of the items they carry is ice
cream. According to the industry standard ice cream DTD, a type of ice
cream is described by two tags – Volume and Flavor. You look at several
existing XML documents and see value pairs of {3, Chocolate}, {2.5, Rocky Road},
and {400, Vanilla}. 400 what? Litres? Ounces? Isn’t
Rocky Road a type of chocolate ice cream? In other words, knowing the
structure isn’t sufficient; you also need to know the semantics.
Now let’s assume that we each work for different grocery chains
and we’re trying to share ice cream information with one another via XML.
My chain carries chocolate, strawberry, and vanilla ice cream. Your chain
carries Chocolate, Rocky Road, Mocha Fudge, Swiss Fudge, Strawberry Classic,
Ultra Strawberry, Royal Vanilla, Exquisite Vanilla, and Tiger Tail.
Although we both sell ice cream, and you sell all the flavors that I do, it’s
very difficult for me to process your data because I need to map your flavors to
the ones that I understand. I would need to know that the Rocky Road,
Mocha Fudge, and Swiss Fudge are all types of chocolate, that Ultra Strawberry
is a type of strawberry, and so on. The end result would be a taxonomy, or
classification, of flavors.
Then we decide to start selling groceries online. We quickly
realize that our users search on a wide variety of terms. For example,
if someone searches for desserts then ice creams, candies, and fresh fruit
should appear in the list. If someone else searches on frozen goods then
ice creams, frozen dinners, and frozen vegetables should appear in the list.
We need to relate the fact that ice cream is both a dessert and a frozen good,
among other things. In other words we need to define an ontology for our
product line that relates these concepts together.
Ontology goes beyond taxonomy. Where a taxonomy addresses
classification hierarchies, an ontology represents and communicates knowledge
about a topic as well as a set of relationships and properties that hold for the
entities included within that topic.
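The difference shows up as soon as one product belongs to more than one concept. The sketch below is a deliberately tiny stand-in for an ontology, using only the products and search terms from the example above; the IS_A table and the search function are hypothetical names, and a real ontology would capture far richer relationships and properties than simple membership.

```python
# Hypothetical "is a" relationships: product -> concepts it belongs to.
# Unlike a strict hierarchy, a product may sit under several concepts.
IS_A = {
    "ice cream": {"dessert", "frozen good"},
    "candy": {"dessert"},
    "fresh fruit": {"dessert"},
    "frozen dinner": {"frozen good"},
    "frozen vegetables": {"frozen good"},
}


def search(concept):
    """Return, in alphabetical order, every product related to the concept."""
    return sorted(
        product for product, concepts in IS_A.items() if concept in concepts
    )


print(search("dessert"))
print(search("frozen good"))
```

Ice cream appears in both result lists, which a single-parent classification hierarchy could not express.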
Why is this important? First, I hope that it’s clear that
you need to be worried about more than just the structure of XML documents in
order to succeed. Second, if you can’t agree on the semantics of the
data that you’re sharing then integration is little more than a fantasy.
This is one of the reasons why I hold out little hope for XML Metadata
Interchange (XMI), the standard approach via which development tools are
supposed to share models. It’s arguable whether XMI defines the proper
data structure, it certainly doesn’t define the rich semantics of the data that
vendors are supposedly sharing, and even if it did it is very unlikely the
vendors will ever agree to the semantics. To illustrate my point, although many
tools currently claim to support XMI, to my knowledge there isn’t a single
combination where you can model in one tool, export that model to another,
update the model, then export it back to the original tool without any loss of
information.
To tell you the truth, I don’t invest a lot of time modeling XML documents. I prefer to keep my documents small and simple, and as a result I can typically code the DTDs or Schema definitions by hand. Of course, if someone were to build a really slick XML modeling tool I’d be tempted to change my approach. Having said this, it is valuable to understand the fundamentals of XML modeling because it’s going to help you even if you’re coding everything by hand.
Figure 2 and Figure 3 depict example XML models, using UML notation.
Figure 2. Modeling a Customer XML document.

Figure 3. Two ways to model an Order XML document.

Whenever I’m designing an XML document I like to keep several issues in mind. First, a good industry standard may already exist that I can reuse. Second, although size is not an issue with XML elements, it is with relational database (RDB) columns. Therefore, if I intend to shred the XML document into an RDB then I may need to make its attributes more finely grained than XML technology would normally motivate me to. Third, I find that things usually work out when I follow common data modeling practices; the rules of data normalization can and should be applied. Fourth, existing object and RDB schemas are a constraint that I need to consider.
When you use objects and XML documents together you need to map your object schema to your XML schema, something often referred to as data binding, just as you need to map your object schema to your relational data schema. As with relational databases, there is an impedance mismatch between objects and XML documents. As you saw in Figure 2 and Figure 3, XML documents have a single root, Customer and Order respectively, but class models do not. This is because XML documents represent a hierarchical structure whereas object schemas are usually a network structure.
These tips and techniques work well for me when mapping objects to XML documents:
Let usage drive the design.
Major business concepts usually imply the need for corresponding XML documents.
Keep it simple.
Realize it isn’t always simple.
Realize that XML documents need to be flexible.
Modify object to relational mapping techniques.
Do some reading. Ron Bourret’s site is a great starting point.
Use consistent names.
The role names on an association are good tag names.
The type of association can indicate potential document boundaries.
Model a namespace using packages.
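A hand-rolled data binding for a small document might look something like the following Python sketch. The Customer and Address classes, their fields, and the tag names are hypothetical, since the details of Figure 2 are not reproduced here; the point is only that the object graph has to be flattened into a single-rooted hierarchy on the way out and rebuilt on the way in.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Address:  # hypothetical class, for illustration only
    street: str
    city: str


@dataclass
class Customer:  # hypothetical class, for illustration only
    name: str
    addresses: list = field(default_factory=list)


def customer_to_xml(customer):
    """Marshal a Customer into an element tree rooted at <Customer>."""
    root = ET.Element("Customer")
    ET.SubElement(root, "Name").text = customer.name
    for address in customer.addresses:
        # The association's role name doubles as the tag name.
        node = ET.SubElement(root, "Address")
        ET.SubElement(node, "Street").text = address.street
        ET.SubElement(node, "City").text = address.city
    return root


def customer_from_xml(root):
    """Unmarshal a <Customer> element back into a Customer object."""
    return Customer(
        name=root.findtext("Name"),
        addresses=[
            Address(a.findtext("Street"), a.findtext("City"))
            for a in root.findall("Address")
        ],
    )


original = Customer("Sally Jones", [Address("1 Main St", "Toronto")])
xml_text = ET.tostring(customer_to_xml(original), encoding="unicode")
restored = customer_from_xml(ET.fromstring(xml_text))
print(xml_text)
```

For anything beyond a handful of classes you would likely reach for a binding library rather than writing this by hand, but the mapping decisions stay the same.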
There are two fundamental strategies for persisting an XML document in a relational database (RDB):
1. “Shred” the document and store each element in a separate column.
2. Store the entire document in a single column.
My advice is that if you’re going to use an RDB to store XML documents then you should take the first approach and shred the document. If you don’t want to incur this overhead then I would advise you to not use an RDB and either store the XML documents as individual files or use an XML database. In short, if you’re going to use an RDB then use the RDB.
So how do you make shredding work? The secret is in how you map your XML documents to your relational data schema. The following heuristics should help guide you:
Map a single XML element to a single database column.
Keep the types the same – character data in the XML document is character data in the database and so on.
Base both schemas on the same conceptual model and therefore use the same data element definitions for both.
Be flexible when you map (you can likely use XSLT to overcome any mismatches between the schemas).
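The heuristics above can be sketched in a few lines of Python with the standard sqlite3 and xml.etree.ElementTree modules. The office table, its column names, and the shred function are assumptions made for illustration; the document is a well-formed rendering of the one in Figure 1. Each XML element maps to one column, and character data stays character data.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """\
<locations xmlns:offc="http://www.ambysoft.com/names/office"
           xmlns:st="http://www.ambysoft.com/names/state"
           xmlns:ctry="http://www.ambysoft.com/names/country">
  <offc:office>
    <offc:name>Ambysoft US</offc:name>
    <st:state><st:name>Alaska</st:name><st:area>Southern Alaska</st:area></st:state>
    <ctry:country><ctry:name>United States of America</ctry:name></ctry:country>
  </offc:office>
  <offc:office>
    <offc:name>Ambysoft Canada</offc:name>
    <st:state><st:name>Ontario</st:name><st:area>Great White North</st:area></st:state>
    <ctry:country><ctry:name>Canada</ctry:name></ctry:country>
  </offc:office>
</locations>
"""

NS = {
    "offc": "http://www.ambysoft.com/names/office",
    "st": "http://www.ambysoft.com/names/state",
    "ctry": "http://www.ambysoft.com/names/country",
}

# Hypothetical mapping: one column per leaf element of an office.
COLUMN_PATHS = {
    "name": "offc:name",
    "state_name": "st:state/st:name",
    "state_area": "st:state/st:area",
    "country_name": "ctry:country/ctry:name",
}


def shred(xml_text, conn):
    """Shred each office element into one row; return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS office "
        "(name TEXT, state_name TEXT, state_area TEXT, country_name TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        tuple(office.findtext(path, namespaces=NS) for path in COLUMN_PATHS.values())
        for office in root.findall("offc:office", NS)
    ]
    conn.executemany("INSERT INTO office VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
row_count = shred(DOC, conn)
ontario = conn.execute(
    "SELECT state_name FROM office WHERE name = ?", ("Ambysoft Canada",)
).fetchone()
print(row_count, ontario)
```

Once the data is shredded, ordinary SQL works against it, which is the whole point of using the RDB as an RDB. A production schema would likely normalize state and country into their own tables.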
It is critical to note that many of the technical issues of storing objects in RDBs are applicable to storing XML documents in RDBs. There is an impedance mismatch between the two technologies so you need to map between the two. You need to worry about concurrency control, transaction control, retrieving XML documents, referential integrity, and security access control. There is no free ride.
Although I prefer to work with RDBs on the back end, and realistically most organizations use relational databases as the primary means of storage, I do recognize that sometimes an XML database may be a valid option for you. Important issues you should consider include:
Concurrency.
Consistency.
Manipulation.
You don’t need to be pure.
Administrative functions.
When working with XML, the following strategies have worked very well for me:
Remember that XML isn’t your only option.
Adopt XML Schema.
Design your XML documents.
Use namespaces.
Use a real XML editor.
Your deployment environment determines your validation strategy.
XML has several advantages over previous data sharing and integration technologies such as comma-separated value (CSV) files or Common Object Request Broker Architecture (CORBA) objects:
XML is cross platform.
XML is standards based.
XML enjoys wide industry acceptance.
XML documents are human readable.
XML separates content from presentation.
XML is a middle-of-the-road approach.
XML isn’t perfect (nothing is), and as a result it suffers from several challenges:
XML documents are bulky.
XML requires marshalling.
XML standards are still evolving.
XML business standards will prove elusive.
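The first two challenges are easy to see for yourself. The rough Python sketch below marshals the same two office records as XML and as CSV and compares their sizes; the OFFICES data echoes Figure 1, while the as_xml and as_csv helpers and the un-namespaced tag names are assumptions for illustration. The exact ratio depends entirely on your tag names and data, so treat the comparison as indicative only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same records, taken from the Figure 1 example: (name, state, country).
OFFICES = [
    ("Ambysoft US", "Alaska", "United States of America"),
    ("Ambysoft Canada", "Ontario", "Canada"),
]


def as_xml(rows):
    """Marshal the records into an XML string (tags repeat for every record)."""
    root = ET.Element("locations")
    for name, state, country in rows:
        office = ET.SubElement(root, "office")
        ET.SubElement(office, "name").text = name
        ET.SubElement(office, "state").text = state
        ET.SubElement(office, "country").text = country
    return ET.tostring(root, encoding="unicode")


def as_csv(rows):
    """Marshal the same records into CSV text (no per-field markup)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


xml_size = len(as_xml(OFFICES))
csv_size = len(as_csv(OFFICES))
print(xml_size, csv_size)
```

The XML version is larger because every value carries its opening and closing tags, but those tags are also what make the document self-describing.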
Copyright © 2002-2012 Scott W. Ambler. This site owned by Ambysoft Inc.