Modular Namespaces (MNS)

Author:
    James Clark (Thai Open Source Software Center) <jjc@thaiopensource.com>
Date:
    2003-01-31

Copyright © 2003 Thai Open Source Software Center Ltd

Introduction

The XML Namespaces Recommendation allows an XML document to be composed of elements and attributes from multiple independent namespaces. Each of these namespaces may have its own schema. The problem then arises of how the schemas can be composed in order to allow validation of the complete document.

In RELAX Namespace, Murata Makoto pioneered the idea of dividing the document into islands, with each island containing a single namespace, and validating each island separately against the schema for its namespace. RELAX Namespace formed the basis for the recently published Committee Draft of Document Schema Definition Languages (DSDL) -- Part 4: Selection of Validation Candidates.

This document presents a language named Modular Namespaces (MNS), which is an evolution of the ideas in RELAX Namespace and DSDL Part 4. RELAX Namespace was designed to work well with RELAX Core. RELAX Core cannot deal with documents that use multiple namespaces, nor does it provide any namespace-based wildcards. These limitations of RELAX Core are reflected in the design of RELAX Namespace. MNS is designed to be able to take advantage of more recent schema languages, such as RELAX NG, that are not limited in this way.

A sample implementation of MNS is included in Jing.

It is hoped that this will be a useful contribution to the future development of DSDL Part 4.

Basics

In its simplest form, an MNS schema consists of a mapping from namespace URIs to schema URIs. An MNS schema is written in XML. Here is a RELAX NG compact syntax schema for this simplest form of MNS schema:

default namespace = "http://www.thaiopensource.com/ns/mns"

start =
  element rules {
    schemaType?,
    element validate {
      schemaType?,
      attribute ns { xsd:anyURI },
      attribute schema { xsd:anyURI }
    }*
  }

schemaType = attribute schemaType { mediaType }
mediaType = xsd:string

Validity of an instance with respect to an MNS schema is determined as follows. First, a set of validation subjects is identified. Each validation subject is an element in the instance. Associated with each validation subject is a schema. If all the validation subjects are valid with respect to their associated schemas, then the instance is considered valid with respect to the MNS schema.

It is important to understand that when a validation subject is validated with respect to its schema, then it is validated along with all its descendants and attributes. One validation subject may have other validation subjects as ancestors. In this case, a validation subject will be validated with respect to more than one schema. Not only will it be validated with respect to its schema, but it will also be validated as part of validation of ancestor validation subjects with respect to their schemas.

An element is a validation subject if it has no parent or if its namespace URI is different from that of its parent. A validation subject must have a validate rule for its namespace URI: there must be a validate element whose ns attribute is the same as the validation subject's namespace URI. The value of the ns attribute of the validate element can be the empty string to specify the absent namespace URI.
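The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the Jing implementation: it uses the standard xml.etree.ElementTree module, and the urn:x-example namespace URIs and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of an ElementTree name, or "" if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def local_name(name):
    """Return the local part of an ElementTree name."""
    return name.split("}", 1)[-1]


def validation_subjects(root):
    """Return the elements that are validation subjects, in document order."""
    found = []

    def walk(elem, parent_ns):
        ns = namespace_of(elem.tag)
        # The document element has no parent; any other element is a
        # subject only when its namespace URI differs from its parent's.
        if parent_ns is None or ns != parent_ns:
            found.append(elem)
        for child in elem:
            walk(child, ns)

    walk(root, None)
    return found


doc = ET.fromstring(
    '<a xmlns="urn:x-example:one">'
    '<b/>'
    '<c xmlns="urn:x-example:two"><d/><e xmlns="urn:x-example:one"/></c>'
    '</a>'
)
subjects = [(local_name(e.tag), namespace_of(e.tag))
            for e in validation_subjects(doc)]
print(subjects)
```

Here a, c and e are validation subjects; b and d are not, because each shares its parent's namespace URI. They are still validated, but only as part of their ancestor subjects.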

The associated schema is specified by the schema attribute of the validate element. The schema can be in any language supported by the particular implementation. When the schema is XML, the language of the schema is detected from the namespace URI of the document element. When the schema is not XML, then MNS relies on the MIME type of the result of fetching the URI; the schemaType attribute can be used to specify the MIME type explicitly. A MIME type of application/x-rnc can be used for RELAX NG compact syntax. The schemaType attribute on the rules element specifies the default value of the schemaType attribute on validate elements. Note that the schema attribute may refer to another MNS schema.

MNS has additional features that provide further control over the selection of validation subjects and their associated schemas.

Should it be possible to put the schema inline in the MNS wrapped in, say, a schema element? How does this impact extensibility?

Should MNS have an include element? The ability to reference MNS schemas recursively may make this unnecessary.

Lax processing

Sometimes it may be desirable to allow elements from namespaces for which there are no validate rules. This can be done by adding an empty lax element to the rules element:

default namespace = "http://www.thaiopensource.com/ns/mns"

start = element rules { schemaType?, (validate* & lax?) }

validate =
  element validate {
    schemaType?,
    attribute ns { xsd:anyURI },
    attribute schema { xsd:anyURI }
  }

lax = element lax { empty }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string

We will refer to an element that could be a validation subject if there is an applicable validate element as a potential validation subject. In the absence of a lax element, there must be an applicable validate element for every potential validation subject; with a lax element, there need not be.
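The difference between strict and lax processing can be sketched as follows. This is a simplified illustration under stated assumptions: the rules are passed in as a plain dictionary from namespace URI to schema URI rather than parsed from an MNS document, and the names select_subjects, rules and lax are invented for the example.

```python
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of an ElementTree name, or "" if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def select_subjects(root, rules, lax=False):
    """Pair each element validation subject with its schema URI.

    rules maps a namespace URI to a schema URI, one entry per validate
    element; lax says whether the rules element has a lax child.
    """
    selected = []

    def walk(elem, parent_ns):
        ns = namespace_of(elem.tag)
        if parent_ns is None or ns != parent_ns:  # potential validation subject
            if ns in rules:
                selected.append((elem.tag, rules[ns]))
            elif not lax:
                raise ValueError(f"no validate rule for namespace {ns!r}")
        for child in elem:
            walk(child, ns)

    walk(root, None)
    return selected


doc = ET.fromstring(
    '<a xmlns="urn:x-example:one"><x xmlns="urn:x-example:unknown"/></a>'
)
rules = {"urn:x-example:one": "one.rng"}
print(select_subjects(doc, rules, lax=True))
```

With lax=True the x element is simply not selected; with the default lax=False the same call raises an error because x is a potential validation subject with no applicable rule.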

Attributes

Attributes can also be validation subjects:

default namespace = "http://www.thaiopensource.com/ns/mns"

start = element rules { schemaType?, (validate* & lax?) }

validate =
  element validate|validateAttributes {
    schemaType?,
    attribute ns { xsd:anyURI },
    attribute schema { xsd:anyURI }
  }

lax = element lax { attribute allow { "attributes" | "elements" }? }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string

If an element has attributes that are namespace qualified with a namespace URI other than the namespace URI of the element itself, then the set of all attributes on that element with that namespace URI is a potential validation subject. If there is a validateAttributes element for that namespace URI, then it becomes a validation subject. The associated schema is specified by the schema attribute of the validateAttributes element. Unqualified attributes are never validation subjects.
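A minimal sketch of how the attributes of one element might be grouped into potential validation subjects, again using xml.etree.ElementTree. The function name attribute_groups and the urn:x-example URIs are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_name(name):
    """Split an ElementTree name into (namespace URI, local name)."""
    if name.startswith("{"):
        ns, local = name[1:].split("}", 1)
        return ns, local
    return "", name


def attribute_groups(elem):
    """Group the attributes of elem that are potential validation subjects.

    Returns {namespace URI: {local name: value}}. Unqualified attributes
    and attributes in the element's own namespace are left out.
    """
    elem_ns, _ = split_name(elem.tag)
    groups = defaultdict(dict)
    for name, value in elem.attrib.items():
        ns, local = split_name(name)
        if ns and ns != elem_ns:
            groups[ns][local] = value
    return dict(groups)


elem = ET.fromstring(
    '<p xmlns="urn:x-example:doc" xmlns:d="urn:x-example:doc"'
    ' xmlns:x="urn:x-example:link"'
    ' id="p1" d:role="r" x:href="a.html" x:type="simple"/>'
)
groups = attribute_groups(elem)
print(groups)
```

The two x: attributes form a single potential validation subject; id is unqualified and d:role shares the element's namespace URI, so neither takes part.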

Lax processing for elements means that a potential validation subject that is an element need not have an applicable validate element. Lax processing for attributes means that a potential validation subject that is a set of attributes need not have an applicable validateAttributes element. In the absence of a lax element, neither attributes nor elements are processed laxly. By default, the lax element enables lax processing for both elements and attributes. If allow="attributes" is specified, then lax processing is enabled for attributes only; if allow="elements" is specified, then lax processing is enabled for elements only.

Normally, schema languages (including RELAX NG) validate an element rather than a set of attributes. To work around this, MNS performs parallel transformations on the set of attributes and on the schema. The set of attributes is transformed by attaching the attributes to an element with a particular namespace name and namespace URI. The schema identified by a validateAttributes element is transformed to match. In the case of RELAX NG, when a validateAttributes element specifies a schema of s, MNS actually uses a schema of:

element * { external "s" }
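The two parallel transformations could look roughly like this. The carrier element name is hypothetical: the text above does not fix it, and the wrapper schema accepts any element name. The functions wrap_attributes and wrap_schema are likewise invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical carrier name; "element * { ... }" matches any element name.
CARRIER_TAG = "{urn:x-example:carrier}attributes"


def wrap_attributes(ns, attrs):
    """Attach a set of attributes from namespace ns to a carrier element."""
    carrier = ET.Element(CARRIER_TAG)
    for local, value in attrs.items():
        carrier.set(f"{{{ns}}}{local}", value)
    return carrier


def wrap_schema(schema_uri):
    """Return the RELAX NG compact syntax wrapper for an attribute schema."""
    return f'element * {{ external "{schema_uri}" }}'


carrier = wrap_attributes("urn:x-example:link", {"href": "a.html"})
print(ET.tostring(carrier, encoding="unicode"))
print(wrap_schema("s"))
```

The carrier element can then be handed to an ordinary element validator together with the wrapped schema.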

Validation subjects with multiple namespaces

By default, an element is a potential validation subject if its namespace URI is different from that of its parent. This behavior can be changed by adding one or more cover children to validate elements. With the introduction of cover elements, the rule is that an element is a potential validation subject if it is not covered by an ancestor validation subject. Each validation subject has a set of namespace URIs that it covers. A validation subject always covers its own namespace URI. In addition, it covers the namespace URIs specified by the cover elements in its validate element. One obvious case where it is useful for a schema to cover more than one namespace is when a validate refers recursively to an MNS schema.

default namespace = "http://www.thaiopensource.com/ns/mns"

start = element rules { schemaType?, (validate* & lax?) }

validate =
  element validate {
    validateModel,
    element cover { nsAtt }*
  }
  | element validateAttributes { validateModel }

validateModel = nsAtt, attribute schema { xsd:anyURI }, schemaType?

lax = element lax { attribute allow { "attributes" | "elements" }? }
nsAtt = attribute ns { xsd:anyURI }

schemaType = attribute schemaType { mediaType }

mediaType = xsd:string

The set of elements covered by a validation subject is determined from the set of namespace URIs that it covers by the following two rules:

The rule for attributes is very similar. A set of attributes is a potential validation subject if and only if:

Just as with elements, an attribute is covered by a validation subject if its parent element and its namespace URI are covered by that validation subject.

Note that a validate element is applicable to a potential validation subject only if the namespace URI of the potential validation subject is as specified by the ns attribute. Any cover child elements do not affect this.
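The effect of cover elements on element selection can be sketched as below. The sketch assumes that a validation subject covers itself and that any other element is covered when both its parent element and its namespace URI are covered, by analogy with the rule for attributes; it ignores attributes, modes and lax processing. The covers dictionary and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of an ElementTree name, or "" if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def subjects_with_cover(root, covers):
    """Return the local names of the element validation subjects.

    covers maps each namespace URI that has a validate rule to the set of
    extra namespace URIs named by that rule's cover children.
    """
    found = []

    def walk(elem, covered):
        # covered holds the namespace URIs covered by the subject that covers
        # the parent element, or None for the document element.
        ns = namespace_of(elem.tag)
        if covered is None or ns not in covered:
            if ns not in covers:
                raise ValueError(f"no validate rule for namespace {ns!r}")
            found.append(elem.tag.split("}", 1)[-1])
            covered = {ns} | set(covers[ns])
        for child in elem:
            walk(child, covered)

    walk(root, None)
    return found


doc = ET.fromstring(
    '<a xmlns="urn:x-example:one">'
    '<b xmlns="urn:x-example:two">'
    '<c xmlns="urn:x-example:one"/><d xmlns="urn:x-example:three"/>'
    '</b></a>'
)
with_cover = subjects_with_cover(
    doc, {"urn:x-example:one": {"urn:x-example:two"}, "urn:x-example:three": set()}
)
print(with_cover)
```

Because the rule for urn:x-example:one covers urn:x-example:two, only a and d are selected; without the cover, b and c would be validation subjects as well.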

Should there be separate sets of namespaces that cover elements and cover attributes?

Should it be possible to cover all namespaces except a particular (possibly empty) finite set of namespaces?

Modes

The selection of validation subjects and associated schemas may need to be context dependent. For example, not all namespace URIs may be acceptable for the document element, or the document element may need to be validated against a different schema from that used for subtrees with the same namespace URI.

Context dependence is specified by means of modes.

default namespace = "http://www.thaiopensource.com/ns/mns"

start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    (validate|lax)*
  }

validate =
  element validate {
    validateModel,
    attribute useMode { mode }?,
    element cover { nsAtt }*
  }
  | element validateAttributes { validateModel }

lax = 
  element lax {
    attribute allow { elementsOrAttributes }?,
    inModesAtt?
  }

validateModel = nsAtt, attribute schema { xsd:anyURI }, inModesAtt?, schemaType?

nsAtt = attribute ns { xsd:anyURI }

schemaType = attribute schemaType { mediaType }

mediaType = xsd:string

inModesAtt = attribute inModes { list { mode+ } }

mode = xsd:NCName | "#default"

elementsOrAttributes =
  list {
    ("elements", "attributes") 
     | ("attributes", "elements") 
     | "elements"
     | "attributes"
     | empty
  }

The selection of validation subjects takes place with respect to a named mode. A mode is named by an NCName. In addition, there is a default mode named #default. The validate and validateAttributes elements have an optional inModes attribute, which specifies the modes in which the elements are applicable. The default value of the inModes attribute is the default mode. It is an error if there is a mode and a namespace URI for which more than one validate or validateAttributes element is applicable. Thus, when an element is selected as a validation subject there is a unique applicable validate element.
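One way to enforce the uniqueness requirement is to expand every inModes list into a lookup table when the MNS schema is loaded. The following sketch assumes that validate and validateAttributes rules are checked for ambiguity separately, which is one reading of the rule above; the function name rule_table and the sample rules are invented for the example.

```python
import xml.etree.ElementTree as ET

MNS = "http://www.thaiopensource.com/ns/mns"
DEFAULT_MODE = "#default"


def rule_table(rules_xml):
    """Map (rule kind, mode, namespace URI) to a schema URI.

    Raises ValueError if two rules of the same kind apply to the same mode
    and namespace URI.
    """
    table = {}
    root = ET.fromstring(rules_xml)
    for kind in ("validate", "validateAttributes"):
        for rule in root.findall(f"{{{MNS}}}{kind}"):
            # inModes is a whitespace-separated list; it defaults to #default.
            for mode in rule.get("inModes", DEFAULT_MODE).split():
                key = (kind, mode, rule.get("ns"))
                if key in table:
                    raise ValueError(f"ambiguous rule for {key}")
                table[key] = rule.get("schema")
    return table


RULES = """<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <validate ns="urn:x-example:one" schema="one.rng" inModes="a b"/>
  <validateAttributes ns="urn:x-example:two" schema="two.rng"/>
</rules>"""
table = rule_table(RULES)
print(sorted(table))
```

Looking up (kind, mode, namespace URI) in the table then yields at most one schema.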

The mode used to select whether a particular element or set of attributes is a validation subject is specified by the useMode attribute of the validate element applicable to the nearest ancestor validation subject. The default value of the useMode attribute is the default mode. If an element or set of attributes has no ancestor validation subject, then the mode used is determined by the startMode attribute on the rules element; the default value of the startMode attribute is also the default mode. Looking at this more procedurally, processing is top-down; the starting mode is specified by the startMode attribute. Within each validation subject element, processing switches to the mode specified by the useMode attribute.
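The top-down procedure can be sketched as a recursive walk that carries the current mode. This is an illustration only: it handles element subjects, ignores cover, lax and context, and takes the rules as a dictionary keyed by (mode, namespace URI) whose values are (schema URI, useMode) pairs. All names and URIs are invented for the example.

```python
import xml.etree.ElementTree as ET

DEFAULT_MODE = "#default"


def namespace_of(name):
    """Return the namespace URI of an ElementTree name, or "" if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def select_with_modes(root, rules, start_mode=DEFAULT_MODE):
    """Return [(local name, schema URI)] for the element validation subjects."""
    selected = []

    def walk(elem, parent_ns, mode):
        ns = namespace_of(elem.tag)
        if parent_ns is None or ns != parent_ns:
            if (mode, ns) not in rules:
                raise ValueError(f"no rule for {ns!r} in mode {mode!r}")
            schema, use_mode = rules[(mode, ns)]
            selected.append((elem.tag.split("}", 1)[-1], schema))
            mode = use_mode  # descendants are selected in the subject's useMode
        for child in elem:
            walk(child, ns, mode)

    walk(root, None, start_mode)
    return selected


ONE, TWO = "urn:x-example:one", "urn:x-example:two"
rules = {
    ("root", ONE): ("document.rng", DEFAULT_MODE),
    (DEFAULT_MODE, ONE): ("fragment.rng", DEFAULT_MODE),
    (DEFAULT_MODE, TWO): ("two.rng", DEFAULT_MODE),
}
doc = ET.fromstring(
    f'<a xmlns="{ONE}"><b xmlns="{TWO}"><c xmlns="{ONE}"/></b></a>'
)
result = select_with_modes(doc, rules, start_mode="root")
print(result)
```

The document element a is validated against document.rng because selection starts in mode root, while the nested c, which has the same namespace URI, is selected in the default mode and validated against fragment.rng.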

Whether processing is lax also depends on the mode. The lax element has an inModes attribute that specifies the modes in which it applies. There can be multiple lax elements specifying lax processing for different modes. As usual, the default value of the inModes attribute is the default mode.

For every mode named in a useMode attribute other than the default mode, there must be at least one validate, validateAttributes or lax element that includes that mode in its inModes attribute. For a mode that allows nothing, either the default mode can be used or a <lax allow="" inModes="m"/> rule can be added. The allowed value of the allow attribute is, in fact, a list of between zero and two distinct tokens from the set elements and attributes.
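A small sketch of how the allow attribute might be interpreted. It assumes, following the earlier description of lax, that an absent attribute enables lax processing for both elements and attributes; the function name parse_allow is invented for the example.

```python
ALLOWED_TOKENS = {"elements", "attributes"}


def parse_allow(value=None):
    """Return the set of kinds for which lax processing is enabled.

    value is the allow attribute, or None when the attribute is absent.
    """
    if value is None:
        return set(ALLOWED_TOKENS)  # absent attribute: both kinds are lax
    tokens = value.split()
    if len(tokens) != len(set(tokens)) or not set(tokens) <= ALLOWED_TOKENS:
        raise ValueError(f"invalid allow value: {value!r}")
    return set(tokens)


print(parse_allow(""))
print(parse_allow("attributes elements"))
```

An empty value yields the empty set, which is how a rule such as <lax allow="" inModes="m"/> declares a mode in which nothing is allowed.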

Pruning

We can distinguish between schemas that are open and schemas that are closed. Open schemas allow attributes and elements in other namespaces; closed schemas do not. Sometimes it is necessary to treat a closed schema as open. This can be done by adding a prune attribute to validate. Before the subtree is validated with respect to the schema specified by the validate element, all potential validation subjects that are elements are removed from the subtree if the value of the prune attribute contains the token elements, and all potential validation subjects that are attributes are removed if it contains the token attributes.

attribute prune { elementsOrAttributes }?
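Pruning can be sketched as a transformation applied to a copy of the subtree before it is passed to the validator. The sketch uses the basic rule for potential validation subjects (a namespace URI different from the parent's) and ignores cover elements; the function name prune is invented for the example.

```python
import copy
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of an ElementTree name, or "" if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def prune(subject, tokens):
    """Return a pruned copy of subject; tokens is the parsed prune value."""
    pruned = copy.deepcopy(subject)

    def walk(elem):
        ns = namespace_of(elem.tag)
        if "attributes" in tokens:
            for name in list(elem.attrib):
                attr_ns = namespace_of(name)
                if attr_ns and attr_ns != ns:
                    del elem.attrib[name]
        for child in list(elem):
            if "elements" in tokens and namespace_of(child.tag) != ns:
                # Simplification: text that follows the removed element
                # (its ElementTree tail) is dropped along with it.
                elem.remove(child)
            else:
                walk(child)

    walk(pruned)
    return pruned


doc = ET.fromstring(
    '<h xmlns="urn:x-example:one" xmlns:x="urn:x-example:two" id="i" x:a="1">'
    '<r xmlns="urn:x-example:two"><q/></r><t/>'
    '</h>'
)
pruned = prune(doc, {"elements", "attributes"})
print(ET.tostring(pruned, encoding="unicode"))
```

The original tree is left untouched, so the removed elements and attributes can still be validated against their own schemas.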

More on modes

Sometimes the processing mode to be used for an element may need to depend on the name of the parent of that element. For example, we might wish to allow elements of a particular namespace only within the XHTML head element and not anywhere else. To do this, one or more context elements are added to the validate element. The content of the context element identifies a context; the useMode attribute identifies a mode to be applied to potential validation subjects in that context. The default value of the useMode attribute is the default mode as usual.

The context relates to the ancestry of the potential validation subject starting with its parent element and continuing up to and including its nearest ancestor validation subject. The context identified by a context element is the union of the contexts identified by each of its children. An element element specifies a parent whose local name is equal to the value of the name attribute and whose namespace is equal to the value of the ns attribute. The namespace must be the same as that specified on the validate element or one of its cover elements. The ns attribute is inherited and so it is unnecessary to specify it except when there are cover elements.

A context of the form:

<element name="x">
  <element name="y">
    <element name="z"/>
  </element>
</element>

applies to a potential validation subject with a parent z, a grandparent y and a great grandparent x. A context of the form:

<root>
  <element name="x"/>
</root>

applies to a potential validation subject with a parent element x such that that parent element is the nearest ancestor validation subject of that potential validation subject.

The children of the context elements of a particular validate element must all identify distinct contexts. It is possible for a single potential validation subject to match multiple distinct context children. A context child containing more element elements takes precedence over one containing fewer element elements. Amongst context children containing the same number of element elements, one that has a root element takes precedence over one that does not.
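The matching and precedence rules can be sketched as follows. For brevity the sketch compares local names only and represents each child of a context element as a tuple (useMode, has_root, path), where path lists the element names from the outermost to the innermost; this representation and the function names are assumptions made for the example.

```python
def context_matches(path, has_root, ancestors):
    """Test one context child against the ancestry of a potential subject.

    ancestors lists the names from the nearest ancestor validation subject
    down to the parent element.
    """
    if len(path) > len(ancestors):
        return False
    if ancestors[len(ancestors) - len(path):] != path:
        return False
    # With a root wrapper, the outermost name must be the nearest ancestor
    # validation subject itself, so the path must span the whole ancestry.
    return not has_root or len(path) == len(ancestors)


def choose_mode(contexts, ancestors, use_mode):
    """Pick the mode for a potential subject; use_mode is the fallback."""
    candidates = [c for c in contexts if context_matches(c[2], c[1], ancestors)]
    if not candidates:
        return use_mode
    # Longer paths win; among equal lengths, a root context wins.
    best = max(candidates, key=lambda c: (len(c[2]), c[1]))
    return best[0]


contexts = [
    ("m1", False, ["head"]),
    ("m2", True, ["head"]),
    ("m3", False, ["html", "head"]),
]
print(choose_mode(contexts, ["html", "head"], "#default"))
print(choose_mode(contexts, ["head"], "#default"))
```

When no context matches, the mode falls back to the useMode of the validate element itself.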

default namespace = "http://www.thaiopensource.com/ns/mns"

start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    (validate|lax)*
  }

validate =
  element validate {
    validateModel,
    useModeAtt?,
    attribute prune { elementsOrAttributes }?,
    element cover { nsAtt }*,
    context*
  }
  | element validateAttributes { validateModel }

context = element context { useModeAtt?, nsAtt?, (rootContext|elementContext)+ }

rootContext = element root { nsAtt?, elementContext }

elementContext =
  element element {
    attribute name { xsd:NCName },
    nsAtt?,
    elementContext?
  }

lax =
  element lax {
    attribute allow { elementsOrAttributes }?,
    inModesAtt?
  }

validateModel =
  nsAtt,
  attribute schema { xsd:anyURI },
  schemaType?,
  inModesAtt?

nsAtt = attribute ns { xsd:anyURI }

schemaType = attribute schemaType { mediaType }

mediaType = xsd:string

useModeAtt = attribute useMode { mode }

inModesAtt = attribute inModes { list { mode+ } }

mode = xsd:NCName | "#default"

elementsOrAttributes =
  list {
    ("elements", "attributes") 
     | ("attributes", "elements") 
     | "elements"
     | "attributes"
     | empty
  }

Extensibility

Just as with RELAX NG, foreign elements and attributes can be added to MNS schemas. Thus, the complete MNS schema is as follows:

namespace local = ""
default namespace mns = "http://www.thaiopensource.com/ns/mns"

start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    ((validate | lax)* & foreign)
  }

validate =
  element validate {
    validateModel,
    useModeAtt?,
    attribute prune { elementsOrAttributes }?,
    ((cover*, context*) & foreign)
  }
  | element validateAttributes { validateModel, foreign }

cover = element cover { nsAtt, foreign }

context = element context {
    useModeAtt?,
    nsAtt?,
    ((rootContext|elementContext)+ & foreign)
  }

rootContext = element root { nsAtt?, (elementContext & foreign) }

elementContext =
  element element {
    nsAtt?,
    attribute name { xsd:NCName },
    (elementContext? & foreign)
  }

lax =
  element lax {
    attribute allow { elementsOrAttributes }?,
    inModesAtt?,
    foreign
  }

validateModel =
  nsAtt,
  attribute schema { xsd:anyURI },
  schemaType?,
  inModesAtt?

nsAtt = attribute ns { xsd:anyURI }

schemaType = attribute schemaType { mediaType }

mediaType = xsd:string

useModeAtt = attribute useMode { mode }

inModesAtt = attribute inModes { list { mode+ } }

mode = xsd:NCName | "#default"

elementsOrAttributes =
  list {
    ("elements", "attributes") 
     | ("attributes", "elements") 
     | "elements"
     | "attributes"
     | empty
  }

foreign =
  (attribute * - (mns:* | local:*) { text }
   | element * - mns:* { anything })*

anything = (text | attribute * { text } | element * { anything })*

Example

Suppose we want to validate an XHTML document that uses RDF within its head element. The following would do the job:

<rules xmlns="http://www.thaiopensource.com/ns/mns" startMode="xhtml">
  <validate ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            schema="rdfxml.rng"
            inModes="rdf"
            useMode="anything"/>
  <validate ns="http://www.w3.org/1999/xhtml"
            schema="xhtml.rng"
            inModes="xhtml"
            prune="elements">
    <context useMode="rdf">
      <element name="head"/>
    </context>
  </validate>
  <lax inModes="anything"/>
</rules>

Note the following points:

Comparison with W3C XML Schema

W3C XML Schema (XSD) includes features for namespace modularity that are similar in some ways to MNS. Like MNS, XSD validation uses a mapping from namespace URIs to schemas. However, there are important differences.