Compact Syntax for RELAX NG

This document describes the Compact Syntax for RELAX NG (a schema language for XML). The design goals of this syntax are:

The syntax is similar to the type syntax in the XQuery 1.0 Formal Semantics W3C Working Draft.

BNF

The syntax is defined by the following BNF:

topLevel ::= decl* topLevelBody

topLevelBody ::= pattern | grammar

decl ::=
  "namespace" identifier "=" (literal | "inherit")
  | "default" "namespace" identifier? "=" (literal | "inherit")
  | "datatypes" identifier "=" literal

pattern ::=
  particle
  | particle ("|" particle)+
  | particle ("," particle)+
  | particle ("&" particle)+
  | exceptParticle

particle ::= annotations primary followAnnotations occurrence?

exceptParticle ::=
  annotations datatypeName params? "-" annotations primary followAnnotations

primary ::=
  "(" pattern ")"
  | "element" nameClass "{" pattern "}"
  | "attribute" nameClass "{" pattern "}"
  | "mixed" "{" pattern "}"
  | "empty"
  | "notAllowed"
  | "text"
  | "list" "{" pattern "}"
  | datatypeName params?
  | datatypeName? datatypeValue
  | "grammar" "{" grammar "}"
  | ref
  | "parent" ref
  | "externalRef" literal inherit?

occurrence ::= ("*" | "+" | "?") followAnnotations

nameClass ::=
  basicNameClass followAnnotations
  | basicNameClass followAnnotations ("|" basicNameClass followAnnotations)+
  | openNameClass "-" basicNameClass followAnnotations

basicNameClass ::=
  annotations (identifier | CName)
  | openNameClass
  | annotations "(" nameClass ")"

openNameClass ::= annotations (nsName | anyName)

ref ::= identifierNotKeyword

datatypeName ::= CName | "string" | "token"

datatypeValue ::= literal

params ::= "{" (annotations identifier "=" literal)+ "}"

grammar ::= (definition | include | div | annotationElementNotKeyword)*

definition ::= annotations subject ("=" | "|=" | "&=") pattern

subject ::= "start" | identifierNotKeyword

include ::= annotations "include" literal inherit? includeBody?

includeBody ::= "{" (definition | includeDiv | annotationElementNotKeyword)* "}"

div ::= annotations "div" "{" grammar "}"

includeDiv ::= annotations "div" includeBody

inherit ::= "inherit" "=" identifier

followAnnotations ::= (">>" annotationElement)*

annotations ::= documentation* otherAnnotation?

otherAnnotation ::= "[" prefixedAnnotationAttribute* annotationElement* "]"

annotationAttribute ::= (identifier | CName) "=" literal

prefixedAnnotationAttribute ::= CName "=" literal

annotationElement ::= (identifier | CName) annotationElementBody

annotationElementNotKeyword ::=
  (identifierNotKeyword | CName) annotationElementBody

annotationElementBody ::=
  "[" annotationAttribute* (annotationElement | literal)* "]"

identifierNotKeyword ::= identifier - keyword

identifier ::= NCName | escapedIdentifier

keyword ::=
  "attribute" | "default" | "datatypes" | "div" | "element"
  | "empty" | "externalRef" | "grammar" | "include" | "inherit"
  | "list" | "mixed" | "namespace" | "notAllowed" | "parent"
  | "start" | "string" | "text" | "token"

CName ::= NCName ":" NCName
escapedIdentifier ::= "\" NCName
literal ::= literalSegment+
literalSegment ::= '"' [^"]* '"' | "'" [^']* "'"
nsName ::= NCName ":*"
anyName ::= "*"
documentation ::= "##" [^#xA]* (#xA [#x9#x20]* "##" [^#xA]*)*

The contents of consecutive literalSegments in a literal are concatenated.
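The concatenation rule can be illustrated with a minimal Python sketch. The function name literal_value is hypothetical, and the sketch assumes that segments may be separated by whitespace, which the grammar above does not state explicitly.

```python
import re

# One literalSegment: a double-quoted or single-quoted run that contains no
# quote of its own kind (mirrors the literalSegment production).
_SEGMENT = re.compile(r'"([^"]*)"|\'([^\']*)\'')


def literal_value(source: str) -> str:
    """Return the concatenated contents of the literalSegments in `source`.

    Assumption: whitespace between segments is skipped.
    """
    parts = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = _SEGMENT.match(source, pos)
        if match is None:
            raise ValueError(f"not a literal segment at offset {pos}")
        # Exactly one of the two groups matched; an empty segment yields "".
        double, single = match.group(1), match.group(2)
        parts.append(double if double is not None else single)
        pos = match.end()
    if not parts:
        raise ValueError("a literal needs at least one segment")
    return "".join(parts)
```

Because neither kind of segment can contain its own quote character, concatenating a double-quoted and a single-quoted segment is one way to write a value that contains both kinds of quote.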

Comments start with a # followed by anything other than # and continue to the end of the line.

element is defined in the XML 1.0 Recommendation; NCName is defined in the XML Namespaces Recommendation.

Note that keywords are case-sensitive. To use a keyword as the name of a definition, the keyword must be escaped with \. It is not necessary to escape a keyword that is used as the name of an element, attribute or datatype parameter.
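As an illustration of the escaping rule, the following Python sketch decides how a definition name has to be written. The keyword set is copied from the keyword production above; the function name definition_name is hypothetical.

```python
# Keywords exactly as listed in the keyword production (case-sensitive).
KEYWORDS = frozenset({
    "attribute", "default", "datatypes", "div", "element",
    "empty", "externalRef", "grammar", "include", "inherit",
    "list", "mixed", "namespace", "notAllowed", "parent",
    "start", "string", "text", "token",
})


def definition_name(name: str) -> str:
    """Return `name` as it would be written as the name of a definition.

    A keyword is prefixed with a backslash; any other identifier is
    returned unchanged. The comparison is case-sensitive.
    """
    return "\\" + name if name in KEYWORDS else name
```

For instance, a definition named text would be written as \text = ..., while Text needs no escape because keywords are case-sensitive.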

Character escapes

Before parsing against the above grammar, the input string is first preprocessed by interpreting escapes. A sequence of characters \x{N}, where N consists of one or more hexadecimal digits, is replaced by the Unicode character with code N. For example,

element \x{66}\x{6f}\x{6f} { empty }

is equivalent to

element foo { empty }
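The preprocessing step can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation; the function name interpret_escapes is hypothetical, and code points outside the Unicode range are simply left to raise ValueError.

```python
import re

# \x{N} where N is one or more hexadecimal digits.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def interpret_escapes(text: str) -> str:
    """Replace each \\x{N} escape with the character whose code is N."""
    # chr() raises ValueError if N is larger than 0x10FFFF.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Applied to the first example above, the function yields the second: the three escapes become the characters f, o and o before any parsing takes place.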

Mapping to RELAX NG Syntax

The correspondence between the compact syntax and RELAX NG's XML syntax is shown by the following tables.

Patterns

Compact Syntax                RELAX NG Syntax
p1 | p2                       <choice> p1 p2 </choice>
p1 , p2                       <group> p1 p2 </group>
p1 & p2                       <interleave> p1 p2 </interleave>
p*                            <zeroOrMore> p </zeroOrMore>
p+                            <oneOrMore> p </oneOrMore>
p?                            <optional> p </optional>
(p)                           p
element QName { p }           <element name="QName"> p </element>
element nameClass { p }       <element> nameClass p </element>
attribute QName { p }         <attribute name="QName"> p </attribute>
attribute nameClass { p }     <attribute> nameClass p </attribute>
empty                         <empty/>
notAllowed                    <notAllowed/>
text                          <text/>
mixed { p }                   <mixed> p </mixed>
list { p }                    <list> p </list>
identifierNotKeyword          <ref name="identifierNotKeyword"/>
\identifier                   <ref name="identifier"/>
externalRef "uri"             <externalRef href="uri"/>
parent identifier             <parentRef name="identifier"/>
grammar { defs }              <grammar> defs </grammar>
"string"                      <value>string</value>
string                        <data type="string"/>
token                         <data type="token"/>
prefix:localName              <data type="localName" datatypeLibrary="uri"/>
prefix:localName "string"     <value type="localName" datatypeLibrary="uri">string</value>
prefix:localName - p          <data type="localName" datatypeLibrary="uri"><except> p </except></data>
prefix:localName { params }   <data type="localName" datatypeLibrary="uri"> params </data>
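The three occurrence rows of the table can be expressed as a small lookup, sketched here in Python with the standard xml.etree.ElementTree module. The names OCCURRENCE and wrap_occurrence are hypothetical, and the RELAX NG namespace is omitted from the element names to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Occurrence operator -> name of the wrapping RELAX NG element.
OCCURRENCE = {"*": "zeroOrMore", "+": "oneOrMore", "?": "optional"}


def wrap_occurrence(pattern: ET.Element, operator: str) -> ET.Element:
    """Wrap an already translated pattern according to its operator."""
    wrapper = ET.Element(OCCURRENCE[operator])  # KeyError for unknown operators
    wrapper.append(pattern)
    return wrapper
```

A translator built along these lines would first translate the primary and then apply the wrapper, so that text? becomes an optional element containing a text element.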

Name classes

Compact Syntax            RELAX NG Syntax
QName                     <name>QName</name>
prefix:*                  <nsName ns="uri"/>
prefix:* - nameClass      <nsName ns="uri"><except> nameClass </except></nsName>
*                         <anyName/>
* - nameClass             <anyName><except> nameClass </except></anyName>
nameClass1 | nameClass2   <choice> nameClass1 nameClass2 </choice>
(nameClass)               nameClass

Parameters

Compact Syntax          RELAX NG Syntax
localName = "string"    <param name="localName">string</param>

Grammars

Compact Syntax                RELAX NG Syntax
identifierNotKeyword = p      <define name="identifierNotKeyword"> p </define>
identifierNotKeyword |= p     <define name="identifierNotKeyword" combine="choice"> p </define>
identifierNotKeyword &= p     <define name="identifierNotKeyword" combine="interleave"> p </define>
start = p                     <start> p </start>
\identifier = p               <define name="identifier"> p </define>
include "uri"                 <include href="uri"/>
include "uri" { defs }        <include href="uri"> defs </include>

Declarations

A datatypes declaration declares a prefix used in a QName identifying a datatype. For example,

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
element height { xsd:double }

A namespace declaration declares a prefix used in a QName specifying the name of an element or attribute. For example,

namespace rng = "http://relaxng.org/ns/structure/1.0"
element rng:text { empty }

A default namespace declaration declares the namespace used for unprefixed names specifying the name of an element (but not of an attribute). For example,

default namespace = "http://example.com"
element foo { attribute bar { string } }

is equivalent to

namespace ex = "http://example.com"
element ex:foo { attribute bar { string } }

A default namespace declaration may have a prefix as well. For example,

default namespace ex = "http://example.com"

is equivalent to

default namespace = "http://example.com"
namespace ex = "http://example.com"
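The way declarations affect element and attribute names can be sketched in Python as follows. The function resolve_name and its parameters are hypothetical; the sketch models the declared prefixes as a dictionary and ignores the inherit keyword entirely.

```python
from typing import Dict, Tuple


def resolve_name(qname: str, kind: str,
                 prefixes: Dict[str, str], default_ns: str) -> Tuple[str, str]:
    """Return (namespace URI, local name) for an element or attribute name.

    `kind` is "element" or "attribute". Assumption: `prefixes` already
    contains every prefix declared by a namespace declaration, including
    the prefix of a prefixed default namespace declaration.
    """
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return prefixes[prefix], local  # KeyError for an undeclared prefix
    if kind == "element":
        # Unprefixed element names take the default namespace.
        return default_ns, qname
    # Unprefixed attribute names are never affected by the default namespace.
    return "", qname
```

With default namespace ex = "http://example.com" in effect, both foo and ex:foo resolve to the same element name, while the unprefixed attribute bar stays in no namespace.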

The URI may be empty. This makes the prefix stand for the absent namespace URI. This is necessary for specifying a name class that matches any name with an absent namespace URI. For example:

namespace local = ""
element foo { attribute * - local:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo"
         ns="http://example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
        </except>
      </anyName>
      <data type="string"/>
    </attribute>
  </zeroOrMore>
</element>

In RELAX NG, a file that does not specify an ns attribute can inherit the ns attribute from the including file. To support this feature, the keyword inherit can be specified in place of the namespace URI in a namespace declaration. For example,

default namespace this = inherit
element foo { element * - this:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName/>
        </except>
      </anyName>
      <data type="string"/>
    </element>
  </zeroOrMore>
</element>

In addition, the include and externalRef patterns can specify inherit = prefix to specify the namespace to be inherited by the referenced file. For example,

namespace x = "http://www.example.com"
externalRef "foo.rng" inherit = x

is equivalent to

<externalRef href="foo.rng"
  ns="http://www.example.com"
  xmlns="http://relaxng.org/ns/structure/1.0"/>

In the absence of an inherit parameter on include or externalRef, the default namespace will be inherited by the referenced file.
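The choice of the ns attribute for externalRef can be sketched in Python with the standard xml.etree.ElementTree module. The function external_ref and its parameters are hypothetical; a default namespace of None stands for a default namespace declared as inherit, in which case no ns attribute is written.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RNG_NS = "http://relaxng.org/ns/structure/1.0"


def external_ref(href: str, prefixes: Dict[str, str],
                 default_ns: Optional[str],
                 inherit_prefix: Optional[str] = None) -> ET.Element:
    """Build the <externalRef> element for: externalRef "href" inherit = prefix."""
    if inherit_prefix is not None:
        ns = prefixes[inherit_prefix]  # explicit inherit = prefix
    else:
        ns = default_ns                # otherwise the default namespace
    element = ET.Element(f"{{{RNG_NS}}}externalRef", {"href": href})
    if ns is not None:
        element.set("ns", ns)
    return element
```

Called with the declarations of the example above, the sketch yields an externalRef element whose ns attribute is http://www.example.com.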

In the absence of a default namespace declaration, a declaration of

default namespace = inherit

is assumed.

Annotations

RELAX NG supports two kinds of annotation: element annotations and attribute annotations. In the compact syntax, attribute annotations are written much as in the XML syntax, for example xml:lang = "en". Element annotations are written using the syntax

elementName [ attributesAndContent ]

where elementName is the QName of the element and attributesAndContent is a list of attributes followed by a list of elements and literals.

Annotations are attached in one of the following ways:

For example,

namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"

[ a:documentation [ "Represents a foo" ] ]
element foo
{
  [ a:defaultValue = "42" ]
  attribute bar { text }?,
  empty
}

turns into

<element name="foo"
    xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <a:documentation>Represents a foo</a:documentation>
  <optional>
    <attribute a:defaultValue="42" name="bar">
      <text/>
    </attribute>
  </optional>
  <empty/>
</element>

Here's another example using the RelaxNGCC annotations:

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace c = "http://www.xml.gr.jp/xmlns/relaxngcc"

[ c:class="sample1" ]
start =
  element team {
    element player {
      attribute number {
        [ c:alias="number" ]
        xsd:positiveInteger >> c:java [ "System.out.println(number);" ]
      },
      element name {
        [ c:alias="name" ]
        text >> c:java [ "System.out.println(name);" ]
      }
    }+
  }

turns into

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         xmlns:c="http://www.xml.gr.jp/xmlns/relaxngcc">
  <start c:class="sample1">
    <element name="team">
      <oneOrMore>
        <element name="player">
          <attribute name="number">
            <data c:alias="number" type="positiveInteger"/>
            <c:java>System.out.println(number);</c:java>
          </attribute>
          <element name="name">
            <text c:alias="name"/>
            <c:java>System.out.println(name);</c:java>
          </element>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>

In addition, there is a special syntax for specifying documentation elements from the http://relaxng.org/ns/compatibility/annotations/1.0 namespace as described in RELAX NG DTD Compatibility. For example,

## Represents a foo
element foo { empty }

turns into

<element name="foo"
    xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <a:documentation>Represents a foo</a:documentation>
  <empty/>
</element>
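Extracting the text of such a documentation comment can be sketched in Python as follows. The function documentation_text is hypothetical. Two details are assumptions, since this document does not specify them: whitespace around the text of each line is stripped, and the text of consecutive ## lines is joined with newlines.

```python
from typing import Iterable


def documentation_text(lines: Iterable[str]) -> str:
    """Collect the text of the leading run of ## lines."""
    texts = []
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.startswith("##"):
            break  # the documentation run ends at the first other line
        # Assumption: surrounding whitespace is not significant.
        texts.append(stripped[2:].strip())
    # Assumption: consecutive lines are joined with a newline.
    return "\n".join(texts)
```

For the example above, the sketch returns the string Represents a foo, which is the content of the a:documentation element.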

Open issues

Namespace declarations and value

There is a problem in translating a schema such as

<element xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         name="foo">
  <choice>
    <value type="QName" xmlns:bar="http://example.com/1">bar:baz</value>
    <value type="QName" xmlns:bar="http://example.com/2">bar:baz</value>
  </choice>
</element>

into the compact syntax. Although this can be translated, for example, into

namespace bar1 = "http://example.com/1"
namespace bar2 = "http://example.com/2"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

element foo { xsd:QName "bar1:baz" | xsd:QName "bar2:baz" }

doing so requires that the translator have knowledge of the QName datatype.

James Clark