RELAX NG Compact Syntax

Working Draft 4 November 2002

This version: Working Draft: 4 November 2002

Editor: James Clark <jjc@jclark.com>

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Abstract

This document specifies a compact, non-XML syntax for [RELAX NG].

Status of this Document

This is a working draft constructed by the editor. It is not an official committee work product and may not reflect the consensus opinion of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.

1 Introduction

2 Syntax

3 Lexical structure

4 Declarations

5 Annotations

5.1 Initial annotations
5.2 Documentation shorthand
5.3 Following annotations
5.4 Grammar annotations

6 Conformance

6.1 Validator
6.2 Structure preserving translator
6.3 Non-structure preserving translator

Appendixes

A Formal description

A.1 Syntax

A.2 Lexical structure

A.2.1 Character encoding
A.2.2 BOM stripping
A.2.3 Newline normalization
A.2.4 Escape interpretation
A.2.5 Tokenization
A.2.6 Literal concatenation

B Compact syntax RELAX NG schema for RELAX NG (Non-Normative)

References

1. Introduction

This specification describes a compact, non-XML syntax for [RELAX NG].

The goals of this syntax are:

maximize readability;
support all features of RELAX NG; it must be possible to translate a schema from the XML syntax to the compact syntax and back without losing significant information;
support separate translation; a RELAX NG schema may be spread amongst multiple files; it must be possible to represent each of the files separately in the compact syntax; the representation of each file must not depend on the other files.

The syntax has similarities to [XQuery Formal Semantics], to [XDuce] and to the DTD syntax of [XML 1.0].

The body of this document contains an informal description of the syntax and how it maps onto the XML syntax. Developers should consult Appendix A. Formal description for a complete, rigorous description.

2. Syntax

The following is a summary of the syntax in EBNF. The reader may find it helpful to compare this with the syntax in Section 3 of [RELAX NG]. The start symbol is topLevel.

topLevel	::=	decl* (pattern \| grammarContent*)
decl	::=	"`namespace`" identifierOrKeyword "`=`" namespaceURILiteral \| "`default`" "`namespace`" [identifierOrKeyword] "`=`" namespaceURILiteral \| "`datatypes`" identifierOrKeyword "`=`" literal
pattern	::=	"`element`" nameClass "`{`" pattern "`}`" \| "`attribute`" nameClass "`{`" pattern "`}`" \| pattern ("`,`" pattern)+ \| pattern ("`&`" pattern)+ \| pattern ("`\|`" pattern)+ \| pattern "`?`" \| pattern "`*`" \| pattern "`+`" \| "`list`" "`{`" pattern "`}`" \| "`mixed`" "`{`" pattern "`}`" \| identifier \| "`parent`" identifier \| "`empty`" \| "`text`" \| [datatypeName] datatypeValue \| datatypeName ["`{`" param* "`}`"] [exceptPattern] \| "`notAllowed`" \| "`external`" anyURILiteral [inherit] \| "`grammar`" "`{`" grammarContent* "`}`" \| "`(`" pattern "`)`"
param	::=	identifierOrKeyword "`=`" literal
exceptPattern	::=	"`-`" pattern
grammarContent	::=	start \| define \| "`div`" "`{`" grammarContent* "`}`" \| "`include`" anyURILiteral [inherit] ["`{`" includeContent* "`}`"]
includeContent	::=	define \| start \| "`div`" "`{`" includeContent* "`}`"
start	::=	"`start`" assignMethod pattern
define	::=	identifier assignMethod pattern
assignMethod	::=	"`=`" \| "`\|=`" \| "`&=`"
nameClass	::=	name \| nsName [exceptNameClass] \| anyName [exceptNameClass] \| nameClass "`\|`" nameClass \| "`(`" nameClass "`)`"
name	::=	identifierOrKeyword \| CName
exceptNameClass	::=	"`-`" nameClass
datatypeName	::=	CName \| "`string`" \| "`token`"
datatypeValue	::=	literal
anyURILiteral	::=	literal
namespaceURILiteral	::=	literal \| "`inherit`"
inherit	::=	"`inherit`" "`=`" identifierOrKeyword
identifierOrKeyword	::=	identifier \| keyword
identifier	::=	(NCName - keyword) \| quotedIdentifier
quotedIdentifier	::=	"`\`" NCName
CName	::=	NCName "`:`" NCName
nsName	::=	NCName "`:*`"
anyName	::=	"`*`"
literal	::=	literalSegment+
literalSegment	::=	'`"`' (Char - '`"`')* '`"`' \| "`'`" (Char - "`'`")* "`'`" \| '`"""`' (['`"`'] ['`"`'] (Char - '`"`'))* '`"""`' \| "`'''`" (["`'`"] ["`'`"] (Char - "`'`"))* "`'''`"
keyword	::=	"`attribute`" \| "`default`" \| "`datatypes`" \| "`div`" \| "`element`" \| "`empty`" \| "`external`" \| "`grammar`" \| "`include`" \| "`inherit`" \| "`list`" \| "`mixed`" \| "`namespace`" \| "`notAllowed`" \| "`parent`" \| "`start`" \| "`string`" \| "`text`" \| "`token`"

NCName is defined in [XML Namespaces]. Char is defined in [XML 1.0].

To use a keyword as an identifier, it must be quoted with \. It is not necessary to quote a keyword that is used as the name of an element or attribute or as the name of a datatype parameter.

The value of a literal is the concatenation of the values of its constituent literalSegments. A literalSegment is terminated only by an occurrence of the same delimiter that began it. The delimiter used to begin a literalSegment may be either one or three occurrences of a single or double quote character. The value of a literalSegment consists of the characters between the delimiters. One way to get a literal whose value contains both a single and a double quote is to divide the literal into multiple literalSegments so that the single and double quote are in separate literalSegments. Another way is to use a literalSegment delimited by three single or double quotes.
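The following non-normative Python sketch illustrates how the value of a literal could be computed from its literalSegments as described above. It follows the EBNF of this section only: it assumes that adjacent literalSegments are separated by optional whitespace, and it does not implement the escape interpretation or newline handling of Appendix A. The names _SEGMENT and literal_value are illustrative and are not defined by this specification.

```python
import re

# One literalSegment. The triple-quoted forms are tried first so that
# '"""' is not misread as an empty '""' followed by a stray quote.
_SEGMENT = re.compile(
    r'"""(?P<td>(?:"{0,2}[^"])*)"""'
    r"|'''(?P<ts>(?:'{0,2}[^'])*)'''"
    r'|"(?P<d>[^"]*)"'
    r"|'(?P<s>[^']*)'"
)


def literal_value(source: str) -> str:
    """Concatenate the values of the literalSegments found in source."""
    pos, parts = 0, []
    while pos < len(source):
        if source[pos].isspace():  # assumption: whitespace may separate segments
            pos += 1
            continue
        match = _SEGMENT.match(source, pos)
        if match is None:
            raise ValueError(f"not a literalSegment at offset {pos}")
        # Exactly one named group participates in a successful match.
        parts.append(next(g for g in match.groups() if g is not None))
        pos = match.end()
    if not parts:
        raise ValueError("a literal needs at least one literalSegment")
    return "".join(parts)
```

For instance, a literal written as a double-quoted segment followed by a single-quoted segment yields a value containing both kinds of quote, as does a single segment delimited by three quotes.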

Annotations can be specified as described in Section 5.

There is no notion of operator precedence. It is an error for patterns to combine the |, &, comma (,) and - operators without using parentheses to make the grouping explicit. For example, foo | bar, baz is not allowed; instead, either (foo | bar), baz or foo | (bar, baz) must be used. A similar restriction applies to name classes and the use of the | and - operators. These restrictions are not expressed in the above EBNF but they are made explicit in the BNF in Section A.1.
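As a non-normative illustration of the restriction above, the following Python sketch scans the tokens of a single pattern and reports the first place where two different operators meet at the same nesting level. It assumes that the input is already tokenized, that parentheses and braces are balanced, and that any name class using | or - is parenthesized; the function name find_mixed_operators is hypothetical.

```python
OPERATORS = {"|", "&", ",", "-"}
OPENERS = {"(", "{"}
CLOSERS = {")", "}"}


def find_mixed_operators(tokens):
    """Return (first, second) for two different operators used at one
    nesting level, or None if every level uses at most one operator."""
    seen = [None]  # operator seen so far at each currently open level
    for token in tokens:
        if token in OPENERS:
            seen.append(None)
        elif token in CLOSERS:
            seen.pop()
        elif token in OPERATORS:
            if seen[-1] is None:
                seen[-1] = token
            elif seen[-1] != token:
                return (seen[-1], token)
    return None
```

Applied to the tokens of foo | bar, baz the sketch reports the pair | and comma, whereas both parenthesized forms given above pass.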

The value of an anyURILiteral specified with include or external is a URI reference to a grammar in the compact syntax.

3. Lexical structure

Whitespace is allowed between tokens. Tokens are the strings occurring in double quotes in the EBNF in Section 2, except that literalSegment, nsName, CName, identifier and quotedIdentifier are single tokens.

Comments are also allowed between tokens. Comments start with a # and continue to the end of the line. Comments starting with ## are treated specially; see Section 5.

A Unicode character with hex code N can be represented by the escape sequence \x{N}. Using such an escape sequence is completely equivalent to entering the corresponding character directly. For example,

element \x{66}\x{6f}\x{6f} { empty }

is equivalent to

element foo { empty }
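The equivalence shown above can be illustrated with a short, non-normative Python sketch that replaces each escape sequence with the character it denotes. It handles only the \x{N} form described in this section; the precise rules are those of Section A.2.4. The name interpret_escapes is illustrative.

```python
import re

# Matches \x{N} where N is one or more hexadecimal digits.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def interpret_escapes(text: str) -> str:
    """Replace every \\x{N} escape with the Unicode character of hex code N."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Because the replacement happens before any further processing in this sketch, the escaped and unescaped forms of a schema become identical strings.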

4. Declarations

A datatypes declaration declares a prefix used in a QName identifying a datatype. For example,

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
element height { xsd:double }

In fact, in the above example, the datatypes declaration is not required: the xsd prefix is predeclared to the above URI.

A namespace declaration declares a prefix used in a QName specifying the name of an element or attribute. For example,

namespace rng = "http://relaxng.org/ns/structure/1.0"
element rng:text { empty }

As in XML, the xml prefix is predeclared.

A default namespace declaration declares the namespace used for unprefixed names specifying the name of an element (but not of an attribute). For example,

default namespace = "http://example.com"
element foo { attribute bar { string } }

is equivalent to

namespace ex = "http://example.com"
element ex:foo { attribute bar { string } }

A default namespace declaration may have a prefix as well. For example,

default namespace ex = "http://example.com"

is equivalent to

default namespace = "http://example.com"
namespace ex = "http://example.com"

The URI may be empty. This makes the prefix stand for the absent namespace URI. This is necessary for specifying a name class that matches any name with an absent namespace URI. For example:

namespace local = ""
element foo { attribute * - local:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo"
         ns="http://example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
        </except>
      </anyName>
      <data type="string"/>
    </attribute>
  </zeroOrMore>
</element>

RELAX NG has the feature that if a file does not specify an ns attribute then the ns attribute can be inherited from the including file. To support this feature, the keyword inherit can be specified in place of the namespace URI in a namespace declaration. For example,

default namespace this = inherit
element foo { element * - this:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName/>
        </except>
      </anyName>
      <data type="string"/>
    </element>
  </zeroOrMore>
</element>

In addition, the include and external patterns can specify inherit = prefix to specify the namespace to be inherited by the referenced file. For example,

namespace x = "http://www.example.com"
external "foo.rng" inherit = x

is equivalent to

<externalRef href="foo.rng"
  ns="http://www.example.com"
  xmlns="http://relaxng.org/ns/structure/1.0"/>

In the absence of an inherit parameter on include or external, the default namespace will be inherited by the referenced file.

In the absence of a default namespace declaration, a declaration of

default namespace = inherit

is assumed.
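The way declarations affect names can be summarized by a small, non-normative Python sketch. It resolves a name written in the compact syntax to a namespace URI and local name pair, applying the default namespace to unprefixed element names but not to unprefixed attribute names. The function resolve_name, its parameters, and the INHERIT sentinel are assumptions made for this illustration only.

```python
INHERIT = object()  # stands for the special value "inherit"


def resolve_name(name, prefixes, default_ns, is_element):
    """Return (namespace URI, local name) for a possibly prefixed name.

    prefixes maps declared namespace prefixes to URIs (or INHERIT);
    default_ns is the default namespace (a URI or INHERIT).
    """
    if ":" in name:
        prefix, local = name.split(":", 1)
        if prefix not in prefixes:
            raise ValueError(f"undeclared namespace prefix: {prefix}")
        return (prefixes[prefix], local)
    # Unprefixed: only element names take the default namespace.
    return (default_ns if is_element else "", name)
```

With a default namespace of http://example.com, the element name foo resolves into that namespace while the attribute name bar resolves to the absent namespace, matching the default namespace example earlier in this section.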

5. Annotations

5.1. Initial annotations

An annotation in square brackets can be inserted immediately before a pattern, param, nameClass, grammarContent or includeContent. It has the following syntax:

annotation	::=	"`[`" annotationAttribute* annotationElement* "`]`"
annotationAttribute	::=	name "`=`" literal
annotationElement	::=	name "`[`" annotationAttribute* (annotationElement \| literal)* "`]`"

Each of the annotationAttributes will turn into attributes on the corresponding RELAX NG element. Each of the annotationElements will turn into initial children of the corresponding RELAX NG element, except in the case where the RELAX NG element cannot have children, in which case they will turn into following elements.

5.2. Documentation shorthand

Comments starting with ## are used to specify documentation elements from the http://relaxng.org/ns/compatibility/annotations/1.0 namespace as described in [Compatibility]. For example,

## Represents a language
element lang { 
  ## English
  "en" |
  ## Japanese
  "jp"
}

turns into

<element name="lang"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
    xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>Represents a language</a:documentation>
  <choice>
    <value>en</value>
    <a:documentation>English</a:documentation>
    <value>jp</value>
    <a:documentation>Japanese</a:documentation>
  </choice>
</element>

## comments can only be used immediately before a pattern, nameClass, grammarContent or includeContent. Multiple ## comments are allowed. Multiple adjacent ## comments without any intervening blank lines are merged into a single documentation element. Any ## comments must precede any annotation in square brackets.
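The merging rule above can be sketched in Python as follows. This is a non-normative illustration that considers only comments occupying a whole line. It assumes that one space after ## is not part of the documentation text and that merged lines are joined with a newline; both are assumptions of this sketch rather than statements of this section, and the name documentation_blocks is hypothetical.

```python
def documentation_blocks(lines):
    """Group adjacent whole-line ## comments into documentation strings."""
    blocks, current = [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("##"):
            text = stripped[2:]
            # Assumption: drop a single space following the ## marker.
            current.append(text[1:] if text.startswith(" ") else text)
        elif current:
            # A blank line or any other line ends the current run.
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks
```

Two ## lines with no blank line between them therefore produce one documentation string, while a blank line between them produces two.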

5.3. Following annotations

A pattern or nameClass may be followed by any number of followAnnotations with the following syntax:

followAnnotation ::= ">>" annotationElement

Each such annotationElement turns into a following sibling of the RELAX NG element representing the pattern or nameClass.

5.4. Grammar annotations

An annotationElement may be used in any place where grammarContent or includeContent is allowed. For example,

namespace x = "http://www.example.com"

start = foo

x:notation [ name="jpeg" systemId="http://www.example.com/jpeg" ]

foo = element foo { empty }

turns into

<grammar xmlns:x="http://www.example.com" 
         xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="foo"/>
  </start>
  <x:notation name="jpeg" systemId="http://www.example.com/jpeg"/>
  <define name="foo">
    <element name="foo">
      <empty/>
    </element>
  </define>
</grammar>

If the name of such an element is a keyword, then it must be escaped.

6. Conformance

There are three kinds of conformant implementation.

6.1. Validator

A validator conforming to this specification must be able to determine whether a textual object is a correct RELAX NG Compact Syntax schema as specified in Appendix A. Formal description. It must also be able to determine for any XML document and for any correct RELAX NG Compact Syntax schema whether the document is valid (as defined in [RELAX NG]) with respect to the translation of the schema into XML syntax. It need not be able to output a representation of the translation of the schema into XML syntax.

The requirements in the preceding paragraph are subject to the provisions of the second paragraph of Section 8 of [RELAX NG].

6.2. Structure preserving translator

A structure preserving translator must be able to translate any correct RELAX NG Compact Syntax schema into an XML document whose data model is strictly equivalent to the translation specified in Appendix A. Formal description. For this purpose, two instances of the data model (as specified in Section 2 of [RELAX NG]) are considered strictly equivalent if they are identical after applying the simplifications specified in Sections 4.2, 4.3, 4.4, 4.8, 4.9 and 4.10 of [RELAX NG].

Note

The RELAX NG compact syntax is not a representation of the XML syntax of a RELAX NG schema; rather it is a representation of the semantics of a RELAX NG schema. Details of the XML syntax that were judged to be insignificant are not captured in the compact syntax. For example, in the XML syntax if the name class for an element or attribute pattern consists of just a single name, it can be expressed either as a name attribute or as a name element; however, in the compact syntax, there is only one way to express such a name class. The simplifications listed in the previous paragraph correspond to those syntactic details that are not captured in the compact syntax.

When comparing two include or externalRef patterns in the XML source for strict equivalence, the values of the href attributes are not compared; instead the referenced XML documents are compared for strict equivalence.

6.3. Non-structure preserving translator

A non-structure preserving translator must be able to translate any correct RELAX NG Compact Syntax schema into an XML document whose data model is loosely equivalent to the translation specified in Appendix A. Formal description. For this purpose, two instances of the data model (as specified in Section 2 of [RELAX NG]) are considered loosely equivalent if they are such that, after applying all the simplifications specified in Section 4 of [RELAX NG], one can be transformed into the other merely by reordering and renaming definitions.

Note

A validator for the compact syntax can be implemented as a combination of a non-structure preserving translator for the compact syntax and a validator for the XML syntax.

A. Formal description

A.1. Syntax

The compact syntax is specified by a grammar in BNF. The translation into the XML syntax is specified by annotations in the grammar.

The start symbol is topLevel.

The BNF description consists of a set of production rules. Each production rule has a left-hand side and right-hand side separated by ::=. The left-hand side specifies the name of a non-terminal. The right-hand side specifies a list of one or more alternatives separated by |. Each alternative consists of a sequence of terminals and non-terminals. A non-terminal is specified by a name in italics. A terminal is either a literal string in quotes or a named terminal specified by a name in bold italics. An alternative can also be specified as ε, which denotes an empty sequence of tokens.

Each alternative may be followed by references to one or more named constraints that apply to that alternative.

The translation into XML syntax is specified by associating a value with each terminal and non-terminal in the derivation. Each alternative in the BNF may be followed by an expression in curly braces, which specifies how to compute the value associated with the left-hand side non-terminal. Each terminal and non-terminal on the right-hand side can be labelled with a subscript specifying a variable name. When that variable name is used within the curly braces, it refers to the value associated with that terminal or non-terminal. If an alternative consists of a single terminal or non-terminal, then the expression in curly braces can be omitted; in this case the value of the left-hand side is the value of that terminal or non-terminal.

The result of the translation is not a string containing the XML representation of a RELAX NG schema, but rather is an instance of the data model described in Section 2 of [RELAX NG]; this instance will match the RELAX NG schema for RELAX NG.

A textual object is a correct RELAX NG Compact Syntax schema if:

it matches the grammar specified in this section,
it satisfies all the constraints specified in this section, and
the result of the translation is a correct RELAX NG schema.

The computation of the value of a non-terminal may make use of one or more arguments. When the name of such a non-terminal occurs on the left-hand side of a production, it is followed by an argument list that declares the formal arguments for the non-terminal. When the name occurs on the right-hand side of a production, it may be followed by one or more assignments that specify the actual arguments which will be bound to the formal arguments during the computation of the value of the non-terminal. If there is no actual argument corresponding to a particular formal argument, then the formal argument is bound to the value of the variable with the same name as the name of the formal argument. In other words, for any variable x, a default actual argument of x := x is assumed. The expressions in curly braces on the right-hand side can refer to the formal arguments declared on the left-hand side. For example, see simpleNameClass.

In addition to explicit arguments, every non-terminal implicitly has an argument that specifies an environment for the interpretation of a pattern. By default, the implicit environment argument to each non-terminal is the same as its parent. This may be overridden for a particular non-terminal by including environment in the argument list. For example, see topLevel and preamble.

An environment specifies:

a mapping from datatype prefixes to URIs;
a mapping from namespace prefixes to URIs; a namespace prefix may be mapped to a special value inherit as well as to a URI;
the default namespace; the default namespace is either a URI or the special value inherit.

The special value inherit is used to indicate that a namespace URI should be inherited from the referencing schema.

In the initial environment used for the start symbol, xml is bound as a namespace prefix to http://www.w3.org/XML/1998/namespace, and xsd is bound as a datatype prefix to http://www.w3.org/2001/XMLSchema-datatypes.
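A non-normative Python sketch of an environment and its initial state is given below. It models an environment as an immutable record so that binding a prefix yields a new environment, in the manner of the bindPrefix, bindDefault and lookupPrefix primitives listed later in this section. The class and function names and the INHERIT sentinel are assumptions of the sketch.

```python
from dataclasses import dataclass, field

INHERIT = object()  # stands for the special value "inherit"


@dataclass(frozen=True)
class Environment:
    prefixes: dict = field(default_factory=dict)           # namespace prefix -> URI or INHERIT
    datatype_prefixes: dict = field(default_factory=dict)  # datatype prefix -> URI
    default_ns: object = INHERIT                           # URI or INHERIT


def initial_environment():
    """The environment used for the start symbol."""
    return Environment(
        prefixes={"xml": "http://www.w3.org/XML/1998/namespace"},
        datatype_prefixes={"xsd": "http://www.w3.org/2001/XMLSchema-datatypes"},
    )


def bind_prefix(env, prefix, uri):
    """Same as env except that prefix is bound to uri."""
    return Environment({**env.prefixes, prefix: uri}, env.datatype_prefixes, env.default_ns)


def bind_default(env, uri):
    """Same as env except that the default namespace is uri."""
    return Environment(env.prefixes, env.datatype_prefixes, uri)


def lookup_prefix(env, prefix):
    """The binding for prefix; it is an error if there is none."""
    if prefix not in env.prefixes:
        raise KeyError(f"undeclared namespace prefix: {prefix}")
    return env.prefixes[prefix]
```

Because each binding function returns a new record, the environment passed to an enclosing non-terminal is left unchanged.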

The value of an expression is one of the following:

the constants true, false or inherit;
a string;
a name (a namespace URI/local name pair);
a qualified-name (a prefix/local name pair);
an XML fragment, where an XML fragment is a pair of a set of zero or more attributes and a content sequence of zero or more strings and elements, as described in the data model of [RELAX NG]; an XML fragment is thus the same kind of thing as what is matched against a RELAX NG pattern;
an environment.

Each terminal and non-terminal has an associated type identified by a name. A type is simply a set of values. The value of a terminal or non-terminal is always a member of the set of values identified by the name of its type. The name of the type of a terminal or non-terminal is given following the keyword returns before ::= in the production rule. Similarly, each argument has a type, which is given immediately before the name of the argument.

The following types are all disjoint:

boolean contains true and false;
inherit contains inherit;
string contains all strings;
name contains all names;
qname contains all qualified-names;
environment contains all environments;
xml contains all XML fragments.

It is also useful to identify some subtypes of xml. One type is a subtype of another if the set of values of the one type is a subset of the set of values of the other.

content contains all XML fragments that have an empty set of attributes;
elements contains all XML fragments that have an empty set of attributes and whose content sequence does not have any string members; it is a subtype of content;
element contains all XML fragments that have an empty set of attributes and whose content sequence consists of a single element; it is a subtype of elements;
attributes contains all XML fragments that have an empty content sequence;
attribute contains all XML fragments that have an empty content sequence and whose attribute set consists of a single attribute;

In addition it is useful to have the following union type.

namespaceURI is the union of string and inherit.

Expressions use the following notation:

x denotes the value of the variable named x;
( ) denotes an empty XML fragment;
(x, y) denotes the concatenation of the XML fragments x and y; the attributes of the resulting XML fragment consist of the union of the attributes of x and y and the content sequence consists of the concatenation of the content sequence of x and y (this is the same as the meaning of the comma operator in the compact syntax)
environment denotes the value of the implicit environment argument;
true, false and inherit are used to denote the corresponding special constant;
"xyzzy" denotes a string consisting of the characters xyzzy;

foo(x, y, . . . ) denotes the value of the function foo applied to the arguments x, y, . . . ; the available primitive functions are shown in the following table.

Primitive	Argument types	Return type	Description
qName(x, y)	string, string	qname	a qualified-name with prefix x and local part y
prefix(x)	qname	string	the prefix of the qualified-name x
localPart(x)	qname	string	the local-part of the qualified-name x
name(x, y)	string, string	name	a name with namespace URI x and local name y
attribute(x, y)	name, string	attribute	an XML fragment consisting of an attribute with name x and value y
element(x, y)	name, xml	element	an XML fragment consisting of an element with name x and attributes and children y
text(x)	string	content	an XML fragment whose content sequence consists of x if x is not the empty string, and otherwise the empty XML fragment
bindPrefix(x, y, z)	environment, string, namespaceURI	environment	an environment that is the same as x except that it has the prefix y bound to z
bindDefault(x, y)	environment, namespaceURI	environment	an environment that is the same as x except that it has the default namespace y
bindDatatypePrefix(x, y, z)	environment, string, string	environment	an environment that is the same as x except that it has y bound as a prefix for datatypes to the URI z
lookupPrefix(x, y)	environment, string	namespaceURI	the binding in the environment x for the prefix y; it is an error if there is no applicable binding
lookupDefault(x)	environment	namespaceURI	the default namespace of the environment x, or, if no default has been bound, inherit
lookupDatatypePrefix(x, y)	environment, string	string	the binding as a datatype prefix in the environment x for the prefix y; it is an error if there is no applicable binding
mapSchemaRef(x)	string	string	a URI; x is a URI reference of a resource containing a schema in the syntax described by this specification; the returned URI is the URI of a resource containing the translation of this schema into RELAX NG XML syntax; the restriction on the use of fragment identifiers specified in section 4.5 of [RELAX NG] applies to x
makeNsAttribute(x)	namespaceURI	attributes	an empty set if x is inherit, and otherwise an attribute whose namespace URI is the empty string, whose local name is `ns` and whose value is x
applyAnnotations(x, y)	xml, element	element	an element whose name is the name of y, whose attributes are the union of the first member of x and the attributes of y, and whose children are the concatenation of the second member of x and the children of y
applyAnnotationsGroup(x, y)	xml, elements	elements	equivalent to applyAnnotations(x, `<group>` y `</group>`) unless x is equal to ( ), in which case it is equivalent to y
applyAnnotationsChoice(x, y)	xml, elements	elements	equivalent to applyAnnotations(x, `<choice>` y `</choice>`) unless x is equal to ( ), in which case it is equivalent to y
stringConcat(x, y)	string, string	string	a string that is the concatenation of the strings x and y
datatypeAttributes(x, y)	string, string	attributes	a set of two attributes; both attributes have the empty string as their namespace URI; one attribute has local name `datatypeLibrary` and value x; the other attribute has local name `type` and value y
documentationElementName()		name	the name of the `documentation` element defined in [Compatibility], that is, the name with namespace URI `http://relaxng.org/ns/compatibility/annotations/1.0` and local name `documentation`

x ? y : z is a conditional expression, which denotes y if x is true and z if x is false;
<foo x> y </foo> denotes an element from the RELAX NG namespace with local name foo, attributes x and content y.
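The XML fragment values and a few of the primitives above can be sketched in Python as follows. This non-normative illustration represents a fragment as a pair of an attribute mapping and a content sequence, and a name as a (namespace URI, local name) tuple; these representations, the INHERIT sentinel and the function names are assumptions of the sketch.

```python
from typing import NamedTuple

INHERIT = object()  # stands for the special value "inherit"


class Fragment(NamedTuple):
    attributes: dict  # name -> value
    content: tuple    # strings and elements, in order


EMPTY = Fragment({}, ())  # the empty XML fragment, written ( )


def concat(x, y):
    """(x, y): union of the attributes, concatenation of the content.
    Duplicate attribute names are not checked in this sketch."""
    return Fragment({**x.attributes, **y.attributes}, x.content + y.content)


def text(s):
    """text(x): a content sequence holding s, or ( ) if s is empty."""
    return Fragment({}, (s,)) if s else EMPTY


def attribute(name, value):
    """attribute(x, y): a fragment consisting of a single attribute."""
    return Fragment({name: value}, ())


def make_ns_attribute(ns):
    """makeNsAttribute(x): no attributes for inherit, otherwise an ns attribute."""
    return EMPTY if ns is INHERIT else attribute(("", "ns"), ns)
```

In this representation, an include or externalRef whose namespace is inherit simply receives no ns attribute.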

topLevel returns element  ::=
    preamble_e  topLevelBody(environment := e)_x
        { x }

preamble returns environment  ::=
    ε
        { environment }
    |  decl_e  preamble(environment := e)_d
        { d }

decl returns environment  ::=
    "namespace"  namespacePrefix_x  "="  namespaceURILiteral_y
        Constraint: xml prefix
        Constraint: xml namespace URI
        Constraint: duplicate declaration
        { bindPrefix(environment, x, y) }
    |  "default"  "namespace"  "="  namespaceURILiteral_x
        Constraint: xml namespace URI
        Constraint: duplicate declaration
        { bindDefault(environment, x) }
    |  "default"  "namespace"  namespacePrefix_x  "="  namespaceURILiteral_y
        Constraint: xml prefix
        Constraint: xml namespace URI
        Constraint: duplicate declaration
        { bindDefault(bindPrefix(environment, x, y), y) }
    |  "datatypes"  datatypePrefix_x  "="  literal_y
        Constraint: xsd prefix
        Constraint: datatypes URI
        Constraint: duplicate declaration
        { bindDatatypePrefix(environment, x, y) }

namespacePrefix returns string  ::=
    identifierOrKeyword
        Constraint: valid prefix

datatypePrefix returns string  ::=
    identifierOrKeyword

namespaceURILiteral returns namespaceURI  ::=
    literal
    |  "inherit"
        { inherit }

topLevelBody returns element  ::=
    pattern
        Constraint: single element
    |  grammar_x
        { <grammar> x </grammar> }

grammar returns elements  ::=
    ε
        { ( ) }
    |  member_x  grammar_y
        { (x, y) }

member returns element  ::=
    annotatedComponent
    |  annotationElementNotKeyword

annotatedComponent returns element  ::=
    annotations_x  component_y
        { applyAnnotations(x, y) }

component returns element  ::=
    start
    |  define
    |  include
    |  div

start returns element  ::=
    "start"  assignOp_x  pattern_y
        { <start x> y </start> }

define returns element  ::=
    identifier_x  assignOp_y  pattern_z
        { <define name=x y> z </define> }

assignOp returns attributes  ::=
    "="
        { ( ) }
    |  "|="
        { attribute(name("", "combine"), "choice") }
    |  "&="
        { attribute(name("", "combine"), "interleave") }

include returns element  ::=
    "include"  anyURILiteral_x  optInherit_y  optIncludeBody_z
        { <include href=mapSchemaRef(x) y> z </include> }

anyURILiteral returns string  ::=
    literal
        Constraint: any URI

optInherit returns attributes  ::=
    ε
        { makeNsAttribute(lookupDefault(environment)) }
    |  "inherit"  "="  identifierOrKeyword_x
        { makeNsAttribute(lookupPrefix(environment, x)) }

optIncludeBody returns elements  ::=
    ε
        { ( ) }
    |  "{"  includeBody_x  "}"
        { x }

includeBody returns elements  ::=
    ε
        { ( ) }
    |  includeMember_x  includeBody_y
        { (x, y) }

includeMember returns element  ::=
    annotatedIncludeComponent
    |  annotationElementNotKeyword

annotatedIncludeComponent returns element  ::=
    annotations_x  includeComponent_y
        { applyAnnotations(x, y) }

includeComponent returns element  ::=
    start
    |  define
    |  includeDiv

div returns element  ::=
    "div"  "{"  grammar_x  "}"
        { <div> x </div> }

includeDiv returns element  ::=
    "div"  "{"  includeBody_x  "}"
        { <div> x </div> }

pattern returns elements  ::=
    innerPattern(anno := ( ))

innerPattern(xml anno) returns elements  ::=
    innerParticle
    |  particleChoice_x
        { applyAnnotations(anno, <choice> x </choice>) }
    |  particleGroup_x
        { applyAnnotations(anno, <group> x </group>) }
    |  particleInterleave_x
        { applyAnnotations(anno, <interleave> x </interleave>) }
    |  annotatedDataExcept_x
        { applyAnnotationsGroup(anno, x) }

particleChoice returns elements  ::=
    particle_x  "|"  particle_y
        { (x, y) }
    |  particle_x  "|"  particleChoice_y
        { (x, y) }

particleGroup returns elements  ::=
    particle_x  ","  particle_y
        { (x, y) }
    |  particle_x  ","  particleGroup_y
        { (x, y) }

particleInterleave returns elements  ::=
    particle_x  "&"  particle_y
        { (x, y) }
    |  particle_x  "&"  particleInterleave_y
        { (x, y) }

particle returns elements  ::=
    innerParticle(anno := ( ))

innerParticle(xml anno) returns elements  ::=
    annotatedPrimary_x
        { applyAnnotationsGroup(anno, x) }
    |  repeatedPrimary_x  followAnnotations_y
        { (applyAnnotations(anno, x), y) }

repeatedPrimary returns element  ::=
    annotatedPrimary_x  "*"
        { <zeroOrMore> x </zeroOrMore> }
    |  annotatedPrimary_x  "+"
        { <oneOrMore> x </oneOrMore> }
    |  annotatedPrimary_x  "?"
        { <optional> x </optional> }

annotatedPrimary returns elements  ::=
    leadAnnotatedPrimary_x  followAnnotations_y
        { (x, y) }

annotatedDataExcept returns elements  ::=
    leadAnnotatedDataExcept_x  followAnnotations_y
        { (x, y) }

leadAnnotatedDataExcept returns element  ::=
    annotations_x  dataExcept_y
        { applyAnnotations(x, y) }

leadAnnotatedPrimary returns elements  ::=
    annotations_x  primary_y
        { applyAnnotations(x, y) }
    |  annotations_x  "("  innerPattern(anno := x)_y  ")"
        { y }

primary returns element  ::=
    "element"  nameClass(isElem := true)_x  "{"  pattern_y  "}"
        { <element> x y </element> }
    |  "attribute"  nameClass(isElem := false)_x  "{"  pattern_y  "}"
        { <attribute> x y </attribute> }
    |  "mixed"  "{"  pattern_x  "}"
        { <mixed> x </mixed> }
    |  "list"  "{"  pattern_x  "}"
        { <list> x </list> }
    |  datatypeName_x  optParams_y
        { <data x> y </data> }
    |  datatypeName_x  datatypeValue_y
        { <value x> y </value> }
    |  datatypeValue_x
        { <value> x </value> }
    |  "empty"
        { <empty/> }
    |  "notAllowed"
        { <notAllowed/> }
    |  "text"
        { <text/> }
    |  ref_x
        { <ref name=x/> }
    |  "parent"  ref_x
        { <parentRef name=x/> }
    |  "grammar"  "{"  grammar_x  "}"
        { <grammar> x </grammar> }
    |  "external"  anyURILiteral_x  optInherit_y
        { <externalRef href=mapSchemaRef(x) y/> }

dataExcept returns element  ::=
    datatypeName_x  optParams_y  "-"  leadAnnotatedPrimary_z
        { <data x> y <except> z </except> </data> }

ref returns string ::=
    identifier

datatypeName returns attributes  ::=
    CName_x
        { datatypeAttributes(lookupDatatypePrefix(environment, prefix(x)), localPart(x)) }
    |  "string"
        { datatypeAttributes("", "string") }
    |  "token"
        { datatypeAttributes("", "token") }

datatypeValue returns string ::=
    literal

optParams returns elements  ::=
    ε
        { ( ) }
    |  "{"  params_x  "}"
        { x }

params returns elements  ::=
    ε
        { ( ) }
    |  param_x  params_y
        { (x, y) }

param returns element  ::=
    annotations_x  identifierOrKeyword_y  "="  literal_z
        { applyAnnotations(x, <param name=y> z </param>) }

nameClass(boolean isElem) returns elements ::=
    innerNameClass(anno := ( ))

innerNameClass(boolean isElem, xml anno) returns elements  ::=
    annotatedSimpleNameClass_x
        { applyAnnotationsChoice(anno, x) }
    |  nameClassChoice_x
        { applyAnnotations(anno, <choice> x </choice>) }
    |  annotatedExceptNameClass_x
        { applyAnnotationsChoice(anno, x) }

nameClassChoice(boolean isElem) returns elements  ::=
    annotatedSimpleNameClass_x  "|"  annotatedSimpleNameClass_y
        { (x, y) }
    |  annotatedSimpleNameClass_x  "|"  nameClassChoice_y
        { (x, y) }

annotatedExceptNameClass(boolean isElem) returns elements  ::=
    leadAnnotatedExceptNameClass_x  followAnnotations_y
        { (x, y) }

leadAnnotatedExceptNameClass(boolean isElem) returns element  ::=
    annotations_x  exceptNameClass_y
        { applyAnnotations(x, y) }

annotatedSimpleNameClass(boolean isElem) returns elements  ::=
    leadAnnotatedSimpleNameClass_x  followAnnotations_y
        { (x, y) }

leadAnnotatedSimpleNameClass(boolean isElem) returns elements  ::=
    annotations_x  simpleNameClass_y
        { applyAnnotations(x, y) }
    |  annotations_x  "("  innerNameClass(anno := x)_y  ")"
        { y }

exceptNameClass(boolean isElem) returns element  ::=
    nsName_x  "-"  leadAnnotatedSimpleNameClass_y
        { <nsName makeNsAttribute(lookupPrefix(environment, x))> <except> y </except> </nsName> }
    |  "*"  "-"  leadAnnotatedSimpleNameClass_x
        { <anyName> <except> x </except> </anyName> }

simpleNameClass(boolean isElem) returns element  ::=
    identifierOrKeyword_x
        { <name makeNsAttribute(isElem ? lookupDefault(environment) : "")> x </name> }
    |  CName_x
        { <name makeNsAttribute(lookupPrefix(environment, prefix(x)))> localPart(x) </name> }
    |  nsName_x
        { <nsName makeNsAttribute(lookupPrefix(environment, x))/> }
    |  "*"
        { <anyName/> }

followAnnotations returns elements  ::=
    ε
        { ( ) }
    |  ">>"  annotationElement_x  followAnnotations_y
        { (x, y) }

annotations returns xml  ::=
    documentations_x
        { x }
    |  documentations_x  "["  prefixedAnnotationAttributes_y  annotationElements_z  "]"
        { (y, (x, z)) }

prefixedAnnotationAttributes returns attributes  ::=
    ε
        { ( ) }
    |  prefixedAnnotationAttribute_x  prefixedAnnotationAttributes_y
        Constraint: duplicate attributes
        Constraint: unqualified name
        { (x, y) }

annotationElements returns elements  ::=
    ε
        { ( ) }
    |  annotationElement_x  annotationElements_y
        { (x, y) }

annotationElement returns element  ::=
    identifierOrKeyword_x  annotationAttributesContent_y
        { element(name("", x), y) }
    |  colonAnnotationElement

annotationElementNotKeyword returns element  ::=
    identifier_x  annotationAttributesContent_y
        { element(name("", x), y) }
    |  colonAnnotationElement

colonAnnotationElement returns element  ::=
    prefixedName_x  annotationAttributesContent_y
        { element(x, y) }

annotationAttributesContent returns xml  ::=
    "["  annotationAttributes_x  annotationContent_y  "]"
        { (x, y) }

annotationContent returns content  ::=
    ε
        { ( ) }
    |  annotationElement_x  annotationContent_y
        { (x, y) }
    |  literal_x  annotationContent_y
        { (text(x), y) }

annotationAttributes returns attributes  ::=
    ε
        { ( ) }
    |  annotationAttribute_x  annotationAttributes_y
        Constraint: duplicate attributes
        { (x, y) }

annotationAttribute returns attribute  ::=
    prefixedAnnotationAttribute
    |  unprefixedAnnotationAttribute

prefixedAnnotationAttribute returns attribute  ::=
    prefixedName_x  "="  literal_y
        Constraint: xmlns namespace URI
        { attribute(x, y) }

prefixedName returns name  ::=
    CName_x
        Constraint: annotation inherit
        { name(lookupPrefix(environment, prefix(x)), localPart(x)) }

unprefixedAnnotationAttribute returns attribute  ::=
    identifierOrKeyword_x  "="  literal_y
        { attribute(name("", x), y) }

documentations returns elements  ::=
    ε
        { ( ) }
    |  documentation_x  documentations_y
        { (element(documentationElementName(), text(x)), y) }

identifierOrKeyword returns string  ::=
    identifier
    |  keyword

Constraint: valid prefix

It is an error if the value of a namespacePrefix is xmlns.

Constraint: xml prefix

It is an error if the value of the namespacePrefix is xml and the value of the namespaceURILiteral is not http://www.w3.org/XML/1998/namespace.

Constraint: xml namespace URI

It is an error if the value of the namespaceURILiteral is http://www.w3.org/XML/1998/namespace and the value of the namespacePrefix is not xml.
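
The valid prefix, xml prefix and xml namespace URI constraints can be read together as a check on a single namespace declaration. The following Python sketch is illustrative only and is not part of this specification; the function name check_namespace_declaration and the choice to return constraint names as strings are assumptions made for the example.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def check_namespace_declaration(prefix, uri):
    """Return the names of the constraints violated by one namespace declaration.

    prefix stands for the value of the namespacePrefix and uri for the value
    of the namespaceURILiteral; both are plain strings in this sketch.
    """
    violated = []
    if prefix == "xmlns":
        violated.append("valid prefix")
    if prefix == "xml" and uri != XML_NS:
        violated.append("xml prefix")
    if uri == XML_NS and prefix != "xml":
        violated.append("xml namespace URI")
    return violated
```

An empty list means that none of the three constraints is violated; a conforming processor would report an error otherwise.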

Constraint: xsd prefix

It is an error if the value of the datatypePrefix is xsd and the value of the literal is not http://www.w3.org/2001/XMLSchema-datatypes.

Constraint: datatypes URI

It is an error if the value of the literal in a datatypes declaration is not a syntactically legal value for a datatypeLibrary as specified in Section 3 of [RELAX NG].

Constraint: duplicate declaration

It is an error if there is more than one namespace declaration of a particular prefix, more than one default namespace declaration or more than one declaration of a particular datatypes prefix.
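
One way to check the duplicate declaration constraint is to count declarations by kind and prefix. The Python sketch below is a hypothetical illustration: the (kind, prefix) tuples and the labels "namespace", "default" and "datatypes" are invented for the example. It also assumes that a default namespace declaration that names a prefix counts as a namespace declaration of that prefix, which the constraint text does not state explicitly.

```python
from collections import Counter

def duplicate_declarations(decls):
    """Return the (kind, prefix) keys that are declared more than once.

    decls is a list of (kind, prefix) tuples in schema order, where kind is
    "namespace", "default" or "datatypes" and prefix is None for a default
    namespace declaration that does not name a prefix.
    """
    seen = Counter()
    for kind, prefix in decls:
        if kind == "default":
            seen[("default", None)] += 1
            if prefix is not None:
                # Assumption: the prefix is also bound as a namespace prefix.
                seen[("namespace", prefix)] += 1
        else:
            seen[(kind, prefix)] += 1
    return [key for key, count in seen.items() if count > 1]
```

Namespace prefixes and datatypes prefixes are counted separately here, so the same identifier may be used once for each.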

Constraint: single element

It is an error if a top-level pattern translates to a sequence of more than one element (which can happen as the result of the use of annotations).

Constraint: unqualified name

It is an error if the namespace URI of a prefixedName in a prefixedAnnotationAttributes is the empty string.

Constraint: xmlns namespace URI

It is an error if the namespace URI of a prefixedName in a prefixedAnnotationAttribute is http://www.w3.org/2000/xmlns.

Constraint: duplicate attributes

It is an error if a prefixedAnnotationAttributes or an annotationAttributes contains two attributes with the same namespace URI and local name.
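
The duplicate attributes constraint compares attributes by namespace URI and local name, not by prefix. A minimal Python sketch of such a check follows; the representation of an attribute as a (namespace URI, local name, value) tuple and the function name are assumptions made for the example.

```python
def duplicate_attribute_names(attributes):
    """Return each (namespace URI, local name) pair that occurs more than once.

    attributes is a list of (namespace_uri, local_name, value) tuples, in the
    order in which the attributes appear in the annotation.
    """
    seen = set()
    duplicates = []
    for namespace_uri, local_name, _value in attributes:
        key = (namespace_uri, local_name)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Two prefixes bound to the same namespace URI therefore produce a duplicate when they are used with the same local name.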

Constraint: annotation inherit

It is an error if the namespace URI in the value of a prefixedName is inherit.

Constraint: any URI

It is an error if the value of the literal used with external or include declaration does not meet the requirements for the anyURI symbol specified in Section 3 of [RELAX NG].

A.2. Lexical structure

This section describes how to transform the textual representation of a RELAX NG schema in compact syntax into a sequence of tokens, which can be parsed using the grammar specified in Section A.1.

There are six distinct stages, which are logically consecutive; the result of each stage is the input to the following stage.

A.2.1. Character encoding

The textual representation of the RELAX NG schema in compact syntax may be either a sequence of Unicode characters or a sequence of bytes. In the latter case, the first stage is to transform the sequence of bytes to the sequence of characters. The sequence of bytes may have associated metadata specifying the encoding. One example of such metadata is the charset parameter in a MIME media type [RFC 2046]. If there is such metadata, then the specified encoding is used. Otherwise, the first two bytes of the sequence are examined. If these are #xFF followed by #xFE or #xFE followed by #xFF, then an encoding of UTF-16 [Unicode] will be used, little-endian in the former case, big-endian in the latter case. Otherwise an encoding of UTF-8 [Unicode] is used. It is an error if the sequence of bytes is not a legal sequence in the selected encoding.
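
The encoding selection described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a normative algorithm: the function name decode_schema_bytes is hypothetical, and the declared_encoding parameter stands in for whatever external metadata (such as a MIME charset parameter) is available.

```python
def decode_schema_bytes(data, declared_encoding=None):
    """Turn a byte sequence into a character sequence.

    declared_encoding is None when there is no metadata specifying the
    encoding; otherwise it names the encoding to use.
    """
    if declared_encoding is not None:
        encoding = declared_encoding
    elif data[:2] == b"\xff\xfe":
        encoding = "utf-16-le"   # #xFF followed by #xFE: little-endian
    elif data[:2] == b"\xfe\xff":
        encoding = "utf-16-be"   # #xFE followed by #xFF: big-endian
    else:
        encoding = "utf-8"
    # With errors="strict", an illegal byte sequence raises UnicodeDecodeError.
    return data.decode(encoding, errors="strict")
```

The codecs chosen here leave any byte order mark in the result as the character #xFEFF, so that it can be removed by the following stage.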

A.2.2. BOM stripping

If the first character of the sequence is a byte order mark (#xFEFF), then it is removed.

A.2.3. Newline normalization

Representations of newlines are normalized to #xA in a similar way to [XML 1.0]. Specifically, each occurrence of a #xD character that is not followed by a #xA character or of a #xD, #xA character pair is transformed to #xA.
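
As an illustration, the normalization can be written as a single regular expression substitution in Python. The function name normalize_newlines is an assumption made for this sketch.

```python
import re

def normalize_newlines(text):
    """Replace each #xD #xA pair and each remaining lone #xD with #xA."""
    # "\r\n?" matches a #xD together with a directly following #xA when there
    # is one, so a pair is replaced by a single #xA rather than by two.
    return re.sub(r"\r\n?", "\n", text)
```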

A.2.4. Escape interpretation

In this stage, each escape sequence of the form \x{n}, where n is a hexadecimal number, is replaced by the character with Unicode code n. The escape sequence must match the production escapeSequence; the value computed in the BNF is the Unicode code of the replacement character. It is an error if the replacement character does not match the Char production of [XML 1.0]. It is an error if the input character sequence contains a character sequence escapeOpen that does not start an escapeSequence. After an escape sequence has been replaced, scanning for escape sequences continues following the replacement character; thus \x{5C}x{5C} is transformed to \x{5C} not to \.

Note

The \ character that opens an escape sequence may be followed by more than one x. This makes it possible for there to be a reversible transformation that maps a schema to a form containing only ASCII characters; the transformation adds an extra x to each existing escape sequence, and replaces every non-ASCII character by an escape sequence with exactly one x.

escapeSequence returns number  ::=
    escapeOpen  hexNumber_x  escapeClose
        { x }

escapeOpen returns void ::=
"\" xs "{"

xs returns void  ::=
    "x"
    |  "x"  xs

escapeClose returns void ::=
"}"

hexNumber returns number  ::=
    hexDigit
    |  hexNumber_x  hexDigit_y
        { (x * 16) + y }

hexDigit returns number  ::=
    "0"
        { 0 }
    |  "1"
        { 1 }
    |  "2"
        { 2 }
    |  "3"
        { 3 }
    |  "4"
        { 4 }
    |  "5"
        { 5 }
    |  "6"
        { 6 }
    |  "7"
        { 7 }
    |  "8"
        { 8 }
    |  "9"
        { 9 }
    |  [Aa]
        { 10 }
    |  [Bb]
        { 11 }
    |  [Cc]
        { 12 }
    |  [Dd]
        { 13 }
    |  [Ee]
        { 14 }
    |  [Ff]
        { 15 }
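
The escape interpretation stage can be sketched in Python as follows. This is a non-normative illustration: the names is_xml_char and interpret_escapes are hypothetical, and regular expressions are used in place of the BNF above. Scanning resumes after the replacement character, so a replacement is never rescanned together with the text that follows it.

```python
import re

# escapeOpen: a backslash, one or more "x" characters, then "{".
_ESCAPE_OPEN = re.compile(r"\\x+\{")
# escapeSequence: escapeOpen, one or more hexadecimal digits, then "}".
_ESCAPE_SEQUENCE = re.compile(r"\\x+\{([0-9A-Fa-f]+)\}")

def is_xml_char(code):
    """True if code matches the Char production of XML 1.0."""
    return (code in (0x9, 0xA, 0xD)
            or 0x20 <= code <= 0xD7FF
            or 0xE000 <= code <= 0xFFFD
            or 0x10000 <= code <= 0x10FFFF)

def interpret_escapes(text):
    """Replace each escape sequence by the character it denotes."""
    out = []
    pos = 0
    while True:
        opening = _ESCAPE_OPEN.search(text, pos)
        if opening is None:
            out.append(text[pos:])
            return "".join(out)
        match = _ESCAPE_SEQUENCE.match(text, opening.start())
        if match is None:
            raise ValueError("escapeOpen does not start an escapeSequence")
        code = int(match.group(1), 16)
        if not is_xml_char(code):
            raise ValueError("replacement character is not a legal XML character")
        out.append(text[pos:opening.start()])
        out.append(chr(code))
        # Continue scanning after the replacement, not inside it.
        pos = match.end()
```

With this sketch, the input \x{5C}x{5C} yields \x{5C}, in line with the example given in the text.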

A.2.5. Tokenization

In this stage, the sequence of characters is tokenized: it is transformed into a sequence of tokens, where each token corresponds to a non-terminal in the grammar in Section A.1, except that the token sequence contains literalSegment tokens instead of literal tokens.

A sequence of characters is tokenized by first finding the longest initial subsequence that:

is one of the literal string non-terminals occurring in the BNF in Section A.1
matches the grammar of one of the named non-terminals other than literal that is referenced in Section A.1 and specified in this section, that is, identifier, CName, nsName or documentation
matches the grammar for literalSegment, or
matches the grammar for separator

If the longest such initial subsequence matches separator, this subsequence is discarded. Otherwise, a single non-terminal is produced from this initial subsequence. In either case, the tokenization proceeds with the rest of the character sequence. It is an error if there is no such initial subsequence.

The production rules below use some additional notation. Square brackets enclose a character class. A character class of the form [^chars] specifies any legal XML character that does not occur in chars. A legal XML character is a character that matches the Char production of [XML 1.0]. A character class of the form [chars], where chars does not begin with ^, specifies any single character that occurs in chars. XML hexadecimal character references are used to denote a single character, as in XML. NCName is defined in [XML Namespaces].

identifier returns string  ::=
    NCName_x - keyword
        { x }
    |  "\"  NCName_x
        { x }

CName returns qname  ::=
    NCName_x  ":"  NCName_y
        { qName(x, y) }

nsName returns string  ::=
    NCName_x  ":*"
        { x }

literalSegment returns string  ::=
    """  stringNoQuot_x  """
        { x }
    |  "'"  stringNoApos_x  "'"
        { x }
    |  """""  stringNoTripleQuot_x  """""
        { x }
    |  "'''"  stringNoTripleApos_x  "'''"
        { x }

stringNoQuot returns string  ::=
    ε
        { "" }
    |  [^"]_x  stringNoQuot_y
        { stringConcat(x, y) }

stringNoApos returns string  ::=
    ε
        { "" }
    |  [^']_x  stringNoApos_y
        { stringConcat(x, y) }

stringNoTripleQuot returns string  ::=
    ε
        { "" }
    |  [^"]_x  stringNoTripleQuot_y
        { stringConcat(x, y) }
    |  """  [^"]_x  stringNoTripleQuot_y
        { stringConcat(""", x, y) }
    |  """"  [^"]_x  stringNoTripleQuot_y
        { stringConcat("""", x, y) }

stringNoTripleApos returns string  ::=
    ε
        { "" }
    |  [^']_x  stringNoTripleApos_y
        { stringConcat(x, y) }
    |  "'"  [^']_x  stringNoTripleApos_y
        { stringConcat("'", x, y) }
    |  "''"  [^']_x  stringNoTripleApos_y
        { stringConcat("''", x, y) }

documentation returns string  ::=
    documentationLine
    |  documentation_x  documentationContinuation_y
        { stringConcat(x, y) }

documentationLine returns string  ::=
    "##"  documentationLineContent_x
        { x }

documentationContinuation returns string  ::=
    [&#xA;]_x  indent  documentationLine_y
        { stringConcat(x, y) }

indent returns void  ::=
    ε
    |  [&#x9;&#x20;]_x  indent

documentationLineContent returns string  ::=
    ε
        { "" }
    |  "#"  documentationLineContent_x
        { x }
    |  " "  restOfLine_x
        { x }
    |  [^&#xA;&#x20;#]_x  restOfLine_y
        { stringConcat(x, y) }

restOfLine returns string  ::=
    ε
        { "" }
    |  [^&#xA;]_x  restOfLine_y
        { stringConcat(x, y) }

separator returns void  ::=
    [&#x9;&#xA;&#x20;]
    |  "#"  [^&#xA;#]  restOfLine
    |  "#"

A.2.6. Literal concatenation

In this stage, each maximal sequence of consecutive literalSegment tokens is concatenated into a literal token.

literal returns string  ::=
    literalSegment
    |  literalSegment_x  literal_y
        { stringConcat(x, y) }
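
The concatenation stage amounts to merging runs of adjacent literalSegment tokens. A minimal Python sketch follows, assuming that tokens are represented as (kind, value) pairs in which the value of a literalSegment is its string value; the representation and the function name are hypothetical.

```python
def concatenate_literals(tokens):
    """Merge each maximal run of literalSegment tokens into one literal token."""
    result = []
    previous_was_segment = False
    for kind, value in tokens:
        if kind == "literalSegment":
            if previous_was_segment:
                # Extend the literal that the current run has already started.
                result[-1] = ("literal", result[-1][1] + value)
            else:
                result.append(("literal", value))
            previous_was_segment = True
        else:
            result.append((kind, value))
            previous_was_segment = False
    return result
```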

B. Compact syntax RELAX NG schema for RELAX NG (Non-Normative)

# RELAX NG XML syntax specified in compact syntax.

default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace local = ""
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = pattern

pattern =
  element element { (nameQName | nameClass), (common & pattern+) }
  | element attribute { (nameQName | nameClass), (common & pattern?) }
  | element group|interleave|choice|optional
            |zeroOrMore|oneOrMore|list|mixed { common & pattern+ }
  | element ref|parentRef { nameNCName, common }
  | element empty|notAllowed|text { common }
  | element data { type, param*, (common & exceptPattern?) }
  | element value { commonAttributes, type?, xsd:string }
  | element externalRef { href, common }
  | element grammar { common & grammarContent* }

param = element param { commonAttributes, nameNCName, xsd:string }

exceptPattern = element except { common & pattern+ }

grammarContent = 
  definition
  | element div { common & grammarContent* }
  | element include { href, (common & includeContent*) }

includeContent =
  definition
  | element div { common & includeContent* }

definition =
  element start { combine?, (common & pattern+) }
  | element define { nameNCName, combine?, (common & pattern+) }

combine = attribute combine { "choice" | "interleave" }

nameClass = 
  element name { commonAttributes, xsd:QName }
  | element anyName { common & exceptNameClass? }
  | element nsName { common & exceptNameClass? }
  | element choice { common & nameClass+ }

exceptNameClass = element except { common & nameClass+ }

nameQName = attribute name { xsd:QName }
nameNCName = attribute name { xsd:NCName }
href = attribute href { xsd:anyURI }
type = attribute type { xsd:NCName }

common = commonAttributes, foreignElement*

commonAttributes = 
  attribute ns { xsd:string }?,
  attribute datatypeLibrary { xsd:anyURI }?,
  foreignAttribute*

foreignElement = element * - rng:* { (anyAttribute | text | anyElement)* }
foreignAttribute = attribute * - (rng:*|local:*) { text }
anyElement = element * { (anyAttribute | text | anyElement)* }
anyAttribute = attribute * { text }

References

Normative

Compatibility: James Clark, Makoto MURATA, editors. RELAX NG DTD Compatibility. OASIS, 2001.
RELAX NG: James Clark, Makoto MURATA, editors. RELAX NG Specification. OASIS, 2001.
Unicode: The Unicode Consortium. The Unicode Standard, Version 3.2 or later.
XML 1.0: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), 2000.
XML Namespaces: Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. W3C (World Wide Web Consortium), 1999.

Non-Normative

Guidelines: James Clark, Kohsuke KAWAGUCHI, editors. Guidelines for using W3C XML Schema Datatypes with RELAX NG. OASIS, 2001.
RFC 2046: N. Freed, N. Borenstein. RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. IETF (Internet Engineering Task Force), 1996.
W3C XML Schema Datatypes: Paul V. Biron, Ashok Malhotra, editors. XML Schema Part 2: Datatypes. W3C (World Wide Web Consortium), 2001.
XDuce: Haruo Hosoya. Regular Expression Types for XML. PhD Thesis. The University of Tokyo, 2000.
XQuery Formal Semantics: Peter Fankhauser et al., editors. XQuery 1.0 Formal Semantics. W3C Working Draft 07 June 2001. W3C (World Wide Web Consortium), 2001.
