Implementing RELAX NG simplification

Simplifying grammar, define and ref

This section sketches how to handle the simplification of grammar, define and ref described in sections 4.18 and 4.19 of the RELAX NG specification.

Initial parse

The result of the initial parse of the full syntax is a Pattern object representing the schema; this differs from the simple syntax pattern in that it contains an additional kind of Pattern, namely a RefPattern. A RefPattern is a Pattern containing a reference to another Pattern.

During parsing maintain a current Grammar object. A Grammar object contains

When you see a <grammar> start-tag, create a new Grammar object that becomes the new current Grammar. Initially the map is empty, the start pattern is null, and the parent Grammar is the old current Grammar.

When you see a define element, look up the name in the current Grammar's map. If it doesn't exist, add a new RefPattern referring to the body of the definition. If it does exist, then check with the RefPattern's pattern reference is null; it is is null, then change it to refer to the body of the definition; if it's not null, then we have a duplicate definition.

Handle <start> similarly to <define>.

When you see a <ref> element, lookup the name in the current Grammar's map. If it doesn't exist, add a new RefPattern with a null pattern reference to the map. In either case, return that RefPattern as the result of parsing the <ref> element.

When you see a <parentRef> do the same as <ref> except use the parent Grammar's map.

At the <grammar> end-tag, check that the start of the current grammar is non-null. Also walk the map to check that all RefPatterns have non-null pattern references. Return the start pattern as the result of parsing the <grammar> element.

Checking for illegal recursion

The next task is to check that there is no illegal recursion. For this it is convenient to add a checkRecursionDepth field to the PatternRef object. This should be initialized to -1. Define a checkRecursion(int depth) method on Patterns. By default this simply calls checkRecursion with the same argument on all subpatterns. For an ElementPattern it calls checkRecursion(depth + 1). For a PatternRef pattern it does this:

checkRecursion(int depth) {
  if (checkRecursionDepth == -1) {
    checkRecursionDepth = depth;
    referencedPattern.checkRecursion(depth);
    checkRecursionDepth = -2;
  }
  else if (depth == checkRecursionDepth)
    error("illegal recursion");
}

To check for illegal recursions, call checkRecursion(0) on the pattern for the schema as a whole.

Expanding references

After determining that there is no illegal recursion, the next stage is to eliminate RefPatterns. Do this with an expand() method on Patterns. It is convenient to add a boolean expanded field to ElementPattern; this is initialized to false. For an ElementPattern the code for expand is

Pattern expand() {
  if (!expanded) {
    expanded = true;
    content = expand(content);
  }
  return this;
}

For a RefPattern it would be simply

Pattern expand() {
  return referencedPattern.expand();
}

For a Text/DataPattern etc it would be simply

Pattern expand() {
  return this;
}

For a Choice it would be

Pattern expand() {
  return makeChoice(operand1.expand(), operand2.expand());
}

Handling include and combine

See the following message from Kohsuke Kawaguchi.

James Clark