he / src / userdoc / sourcedocs.xml @ beaf15bc

<?xml version="1.0" encoding="utf-8"?>
<article id="sourcedocs" title="Handling XML Documents">
  <h1>Handling XML Documents</h1>
  <p>This section discusses the various options in Saxon for handling XML documents.
    These might form the input or output of a query or stylesheet, or they might be
  used directly by application code written (say) in Java.</p>
  <p>See the topics below for further information:</p>
  <nav>
    <ul/>
  </nav>
  <section id="command-line" title="Source Documents on the Command Line">
    <h1>Source Documents on the Command Line</h1>
    <p>When Saxon (either XSLT or XQuery) is invoked from the command line, the source document will
      normally be an XML 1.0 document. Supplying an XML 1.1 document will also work, provided that
      (a) the selected parser is an XML 1.1 parser, and (b) the command line option
        <code>-xmlversion:1.1</code> is set.</p>
    <p>If a custom parser is specified using the <code>-x</code> option on the command line, then
      the source document can be in any format accepted by this custom parser. The only constraint
      is that the parser must behave as a SAX2 parser, delivering a stream of events that define a
      virtual XML document. For example, the TagSoup parser from John Cowan can be used to feed an
      HTML document as input to Saxon.</p>
    <p>Non-standard input formats can also be handled by specifying a user-written
        <code>URIResolver</code>. If the <code>-u</code> option is used on the command line, or if
      the source file name begins with <code>http:</code> or <code>https:</code> or
        <code>file:</code> or <code>classpath:</code>, then the source file name is resolved to a
      JAXP Source object using the <code>URIResolver</code>; if a user-written
        <code>URIResolver</code> is nominated (using the <code>-r</code> option) then this may
      translate the file name into a <code>Source</code> object any way that it wishes.</p>
    <aside>Saxon (from 9.7) supports the <code>classpath</code> URI scheme to locate resources
      using the Java classpath. This URI scheme is defined by the Spring framework, but Saxon's
      implementation is free-standing. For example, <code>classpath:utility.xsl</code> will locate
      a file called <code>utility.xsl</code> as a resource on the classpath.</aside>
    <aside>Saxon (from 9.9) also supports the <code>data</code> URI scheme, which allows
      a small resource to be contained within the URI itself, suitably encoded.</aside>
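    <p>As a rough illustration of what "suitably encoded" can mean, the following Java sketch packs a
      small XML document into a base64 <code>data</code> URI and unpacks it again, using only the JDK.
      The class and method names are hypothetical, and the choice of <code>application/xml</code> as
      the media type is an assumption made for the example; this text does not state which media
      types or encodings Saxon accepts.</p>
    <samp><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch only: builds and unpacks a base64 "data:" URI with the JDK.
final class DataUriSketch {

    // Encode a small XML document into a data: URI (base64 form).
    static String toDataUri(String xml) {
        String payload = Base64.getEncoder()
                .encodeToString(xml.getBytes(StandardCharsets.UTF_8));
        // Assumption: application/xml is used as the media type for illustration.
        return "data:application/xml;base64," + payload;
    }

    // Decode the payload again, showing that the resource travels inside the URI.
    static String fromDataUri(String dataUri) {
        String payload = dataUri.substring(dataUri.indexOf(',') + 1);
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String uri = toDataUri("<greeting>hello</greeting>");
        System.out.println(uri);
        System.out.println(fromDataUri(uri));
    }
}
```
]]></samp>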
  </section>
  <section id="collections" title="Collections">
    <h1>Collections</h1>
    <p>Saxon implements the <a class="bodylink code" href="/functions/fn/collection"
        >collection()</a> and <a class="bodylink code" href="/functions/fn/uri-collection"
        >uri-collection()</a> functions by passing the given collection URI (or null, if the default
      collection is requested) to a user-provided <a class="javalink"
        href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>. This section describes how
      the standard (default) collection finder behaves, if no user-written collection finder is
      supplied. (For information on supplying a user-written <code>CollectionFinder</code>, see <a
        class="bodylink" href="user-collections">Writing your own Collection Finder</a>.)</p>
    <p>In XSLT 3.0 and XQuery 3.1, collections can contain resources other than XML documents: for
    example, JSON documents, plain text documents, and binary files.</p>
    <p>The default collection can be registered with the <code>Configuration</code> in the form of a
      collection URI. When the <code>collection()</code> function is called with no arguments, this
      is exactly the same as supplying this default collection URI. If no default collection URI has
      been registered, an empty collection is returned.</p>
    <p>The standard collection finder supports four different kinds of collection: registered collections,
      catalog-based collections, directory-based collections, and zip-based collections:</p>
    <ul>
      <li><p>A registered collection is one that has been explicitly registered with the Configuration, by calling
      <code>Configuration.registerCollection()</code>.</p></li>
      <li><p>If the collection URI
        corresponds to a directory name, then a directory-based collection is used: the collection contains
      selected files from the named directory.</p></li>
      <li><p>If the collection URI identifies a
        ZIP or JAR file (more specifically, if it uses the <code>jar</code> URI scheme, or has a file extension of
        ".zip" or ".jar") then a zip-based collection is used.</p></li>
      <li><p>Otherwise, the collection URI must be
        the URI of an XML file which acts as a catalog, that is, it contains a list of the resources
        in the collection.</p></li>
    </ul>
    <aside>
      <p>To recognize additional kinds of ZIP file, for example Open Office documents, set the
      configuration property <code>ZIP_URI_PATTERN</code>. The value is a regular
        expression, for example you could set it to <code>"\.(zip|jar|docx)$"</code> to recognize
        URIs with file extensions ".zip", ".jar", or ".docx".</p>
    </aside>
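    <p>The effect of the example pattern can be checked with plain Java regular expressions. This
      is only a sketch: the class name is hypothetical, and it assumes the pattern is applied with
      "find" semantics against the whole collection URI, which is not stated in this text.</p>
    <samp><![CDATA[
```java
import java.util.regex.Pattern;

// Illustrative sketch only: tries the example ZIP_URI_PATTERN value on a few URIs.
final class ZipPatternSketch {

    // The example value from the text, written as a Java string literal.
    static final Pattern ZIP_LIKE = Pattern.compile("\\.(zip|jar|docx)$");

    // Assumption: "find" semantics, so the pattern only has to match at the
    // end of the URI (because of the trailing '$').
    static boolean looksLikeZip(String uri) {
        return ZIP_LIKE.matcher(uri).find();
    }

    public static void main(String[] args) {
        for (String uri : new String[] {
                "file:///data/books.zip",
                "file:///data/report.docx",
                "file:///data/report.xml"}) {
            System.out.println(uri + " -> " + looksLikeZip(uri));
        }
    }
}
```
]]></samp>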
    <p>Saxon by default recognizes four kinds of resource: XML documents,
      JSON documents, unparsed text documents, and binary files. The standard collection resolver
      attempts to identify which kind of resource to use based on the content type (media type),
      which in turn may be inferred from HTTP headers, from sniffing the initial bytes of the
      content, or from file extensions.</p>
    <p>In the case of directory-based and ZIP-based collections, query parameters may be added to
      the collection URI to further control how it is to be processed.</p>
    <aside><p>Saxon cannot assume that the nodes returned by the <code>collection()</code> function
    are in document order. It is therefore best to avoid expressions like <code>collection()/doc/section</code>
    which force the collection to be sorted (and therefore force all the nodes in the collection to
    be in memory at the same time). To iterate over a collection, it's better to use constructs that
    don't sort into document order: for example <code>collection() ! doc/section</code>,
    or <code>xsl:for-each</code>, or <code>for $x in collection() return ...</code>.</p>
      <p>See also <a class="bodylink code"
        href="/functions/saxon/discard-document">saxon:discard-document()</a>.</p></aside>
    <h2 class="subtitle">Defining a collection using a catalog file</h2>
    <p>If the collection URI identifies a file, Saxon treats this as a catalog file. This is a file
      in XML format that lists the documents comprising the collection. Here is an example of such a
      catalog file:</p>
    <samp><![CDATA[<collection stable="true">
  <doc href="dir/chap1.xml"/>
  <doc href="dir/chap2.xml"/>
  <doc href="dir/chap3.xml"/>
  <doc href="dir/chap4.xml"/>
</collection>]]></samp>
    <p>The <code>stable</code> attribute indicates whether the collection is stable or not. The
      default value is <code>true</code>. If a collection is stable, then the URIs listed in the
        <code>doc</code> elements are treated like URIs passed to the <code>doc()</code> function.
      Each URI is first looked up in the document pool to see if it is already loaded; if it is,
      then the document node is returned. Otherwise the URI is passed to the registered
        <code>URIResolver</code>, and the resulting document is added to the document pool. The
      effect of this process is firstly, that two calls on the <code>collection()</code> function
      passing the same collection URI will return the same nodes each time, and secondly, that these
      results are consistent with the results of the <code>doc()</code> function: if the
        <code>document-uri()</code> of a node returned by the <code>collection()</code> function is
      passed to the <code>doc()</code> function, the original node will be returned. If
        <code>stable="false"</code> is specified, however, the URI is dereferenced directly, and the
      document is not added to the document pool, which means that a subsequent retrieval of the
      same document will not return the same node.</p>
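    <p>To make the shape of a catalog file concrete, the following Java sketch reads a catalog with
      the JDK's DOM parser and lists the document URIs it names. It is not Saxon's implementation:
      the class and method names are hypothetical, and resolving each <code>href</code> against the
      URI of the catalog itself is an assumption made for the example.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only: lists the documents named in a collection catalog.
final class CatalogSketch {

    static List<URI> listDocuments(String catalogXml, URI catalogUri) {
        try {
            Document catalog = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(catalogXml)));
            NodeList docs = catalog.getElementsByTagName("doc");
            List<URI> result = new ArrayList<>();
            for (int i = 0; i < docs.getLength(); i++) {
                String href = ((Element) docs.item(i)).getAttribute("href");
                // Assumption: relative hrefs are resolved against the catalog's own URI.
                result.add(catalogUri.resolve(href));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read catalog", e);
        }
    }

    public static void main(String[] args) {
        String catalog = "<collection stable=\"true\">"
                + "<doc href=\"dir/chap1.xml\"/>"
                + "<doc href=\"dir/chap2.xml\"/>"
                + "</collection>";
        for (URI uri : listDocuments(catalog, URI.create("file:///data/catalog.xml"))) {
            System.out.println(uri);
        }
    }
}
```
]]></samp>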
    <h2 class="subtitle">Processing directories</h2>
    <p>If the URI passed to the <code>collection()</code> function (still assuming a default
        <code>CollectionFinder</code>) identifies a directory, then the contents of the
      directory are returned. Such a URI may have a number of query parameters, written in the form
        <code>file:///a/b/c/d?keyword=value;keyword=value;...</code>. The recognized keywords and
      their values are as follows:</p>
    <table>
      <thead class="params">
        <tr>
          <td>
            <p> keyword </p>
          </td>
          <td>
            <p> values </p>
          </td>
          <td>
            <p> effect </p>
          </td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="keyword">
            <p> recurse </p>
          </td>
          <td>
            <p>
              <span class="value">yes | no</span> (default <span class="value">no</span>) </p>
          </td>
          <td>
            <p> Determines whether subdirectories are searched recursively. </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> strip-space </p>
          </td>
          <td>
            <p class="value"> yes | ignorable | no </p>
          </td>
          <td>
            <p> Determines whether whitespace text nodes are to be stripped. The default depends on
              the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> settings.
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> validation </p>
          </td>
          <td>
            <p class="value"> strip | preserve | lax | strict </p>
          </td>
          <td>
            <p> Determines whether and how schema validation is applied to each document. The
              default depends on the <a class="javalink" href="net.sf.saxon.Configuration"
                >Configuration</a> settings. </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> select </p>
          </td>
          <td>
            <p> file name pattern ("glob")</p>
          </td>
          <td>
            <p> Determines which files are selected (see below). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> match </p>
          </td>
          <td>
            <p> regular expression</p>
          </td>
          <td>
            <p> Determines which files are selected (see below). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> content-type </p>
          </td>
          <td>
            <p> media type (for example <code>application/xml</code> or <code>text/plain</code>)</p>
          </td>
          <td>
            <p> Determines how the resource is processed. For example if the media type is
            <code>application/xml</code> then it will be parsed as XML and returned as a document node;
            if it is <code>text/plain</code> then it is returned as an atomic value of type
            <code>xs:string</code>; if it is <code>application/binary</code> then it is returned
            as an atomic value of type <code>xs:base64Binary</code>.</p>
            <p>If this parameter is absent, then the <code
              java="net.sf.saxon.lib.CollectionFinder">CollectionFinder</code> attempts to discern the
            content type first by looking at the file extension, and then, if necessary, by
            examining the initial bytes of the content itself.</p>
            <p>The set of content types that are recognized, and their mapping to implementations of the
            class <code java="net.sf.saxon.lib.ResourceFactory">ResourceFactory</code>, is defined in the
            <code java="net.sf.saxon.Configuration">Configuration</code>, and can be changed using the
            method <code>Configuration.registerMediaType()</code>. The set of file extensions that are
              recognized, and their mapping to media types, is also held in the <code>Configuration</code>, and can be changed using the
              method <code>Configuration.registerFileExtension()</code>.</p>
            <p>Available from Saxon 10.1.</p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> metadata </p>
          </td>
          <td>
            <p class="value"> yes | no</p>
          </td>
          <td>
            <p> If set to yes, the item returned by the <code>collection()</code> function will be a
              map containing properties of the selected resource as well as its content. The keys of
              the map will be strings. Two entries with names "name" and "fetch" will always be
              available.</p>
            <p>The value of the "fetch" entry is a function that can be called to retrieve the
              content (it returns the same item that would have been returned with the default
              setting of <code>metadata=no</code>: for example a node representing an XML document,
              or a map representing the content of a JSON file). This allows you to decide which
              items in the collection to fetch based on their properties, for example:</p>
            <p>
              <code>for $m in collection('/data/folder?metadata=yes') return if
                ($m?content-type='application/xml') then $m?fetch() else ()</code>
            </p>
            <p>Failures in parsing a resource can be trapped by using try/catch around the call on
              the <code>fetch</code> function.</p>
            <p>Other entries in the returned map represent properties of the file obtained from the
              operating system: for example <code>last-modified</code>, <code>can-execute</code>,
                <code>length</code>, or <code>is-hidden</code>.</p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> on-error </p>
          </td>
          <td>
            <p class="value"> fail | warning | ignore </p>
          </td>
          <td>
            <p> Determines the action to be taken if one of the files cannot be successfully parsed.
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> parser </p>
          </td>
          <td>
            <p> Java class name </p>
          </td>
          <td>
            <p> Class name of the Java <code>XMLReader</code> to be used. For example, John Cowan's
                <code>TagSoup</code> parser may be selected by specifying
                <code>parser=org.ccil.cowan.tagsoup.Parser</code> (this parses arbitrary ill-formed
              HTML and presents it to Saxon as well-formed XML). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> xinclude </p>
          </td>
          <td>
            <p class="value"> yes | no </p>
          </td>
          <td>
            <p> Determines whether XInclude processing should be applied to the selected documents.
              This overrides any setting in the <a class="javalink"
                href="net.sf.saxon.Configuration">Configuration</a> (or any command line option).
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> stable </p>
          </td>
          <td>
            <p class="value"> yes | no </p>
          </td>
          <td>
            <p> Determines whether the collection is to be stable. </p>
          </td>
        </tr>
      </tbody>
    </table>
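    <p>The <code>keyword=value;keyword=value</code> syntax can be pictured with a small Java sketch
      that splits the query part of a collection URI into its parameters. This is not how Saxon
      parses the URI internally; the class and method names are hypothetical, and the values are
      deliberately left percent-encoded.</p>
    <samp><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: splits "?keyword=value;keyword=value" into a map.
final class CollectionQuerySketch {

    static Map<String, String> parseParams(String collectionUri) {
        Map<String, String> params = new LinkedHashMap<>();
        int queryStart = collectionUri.indexOf('?');
        if (queryStart < 0) {
            return params; // no query part, so no parameters
        }
        // Parameters are separated by semicolons, as in the syntax shown above.
        for (String pair : collectionUri.substring(queryStart + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // Values are kept as written (still percent-encoded).
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String uri = "file:///a/b/c/d?recurse=yes;select=*.xml;on-error=warning";
        parseParams(uri).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```
]]></samp>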
    <p>The pattern used in the <code>select</code> parameter can use glob-like syntax, for example
        <code>*.xml</code> selects all files with extension "xml". More generally, the pattern is
      converted to a regular expression by prepending "<code>^</code>", appending "<code>$</code>",
      replacing "<code>.</code>" by "<code>\.</code>", "<code>*</code>" by
      "<code>.*</code>", and "<code>?</code>" by
      "<code>.?</code>", and it is then used to match the file names appearing in the directory
      using the Java regular expression rules. So, for example, you can write
        <code>?select=*.(xml|xhtml)</code> to match files with either of these two file extensions.
      Note however, that special characters used in the URL (that is, characters such as backslash 
      and curly braces that are not allowed in the query part of a URI) must be escaped using 
      the %HH convention. For example,
      vertical bar needs to be written as <code>%7C</code>. This escaping can be achieved using the
        <code>encode-for-uri()</code> function.</p>
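    <p>The glob-to-regex conversion described above can be sketched in a few lines of Java. This
      follows the rules as stated in the text and is not taken from Saxon's source; the class and
      method names are hypothetical.</p>
    <samp><![CDATA[
```java
import java.util.regex.Pattern;

// Illustrative sketch only: converts a "select" glob to a regular expression
// using the rules stated in the text.
final class GlobSketch {

    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder("^");   // prepend "^"
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '.': regex.append("\\."); break;    // "." becomes "\."
                case '*': regex.append(".*"); break;     // "*" becomes ".*"
                case '?': regex.append(".?"); break;     // "?" becomes ".?"
                default:  regex.append(c);               // everything else is kept
            }
        }
        return regex.append('$').toString();             // append "$"
    }

    // Tests a file name against the glob using Java regular expression rules.
    static boolean selects(String glob, String fileName) {
        return Pattern.compile(globToRegex(glob)).matcher(fileName).find();
    }

    public static void main(String[] args) {
        System.out.println(globToRegex("*.xml"));
        System.out.println(selects("*.(xml|xhtml)", "chap1.xhtml"));
        System.out.println(selects("*.(xml|xhtml)", "notes.txt"));
    }
}
```
]]></samp>
    <p>Because characters other than the three listed are passed through unchanged, parentheses and
      vertical bars in the glob keep their regular-expression meaning, which is why
      <code>*.(xml|xhtml)</code> works as an alternation.</p>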
    <p>As an alternative to the <code>select</code> parameter, the <code>match</code> parameter
    can be used. This accepts a standard XPath 3.1 regular expression as its value. For example,
    <code>.+\.xml</code> selects all files with extension "xml". Again, characters that are not allowed
    in the query part of a URI, such as backslash, curly braces, and vertical bar, must be escaped
    using the %HH convention, which can be achieved using the <code>encode-for-uri()</code> function.</p>
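    <p>When the collection URI is assembled in Java rather than in XPath, the same %HH escaping can
      be approximated with the JDK's <code>URLEncoder</code>. This is a sketch under stated
      assumptions: <code>URLEncoder</code> implements HTML form encoding, not the
      <code>encode-for-uri()</code> rules, so it writes a space as "+" (corrected below) and leaves
      "*" unescaped. The class name and the directory URI are hypothetical.</p>
    <samp><![CDATA[
```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: percent-encodes a select/match value for use
// in the query part of a collection URI.
final class EncodeParamSketch {

    static String encodeValue(String value) {
        // URLEncoder targets HTML forms and writes a space as '+'; any literal
        // '+' in the input has already become %2B, so this swap is safe.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // Hypothetical directory URI, used only for illustration.
        String base = "file:///data/docs";
        System.out.println(base + "?select=" + encodeValue("*.(xml|xhtml)"));
        System.out.println(base + "?match=" + encodeValue(".+\\.xml"));
    }
}
```
]]></samp>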
    <p> A collection read in this way is not stable by default. (Stability can be expensive, and is
      rarely required, so the default setting is recommended.) Making a collection stable has the
      effect that the entire result of the <code>collection()</code> function is retained in a cache
      for the duration of the query or transformation, and any further calls on
        <code>collection()</code> with the same absolute URI return this saved collection retrieved
      from this cache. </p>
    <h2 class="subtitle">Processing ZIP and JAR files</h2>
    <p>If the collection URI identifies a ZIP or JAR file then it is processed in exactly the same
      way as a directory. URI query parameters can be used in the same way, and have much the same
      effect.</p>
    <p>A URI is recognized as a ZIP or JAR file URI if the scheme name is "jar", or if the file
      extension is "zip" or "jar".</p>
    <p>The value of the <code>recurse</code> option is ignored in this case, and
        <code>recurse=yes</code> is assumed.</p>
    <p>The option <code>metadata=yes</code> is available for ZIP-based collections as well as for
      directory-based collections. The set of properties returned in the resulting map is slightly
      different, for example it includes any <code>comment</code> field associated with the ZIP file
      entry. Note that no items are returned in respect of directory nodes within the ZIP file; only
      leaf nodes are represented.</p>
    <h2 class="subtitle">Registered Collections</h2>
    <p>On the .NET product there is another way to use a collection URI (provided that you use the
      API rather than the command line): you can register a collection using the
      <code>Processor.RegisterCollection</code> method on the <a class="javalink"
        href="Saxon.Api.Processor">Saxon.Api.Processor</a> class.</p>
    <section id="user-collections" title="Writing your own Collection Finder">
      <h1>Writing your own Collection Finder</h1>
      <p>Since Saxon 9.7, the <a class="javalink" href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>
        interface replaces the <code>CollectionURIResolver</code> interface in previous
        releases. It has much more flexibility, in particular the ability to deliver non-XML
        resources. The old <code>CollectionURIResolver</code> interface has been dropped in Saxon 10.</p>
      <p>Details of the interface can be found in the Javadoc. The basic steps are:</p>
      <ol>
        <li>
          <p>Write a class that implements <code>CollectionFinder</code>. The interface has a single method,
            which accepts an absolute collection URI, and returns an object that implements
            <code>ResourceCollection</code>. Register an instance of your
            <code>CollectionFinder</code> with the Saxon <code>Configuration</code>.</p>
          <p>For example, a <code>CollectionFinder</code> written to handle collection URIs using the
            scheme name "sql" might be supplied as:</p>
          <samp><![CDATA[// Handle "sql:" URIs ourselves; delegate everything else to the standard finder
config.setCollectionFinder((context, uri) ->
   uri.startsWith("sql:")
      ? sqlCollection(uri)
      : config.getStandardCollectionFinder().findCollection(context, uri)
);]]></samp>
          <p>where <code>sqlCollection(uri)</code> returns some user-defined implementation
            of <code>ResourceCollection</code>, perhaps one that retrieves XML documents from
            a relational database.</p>
        </li>
        <li>
          <p>You can either reuse the existing implementations of <a class="javalink"
            href="net.sf.saxon.lib.ResourceCollection">ResourceCollection</a>, namely
            <code>CatalogCollection</code>, <code>DirectoryCollection</code>, and
            <code>JarCollection</code>, or you can write your own. You can also of course subclass
            the existing collection classes. The <code>ResourceCollection</code> object provides two
            key methods that you need to implement: <code>getResources()</code>, which returns a
            sequence of <code>Resource</code> objects, and <code>getResourceURIs()</code>, which
            returns a sequence of URIs. These are invoked by the <a class="bodylink code"
              href="/functions/fn/collection" >fn:collection()</a> and <a class="bodylink code"
                href="/functions/fn/uri-collection" >fn:uri-collection()</a> functions respectively.</p>
        </li>
        <li>
          <p>Again, you can either reuse existing implementations of <a class="javalink"
            href="net.sf.saxon.lib.Resource">Resource</a> (such as <code>XmlResource</code>,
            <code>JSONResource</code>, <code>UnparsedTextResource</code>,
            <code>BinaryResource</code>, and <code>MetadataResource</code>), or you can create your
            own, perhaps by subclassing. The key method that the <code>Resource</code> object must
            provide is <code>getItem()</code> which returns the resource in the form of an XDM item.
            It is good practice to delay any extensive work such as parsing until the
            <code>getItem()</code> method is called: this reduces the memory footprint, and enables
            parallel evaluation across multiple threads (Saxon-EE only).</p>
        </li>
      </ol>
    </section>
  </section>
  <section id="builder-api" title="Building a Source Document from lexical XML">
    <h1>Building a Source Document from lexical XML</h1>
    <p>The conversion of lexical XML to a tree in memory is called <i>parsing</i>, and is performed
    by a software component called an <i>XML Parser</i>. Saxon does not include its own XML parser,
    rather it provides interfaces that invoke XML parsers supplied by third parties. Platforms
    such as Java and .NET typically include a built-in XML parser that Saxon uses by default.</p>
    <p>With the Java s9api interface, a source document can be built using the <a class="javalink"
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> class, which is created using
      the factory method <code>newDocumentBuilder</code> on the <a class="javalink"
        href="net.sf.saxon.s9api.Processor">Processor</a> object. Various options for document
      building are available as methods on the <code>DocumentBuilder</code>, for example options to
      perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and also
      to choose the tree implementation model to be used.</p>
    <p>These methods create a document from a <code>Source</code> object. This is a JAXP interface designed
    as an abstraction of various kinds of XML source, including <code>StreamSource</code>, which represents lexical XML
    held in a file or input stream; <code>SAXSource</code>, which represents a source of SAX events; <code>DOMSource</code>,
    representing an already-parsed XML document held in a DOM tree; and <code>StAXSource</code>, which represents a
      class that responds to requests for StAX (pull-parser) events. In addition, Saxon's <code
        java="net.sf.saxon.om.NodeInfo">NodeInfo</code> and <code
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code> classes
      implement the JAXP <code>Source</code> interface, and the s9api <a class="javalink"
        href="net.sf.saxon.s9api.XdmNode">XdmNode</a> class has an <code>asSource()</code> method,
      so it is always possible to supply an existing Saxon tree as
    the source for any of these interfaces.</p>
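    <p>The standard JAXP kinds of <code>Source</code> can be constructed with the JDK alone, as the
      following Java sketch shows. It deliberately stops short of calling Saxon: the class and method
      names are hypothetical, and each returned object is simply the sort of value that could be
      passed wherever a JAXP <code>Source</code> is accepted.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative sketch only: three JDK-provided kinds of JAXP Source.
final class JaxpSourcesSketch {

    // Lexical XML held in a character stream.
    static Source streamSource(String xml, String systemId) {
        return new StreamSource(new StringReader(xml), systemId);
    }

    // A source of SAX events; the parser is chosen later by the consumer.
    static Source saxSource(String xml, String systemId) {
        InputSource input = new InputSource(new StringReader(xml));
        input.setSystemId(systemId);
        return new SAXSource(input);
    }

    // An already-parsed document held in a DOM tree.
    static Source domSource(String xml, String systemId) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new DOMSource(dom, systemId);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>Hello</p></doc>";
        String id = "file:///data/doc.xml"; // hypothetical system identifier
        for (Source s : new Source[] {
                streamSource(xml, id), saxSource(xml, id), domSource(xml, id)}) {
            System.out.println(s.getClass().getSimpleName() + " " + s.getSystemId());
        }
    }
}
```
]]></samp>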
    <p>Similarly in the .NET API, there is a <a class="javalink" href="Saxon.Api.DocumentBuilder"
        >DocumentBuilder</a> object that can be created from the <a class="javalink"
        href="Saxon.Api.Processor">Processor</a>. This allows options to be set controlling the way
      documents are built, and provides an overloaded <code>Build</code> method allowing a tree to
      be built from various kinds of source.</p>
    <p>It is also possible to build a Saxon tree in memory by using the <code>buildDocumentTree()</code>
      method of the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.
      (When using the JAXP Transformation API, the <code>Configuration</code> can be obtained from
      the <code>TransformerFactory</code> as the value of the attribute named <a class="javalink"
        href="net.sf.saxon.lib.Feature#CONFIGURATION">Feature.CONFIGURATION.name</a>.)</p>
    <p>The <a class="javalink" href="net.sf.saxon.Configuration#buildDocumentTree">buildDocumentTree()</a>
      method takes a single argument, a JAXP <code>Source</code>. This can be any of the standard
      kinds of JAXP <code>Source</code>. See <a class="bodylink" href="../jaxpsources">JAXP
        Sources</a> for more information. The method returns a <code
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code> containing information about the constructed tree,
      notably the method <code>getRootNode()</code> to get the root node of the tree,
      which in most cases will be a document node.
    </p>
    <p>All the documents processed in a single transformation or query must be loaded using the same
        <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>. However, it is
      possible to copy a document from one <code>Configuration</code> into another by supplying the
        <a class="javalink" href="net.sf.saxon.om.TreeInfo">TreeInfo</a> at the root of the
      existing document as the <code>Source</code> supplied to the <code>buildDocumentTree()</code>
      method of the new <code>Configuration</code>. </p>
  </section>
  <section id="building-programmatically" title="Building XML Trees Programmatically">
    <h1>Building XML Trees Programmatically</h1>
    <p>There are various ways in Saxon to build an XDM tree programmatically 
      (that is, incrementally one node at a time).</p>
    <h2 class="subtitle">The Sapling Tree API</h2>
    <p>A new API offered from Saxon 10 is the Sapling Tree API. This provides a collection of methods to create
    nodes; for example, to create a document containing a <code>body</code> element with two paragraphs, the expression</p>
    <samp><![CDATA[doc(
  elem("body")
    .child(elem("p").text("Hello"),
           elem("p").text("World"))
      )]]></samp>
    <p>might be used. These methods are found in package <code>net.sf.saxon.sapling</code>, specifically in the
      class <code java="net.sf.saxon.sapling.Saplings">net.sf.saxon.sapling.Saplings</code>.</p>
    <p>The "Sapling" nodes created by these methods are transient nodes used only during tree construction; when the Sapling
    tree has been completely built, it can be converted to a regular XDM tree offering full query access using the methods
      <code java="net.sf.saxon.sapling.SaplingDocument#toXdmNode">SaplingDocument.toXdmNode()</code>
      or <code  java="net.sf.saxon.sapling.SaplingDocument#toNodeInfo">SaplingDocument.toNodeInfo()</code>. It is also possible to send the tree
      directly to a <code java="net.sf.saxon.s9api.Destination">Destination</code> such as a 
      <code java="net.sf.saxon.s9api.Serializer">Serializer</code>, a 
      <code java="net.sf.saxon.s9api.SchemaValidator">SchemaValidator</code>, or an 
      <code java="net.sf.saxon.s9api.Xslt30Transformer">Xslt30Transformer</code>.</p>
    <p>Sapling nodes are immutable objects, so operations like adding children or adding attributes always create a new object,
    without modifying the input objects. This means that adding a child element to a new parent can be done without an expensive
    copy operation. Nodes do not have references to their parents in the tree, so a subtree can be shared by multiple trees
    without copying.</p>
    <p>The Sapling Tree API is described in the JavaDoc for class <code java="net.sf.saxon.sapling.SaplingNode">SaplingNode</code>.</p>
    <h2 class="subtitle">Event APIs</h2>
    <p>Saxon 10 introduces a new event-based API (called simply "Push") designed explicitly for convenient use by 
526
      user-written applications.</p>
527
    
528
    <p>A <code>Push</code> instance is always created using the factory method <code>Processor.newPush(destination)</code>;
529
      the <code>destination</code> argument indicates what happens to the constructed document. 
530
      This will commonly be an <code>XdmDestination</code> to build an in-memory <code>XdmNode</code>,
531
      or a <code>Serializer</code> to create lexical XML,
532
      but it could also be, for example, an <code>XsltTransformer</code> or a <code>SchemaValidator</code>.</p>
533
    
534
    <p>Conventional event-based APIs such as the SAX <code>ContentHandler</code> and StAX <code>XMLStreamWriter</code>
535
    and <code>XMLEventWriter</code> rely on the application to issue a properly-nested
536
    sequence of calls to methods such as <code>startElement()</code> and <code>endElement()</code>. This can make
537
      it very difficult to diagnose errors if the calls are not properly matched. The Saxon <code
538
        java="net.sf.saxon.s9api.Push">Push</code> API differs in that
539
    a call to start a new element node returns an <code>Element</code> object representing that element, and methods to create attributes
540
      and children for the element, and to end the element, are defined as methods on that <code>Element</code> object.
541
      Furthermore, these methods return the element to which they are applied, allowing method chaining.
542
    So a typical sequence of calls might be:</p>
543
    
544
    <samp><![CDATA[   out.element("employee")
      .attribute("ssn", "123456")
      .attribute("location", "Berlin")
      .text("Helmut Schmidt")
      .close();
]]></samp>
550
    
551
    <p>This example constructs a slightly more complex tree:</p>
552
    
553
    <samp><![CDATA[   Processor processor = new Processor(false);
   Serializer destination = processor.newSerializer(new File("out.xml"));
   destination.setOutputProperty(Serializer.Property.INDENT, "no");
   Push.Document doc = processor.newPush(destination).document(true);
   doc.setDefaultNamespace("http://www.example.org/ns");
   Push.Element top = doc.element("root");
   top.attribute("version", "1.5");
   for (Employee emp : getData()) {
      top.element("emp")
         .attribute("ssn", emp.ssn)
         .text(emp.name);
   }
   doc.close();
]]></samp>
567
    
568
    <p>Note that there are no explicit <code>endElement</code> events here; an end tag is written automatically when
569
    the next sibling is written to the parent element, or when the parent element is closed. The <code>close()</code>
570
    method is available, however, to close an element explicitly, which can be useful to avoid errors when the writing
571
    of elements is distributed across many classes and methods.</p>
572
    
573
    <p>Saxon also allows trees to be communicated using other event-based APIs. In Java there are three such APIs worth considering:</p>
574
    <ul>
575
      <li>Saxon's <code>Receiver</code> API</li>
576
      <li>The SAX <code>ContentHandler</code> API</li>
577
      <li>The StAX <code>XMLStreamWriter</code> API</li>
578
    </ul>
579
    <p>The <code java="net.sf.saxon.event.Receiver">Receiver</code> is efficient, but it is proprietary to Saxon, is prone to minor changes from one release to another,
580
    and is designed primarily for internal use rather than for direct use from applications.</p>
581
    <p>The SAX <code>ContentHandler</code> API was designed primarily for communication from an XML parser to an application; it can be
582
    clumsy to use when the originator of events is something other than an XML parser.</p>
583
    <p>The StAX <code>XMLStreamWriter</code> is probably the best of the three interfaces for most
584
      applications. Saxon's <code java="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</code> class
585
      offers a method <code java="net.sf.saxon.s9api.DocumentBuilder#newBuildingStreamWriter">newBuildingStreamWriter()</code> which returns an <code>XMLStreamWriter</code>; the calling application can
586
    then use methods such as <code>XMLStreamWriter.writeStartElement()</code> and <code>XMLStreamWriter.writeEndElement()</code>
587
    to build the tree.</p>
588
    <p>The trickiest part of this interface is probably the handling of namespaces. Saxon's implementation of the StAX interfaces takes
589
    into account not only the official Javadoc specifications (which in some respects are woefully inadequate), but also the unofficial
590
    interpretation of the specifications found at <a
591
      href="http://veithen.github.io/2009/11/01/understanding-stax.html" class="bodylink">Understanding StAX:
592
    How to Correctly Use XMLStreamWriter</a>.</p>
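    <p>As a rough illustration of the namespace handling discussed above, the following sketch writes one namespaced element
    using the three-argument form of <code>writeStartElement()</code> followed by an explicit <code>writeNamespace()</code> call.
    It uses the JDK's own <code>XMLOutputFactory</code> as a stand-in so that it runs without Saxon; with Saxon the writer would instead be
    obtained from <code>newBuildingStreamWriter()</code>. The class name, prefix, and namespace URI are illustrative assumptions.</p>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StaxNamespaceDemo {

    // Hypothetical namespace URI, used only for this illustration.
    static final String NS = "http://www.example.org/ns";

    /** Writes a single namespaced element and returns the serialized XML. */
    static String writeEmployee() {
        StringWriter buffer = new StringWriter();
        try {
            // Stand-in writer from the JDK (non-repairing mode by default).
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            // Name the element with an explicit prefix and namespace URI...
            out.writeStartElement("e", "employee", NS);
            // ...and declare that same binding explicitly on the element.
            out.writeNamespace("e", NS);
            out.writeAttribute("ssn", "123456");
            out.writeCharacters("Helmut Schmidt");
            out.writeEndElement();
            out.flush();
            out.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeEmployee());
    }
}
```

    <p>Stating both the element's namespace and its declaration avoids relying on the writer to repair missing namespace
    declarations, which is one of the areas where implementations are reported to differ.</p>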
593
  </section>
594
  <section id="preloading" title="Preloading shared reference documents">
595
    <h1>Preloading shared reference documents</h1>
596
    <p>An option is available (<a class="bodylink code" href="/configuration/config-features"
597
        >Feature.PRE_EVALUATE_DOC_FUNCTION</a>) to indicate that calls to the <code>doc()</code>
598
      or <code>document()</code> functions with constant string arguments should be evaluated when a
599
      query or stylesheet is compiled, rather than at run-time. This option is intended for use when
600
      a reference or lookup document is used by all queries and transformations. Using this option
601
      has a number of effects:</p>
602
    <ol>
603
      <li>
604
        <p>The URI is resolved using the compile-time <code>URIResolver</code> rather than the
605
          run-time <code>URIResolver</code>.</p>
606
      </li>
607
      <li>
608
        <p>The document is loaded into a document pool held by the <a class="javalink"
609
            href="net.sf.saxon.Configuration">Configuration</a>, whose memory is released only when
610
          the <code>Configuration</code> itself ceases to exist.</p>
611
      </li>
612
      <li>
613
        <p>All queries and transformations using this document share the same copy.</p>
614
      </li>
615
      <li>
616
        <p>Any updates to the document that occur between compile-time and run-time have no
617
          effect.</p>
618
      </li>
619
    </ol>
620
    <p>The option is selected by using <code>Configuration.setConfigurationProperty()</code> or
621
        <code>TransformerFactory.setAttribute()</code> with the property name
622
        <code>Feature.PRE_EVALUATE_DOC_FUNCTION.name</code>. This option is not available from the
623
      command line because it has no useful effect with a single-shot compile-and-run interface.</p>
624
    <p>This option has no effect if the URI supplied to the <code>doc()</code> or
625
        <code>document()</code> function includes a fragment identifier.</p>
626
    <p>It is also possible to preload a specific document into the shared document pool from the
627
      Java application by using the call <code>config.getGlobalDocumentPool().add(doc, uri)</code>.
628
      When the <code>doc()</code> or <code>document()</code> function is called, the shared document
629
      pool is first checked to see if the requested document is already present. The <a
630
        class="javalink" href="net.sf.saxon.om.DocumentPool">DocumentPool</a> object also has a
631
        <code>discard()</code> method which causes the document to be released from the pool.</p>
632
    
633
    <aside>It is not advisable to use this option when a compiled stylesheet is exported to a SEF
634
    file. Data files are best deployed separately, rather than by embedding them in the SEF.</aside>
635
  </section>
636
  <section id="xml-catalogs" title="Using XML Catalogs">
637
    <h1>Using XML Catalogs</h1>
638

    
639

    
640
    <p>XML Catalogs (<a
641
        href="http://xml.apache.org/commons/components/resolver/resolver-article.html"
642
        class="bodylink">defined by OASIS</a>) provide a way to avoid hard-coding the locations of
643
      XML documents and other resources in your application. Instead, the application refers to the
644
      resource using a conventional system identifier (URI) or public identifier, and a local
645
      catalog is used to map the system and public identifiers to an actual location.</p>
646

    
647
    <p>When using Saxon from the command line, it is possible to specify a catalog to be used using
648
      the option <code>-catalog:<i>files</i></code>. Here <code><i>files</i></code> is the catalog
649
      file to be searched, or a list of filenames separated by semicolons. This catalog will be used
650
      to locate DTDs and external entities required by the XML parser, XSLT stylesheet modules
651
      requested using <code>xsl:import</code> and <code>xsl:include</code>, documents requested
652
      using the <code>document()</code> and <code>doc()</code> functions, and also schema documents,
653
      however they are referenced.</p>
654

    
655
    <p>
656
      <i>The catalog is NOT currently used for non-XML resources, including JSON documents, 
657
        query modules, unparsed text files, collations, and collections.</i>
658
    </p>
659

    
660
    <p>With Saxon on the Java platform, if the <code>-catalog</code> option is used on the command
661
      line, then the open-source Apache library <code>resolver.jar</code> must be present on the
662
      classpath. With Saxon on .NET, this module (cross-compiled to IL) is included within the Saxon
663
      DLL.</p>
664

    
665
    <p>Setting the <code>-catalog</code> option is equivalent to setting the following options:</p>
666

    
667
    <table>
668
      <tr>
669
        <td>
670
          <p>
671
            <code>-r</code>
672
          </p>
673
        </td>
674
        <td>
675
          <p>
676
            <code>org.apache.xml.resolver.tools.CatalogResolver</code>
677
          </p>
678
        </td>
679
      </tr>
680
      <tr>
681
        <td>
682
          <p>
683
            <code>-x</code>
684
          </p>
685
        </td>
686
        <td>
687
          <p>
688
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
689
          </p>
690
        </td>
691
      </tr>
692
      <tr>
693
        <td>
694
          <p>
695
            <code>-y</code>
696
          </p>
697
        </td>
698
        <td>
699
          <p>
700
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
701
          </p>
702
        </td>
703
      </tr>
704
    </table>
705

    
706
    <p>In addition, the system property <code>xml.catalog.files</code> is set to the
      supplied <code><i>files</i></code> value. If the <code>-t</code> option is also set, Saxon
708
      sets the verbosity level of the catalog manager to 2, causing it to report messages for each
709
      resolved URI. Saxon customizes the Apache resolver library to integrate these messages with
710
      the other output from the <code>-t</code> option: that is, by default it is sent to the
711
      standard error output.</p>
712

    
713
    <p>
714
      <i>This mechanism means that it is not possible to use any of the options <code>-r</code>,
715
          <code>-x</code>, or <code>-y</code> when the <code>-catalog</code> option is used.</i>
716
    </p>
717

    
718
    <p>When the <code>-catalog</code> option is used on the command line, this overrides the
719
      internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the
720
      XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on
721
      setting the XML parser's <code>EntityResolver</code>, it is not possible to use them in
722
      conjunction.</p>
723

    
724
    <p>This support for OASIS catalogs is implemented only in the Saxon command line. To use
725
      catalogs from a Saxon application, it is necessary to configure the various options
726
      individually. For example:</p>
727

    
728
    <ul>
729
      <li>
730
        <p>To use catalogs to resolve references to DTDs and external entities, choose
731
            <code>ResolvingXMLReader</code> as your XML parser, or set
732
            <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
733
            <code>EntityResolver</code> used by your chosen XML parser.</p>
734
      </li>
735

    
736
      <li>
737
        <p>To use catalogs to resolve <code>xsl:include</code> and <code>xsl:import</code>
738
          references, choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
739
            <code>URIResolver</code> used by Saxon when compiling the stylesheet.</p>
740
      </li>
741

    
742
      <li>
743
        <p>To use catalogs to resolve URIs passed to <code>doc()</code> or <code>document()</code>,
          choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
745
            <code>URIResolver</code> used by Saxon when running the stylesheet (for example, using
746
            <code>Transformer.setURIResolver()</code>).</p>
747
      </li>
748
    </ul>
749

    
750
    <p>Here is an example of a very simple catalog file. The <code>publicId</code> and
751
        <code>systemId</code> attributes give the public or system identifier as used in the source
752
      document; the <code>uri</code> attribute gives the location (in this case a relative location)
753
      where the actual resource will be found.</p>
754

    
755

    
756

    
757
    <samp><![CDATA[<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <group prefer="public" xml:base="file:///usr/share/xml/">

      <public
         publicId="-//OASIS//DTD DocBook XML V4.5//EN"
         uri="docbook45/docbookx.dtd"/>

      <system
         systemId="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
         uri="docbook45/docbookx.dtd"/>

   </group>
</catalog>]]></samp>
771

    
772
    <p>There are many tutorials for XML catalogs available on the web, including some that have
773
      information specific to Saxon, though this may well relate to earlier releases.</p>
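    <p>To see how a catalog entry of the kind shown earlier maps a system identifier to a local location, the following sketch may help.
    It does not use the Apache resolver library that Saxon relies on; instead it uses the <code>javax.xml.catalog</code> API
    built into Java 9 and later, purely so that the example runs with nothing but the JDK. The class name and the temporary catalog file are
    assumptions made for the illustration.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

final class CatalogLookupDemo {

    static final String SYSTEM_ID =
        "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd";

    // A cut-down version of the catalog file, containing only the system entry.
    static final String CATALOG =
        "<?xml version=\"1.0\"?>\n"
      + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
      + "  <group prefer=\"public\" xml:base=\"file:///usr/share/xml/\">\n"
      + "    <system systemId=\"" + SYSTEM_ID + "\"\n"
      + "            uri=\"docbook45/docbookx.dtd\"/>\n"
      + "  </group>\n"
      + "</catalog>\n";

    /** Returns the location to which the catalog maps the DocBook system identifier. */
    static String resolveDocBookDtd() {
        try {
            // Write the catalog to a temporary file so it can be addressed by URI.
            Path catalogFile = Files.createTempFile("catalog", ".xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));

            CatalogResolver resolver = CatalogManager.catalogResolver(
                    CatalogFeatures.defaults(), catalogFile.toUri());

            // Look up by system identifier only; no public identifier is supplied.
            InputSource mapped = resolver.resolveEntity(null, SYSTEM_ID);
            return mapped.getSystemId();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveDocBookDtd());
    }
}
```

    <p>The relative <code>uri</code> value is resolved against the <code>xml:base</code> of the enclosing group, so the result
    points into <code>/usr/share/xml/</code>; the lookup itself does not require the target file to exist.</p>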
774
  </section>
775
  <section id="input-filters" title="Writing input filters">
776
    <h1>Writing input filters</h1>
777

    
778

    
779
    <p>Saxon can take its input from a JAXP <code>SAXSource</code> object, which essentially
780
      represents a sequence of SAX events representing the output of an XML parser. A very useful
781
      technique is to interpose a <i>filter</i> between the parser and Saxon. The filter will
782
      typically be an implementation of the SAX2 <strong>XMLFilter</strong> interface.</p>
783

    
784
    <p>There are a number of ways of using a Saxon XSLT transformation as part of a pipeline of
785
      filters. Some of these techniques also work with XQuery. The techniques include:</p>
786
    <ul>
787
      <li>
788
        <p>Generate the transformation as an <code>XMLFilter</code> using the
789
            <code>newXMLFilter()</code> method of the <code>TransformerFactory</code>. This works
790
          with XSLT only. A drawback of this approach is that it is not possible to supply
791
          parameters to the transformation using standard JAXP facilities. It is possible, however,
792
          by casting the <code>XMLFilter</code> to a <a class="javalink" href="net.sf.saxon.jaxp.FilterImpl"
793
            >net.sf.saxon.jaxp.FilterImpl</a>, and calling its <code>getTransformer()</code> method, which
794
          returns a <code>Transformer</code> object offering the usual <code>setParameter()</code>
795
          method.</p>
796
      </li>
797
      <li>
798
        <p>Generate the transformation as a SAX <code>ContentHandler</code> using the
799
            <code>newTransformerHandler()</code> method. The pipeline stages after the
800
          transformation can be added by giving the transformation a <code>SAXResult</code> as its
801
          destination. This again is XSLT only.</p>
802
      </li>
803
      <li>
804
        <p>Implement the pipeline step before the transformation or query as an
805
            <code>XMLFilter</code>, and use this as the <code>XMLReader</code> part of a
806
            <code>SAXSource</code>, pretending to be an XML parser. This technique works with both
807
          XSLT and XQuery, and it can even be used from the command line, by nominating the
808
            <code>XMLFilter</code> as the source parser using the <code>-x</code> option on the
809
          command line.</p>
810
      </li>
811
    </ul>
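    <p>The third technique in the list above can be sketched as follows. A small <code>XMLFilterImpl</code> subclass is wrapped around a real
    parser and supplied as the <code>XMLReader</code> of a <code>SAXSource</code>. To keep the example runnable with the JDK alone, the
    consumer here is the JDK's identity <code>Transformer</code>; a Saxon transformation or query would receive the same
    <code>SAXSource</code>. The class names and the upper-casing behaviour are invented for the illustration.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class InputFilterDemo {

    /** Hypothetical filter: converts all character data to upper case. */
    static final class UpperCaseTextFilter extends XMLFilterImpl {
        UpperCaseTextFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Parses the XML through the filter and returns the serialized result. */
    static String parseThroughFilter(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            // The filter pretends to be the parser as far as the consumer is concerned.
            XMLReader filter = new UpperCaseTextFilter(parser);
            SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));

            // Identity transformation standing in for the real XSLT or XQuery step.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseThroughFilter("<p>hello</p>"));
    }
}
```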
812

    
813
    <p>The <code>-x</code> option on the Saxon command line specifies the parser that Saxon will use
814
      to process the source files. This class must implement the SAX2 <code>XMLReader</code>
815
      interface, but it is not required to be a real XML parser; it can take the input from any kind
816
      of source file, so long as it presents it in the form of a stream of SAX events. When using
817
      the JAXP API, the equivalent to the <code>-x</code> option is to call
818
        <code>transformerFactory.setAttribute(net.sf.saxon.lib.Feature.SOURCE_PARSER_CLASS.name,
        "com.example.package.Parser")</code>.</p>
820
  </section>
821
  <section id="XInclude" title="XInclude processing">
822
    <h1>XInclude processing</h1>
823

    
824

    
825
    <p>If you are using Xerces as your XML parser, you can have Xerces expand any XInclude
826
      directives.</p>
827

    
828
    <p>The <code>-xi</code> option on the command line causes XInclude processing to be applied to
829
      all input XML documents. This includes source documents, stylesheets, and schema documents
830
      listed on the command line, and also those loaded indirectly, for example by calls to the
831
        <code>doc()</code> function or by mechanisms such as <code>xsl:include</code> and
832
        <code>xs:include</code>.</p>
833

    
834
    <p>From the Java API, the equivalent is to call <code>setXInclude()</code> on the
835
        <code>Configuration</code> object, or to set the attribute denoted by <a
836
        class="bodylink code" href="/configuration/config-features">Feature.XINCLUDE.name</a> to
837
        <code>Boolean.TRUE</code> on the <code>TransformerFactory</code>.</p>
838

    
839
    <p>XInclude processing can be requested at a per-document level by creating an <a
840
        class="javalink" href="net.sf.saxon.lib.AugmentedSource">AugmentedSource</a> and calling its
841
        <code>setXIncludeAware()</code> method. The corresponding method is also recognized on
842
      Saxon's implementation of the JAXP <code>DocumentBuilderFactory</code>. When the
843
        <code>doc()</code> or <code>document()</code> or <code>collection()</code> function is
844
      called from an XPath expression, XInclude processing can be enabled by including
845
        <code>xinclude=yes</code> among the query parameters in the URI.</p>
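    <p>The effect of the JAXP-level switch can be tried in isolation with the parser bundled in the JDK. The sketch below writes two small
    files to a temporary directory and parses the outer one with <code>setXIncludeAware(true)</code> on a standard
    <code>DocumentBuilderFactory</code>. It uses the JDK's factory rather than Saxon's implementation, and the file names and content are
    made up for the example.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class XIncludeDemo {

    /** Parses a document containing an xi:include and returns its text content. */
    static String includedText() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            Path part = dir.resolve("part.xml");
            Path main = dir.resolve("main.xml");

            Files.write(part, "<para>included text</para>".getBytes(StandardCharsets.UTF_8));
            Files.write(main,
                ("<doc xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
               + "<xi:include href=\"part.xml\"/>"
               + "</doc>").getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // XInclude processing needs namespace awareness
            factory.setXIncludeAware(true);    // ask the parser to expand xi:include

            // Parsing from a File gives the parser a base URI for the relative href.
            Document doc = factory.newDocumentBuilder().parse(main.toFile());
            return doc.getDocumentElement().getTextContent();
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(includedText());
    }
}
```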
846
    
847
    <p>It is possible to request XInclude processing for the documents in a collection by including
848
    the query parameter <code>xinclude=yes</code> in the collection URI. Similarly, for a document
849
    read using the <code>doc()</code> or <code>document()</code> functions, XInclude processing can
850
      be requested using <code>xinclude=yes</code> in the document URI -- but only if the
851
    <code>StandardURIResolver</code> is used, and the feature is enabled by calling
852
      <code>Configuration.setParameterizedURIResolver()</code> or by setting <code>-p:on</code>
853
    on the <code>Query</code> or <code>Transform</code> command lines.</p>
854
    
855
    <p>The <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
856
      instruction can enable XInclude processing using
857
    the extension attribute <code>saxon:xinclude="yes"</code>.</p>
858

    
859
    <p>It is also possible to switch on XInclude processing (for all documents) by setting the
860
      system property:</p>
861
    <samp><![CDATA[-Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration
]]></samp>
864

    
865
    <p>An alternative approach is to incorporate an XInclude processor as a SAX filter in the input
866
      pipeline. You can find a suitable SAX filter at <a href="http://xincluder.sourceforge.net/"
867
        class="bodylink">http://xincluder.sourceforge.net/</a>, and you can incorporate it into your
868
      application as described in <a class="bodylink" href="../input-filters">Writing Input
869
        Filters</a>.</p>
870

    
871
    <p>On the .NET platform, there is a customized <code>XmlReader</code> that performs XInclude
872
      processing available at <a href="http://mvpxml.codeplex.com" class="bodylink"
873
        >http://mvpxml.codeplex.com</a>. You can supply this as an argument to the method
874
        <code>Build(XmlReader parser)</code> in the <a class="javalink"
875
        href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> class of the .NET Saxon API.</p>
876

    
877
    <p>For further information on using XInclude, see <a
878
        href="http://www.sagehill.net/docbookxsl/Xinclude.html" class="bodylink"
879
        >http://www.sagehill.net/docbookxsl/Xinclude.html</a>.</p>
880
  </section>
881
  <section id="controlling-parsing" title="Controlling Parsing of Source Documents">
882
    <h1>Controlling Parsing of Source Documents</h1>
883

    
884

    
885
    <p>Saxon does not include its own XML parser. By default:</p>
886

    
887
    <ul>
888
      <li>
889
        <p>On the Java platform, the default SAX parser provided as part of the JDK is used. With
890
          the Sun/Oracle JDK, this is a variant of the Apache Xerces parser customized by Sun.</p>
891
      </li>
892
      <li>
893
        <p>On the .NET platform, Saxon includes a copy of the Apache Xerces parser cross-compiled to
894
          run on .NET.</p>
895
      </li>
896
    </ul>
897

    
898
    <p>An error reported by the XML parser is generally fatal. It is not possible to process
899
      ill-formed XML.</p>
900

    
901
    <p>There are several ways you can cause a different XML parser to be used:</p>
902

    
903
    <ul>
904
      <li>
905
        <p>The <code>-x</code> and <code>-y</code> options on the command line can be used to
906
          specify the class name of a SAX parser, which Saxon will load in preference to the default
907
          SAX parser. The <code>-x</code> option is used for source XML documents, the
908
            <code>-y</code> option for schemas and stylesheets. The equivalent options can be set
909
          programmatically or by using the <a class="bodylink"
910
            href="/configuration/configuration-file">configuration file</a>.</p>
911
      </li>
912
      <li>
913
        <p>By default Saxon uses the <code>SAXParserFactory</code> mechanism to load a parser. This
914
          can be configured by setting the system property
915
            <code>javax.xml.parsers.SAXParserFactory</code>, by means of the file
916
            <code>lib/jaxp.properties</code> in the JRE directory, or by adding another parser to
917
          the <code>lib/endorsed</code> directory.</p>
918
      </li>
919
      <li>
920
        <p>The source for parsing can be supplied in the form of a <code>SAXSource</code> object,
921
          which has an <code>XMLReader</code> property containing the parser instance to be
922
          used.</p>
923
      </li>
924
      <li>
925
        <p>On .NET, the configuration option <code>PREFER_JAXP_PARSER</code> can be set to false, in
926
          which case Saxon will use the Microsoft XML parser instead of the Apache parser. (This
927
          parser is not used by default because it does not notify <code>ID</code> attributes to the
928
          application, which means the XPath <code>id()</code> and <code>idref()</code> functions do
929
          not work.)</p>
930
      </li>
931
      <li>
932
        <p>For a document read using the <code>doc()</code> or <code>document()</code> functions,
933
          the parser (XMLReader) to be used can be specified using the query parameter
934
          <code>?parser=full.class.name</code> in the document URI -- but only if the
935
          <code>StandardURIResolver</code> is used, and the feature is enabled by calling
936
          <code>Configuration.setParameterizedURIResolver()</code> or by setting <code>-p:on</code>
937
          on the <code>Query</code> or <code>Transform</code> command lines. For example,
938
          <code>parser=org.ccil.cowan.tagsoup.Parser</code> causes John Cowan's TagSoup parser
939
          for HTML to be used.</p>
940
      </li>
941
    </ul>
942

    
943
    <p>Saxonica traditionally recommended use of the Xerces parser from Apache in preference to the version bundled
944
      in the JDK, which was known to have some serious bugs. However, there is some evidence that the version bundled
945
    in Java 8 is more reliable.</p>
946

    
947
    <p>By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD
948
      validation). Note however, that the parser still needs to read the DTD if one is present,
949
      because it may contain entity definitions that need to be expanded. DTD validation can be
950
      requested using <code>-dtd:on</code> on the command line, or equivalent API or configuration
951
      options.</p>
952

    
953
    <p>Saxon is issued with local copies of commonly-used W3C DTDs such as the XHTML, SVG, and
954
      MathML DTDs. When Saxon itself instantiates the XML parser, it will use an
955
        <code>EntityResolver</code> that causes these local copies of DTDs to be used rather than
956
      fetching public copies from the web (the W3C servers are increasingly failing to serve these
957
      requests as the volume of traffic is too high). It is possible to override this using the
958
      configuration setting <code>ENTITY_RESOLVER_CLASS</code>, which can be set to the name of a
959
      user-supplied <code>EntityResolver</code>, or to the empty string to indicate that no
960
        <code>EntityResolver</code> should be used. Saxon will not add this
961
        <code>EntityResolver</code> in cases where the XML parser instance is supplied by the caller
962
      as part of a <code>SAXSource</code> object. It will add it to a parser obtained as an instance
963
      of the class specified using the <code>-x</code> and <code>-y</code> command line options,
964
      unless either the use of the <code>EntityResolver</code> is suppressed using the
965
        <code>ENTITY_RESOLVER_CLASS</code> configuration option, or the instantiated parser already
966
      has an <code>EntityResolver</code> registered.</p>
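    <p>The general mechanism of serving a DTD locally instead of fetching it from the web can be sketched with plain SAX. Here an
    <code>EntityResolver</code> registered on the parser answers every request with an in-memory DTD, so the unreachable system identifier
    in the source document is never dereferenced. This is not Saxon's own resolver; the class name, the DTD content, and the
    <code>example.invalid</code> URL are assumptions for the illustration. A parser configured this way could be passed to Saxon inside a
    <code>SAXSource</code>, in which case, as described above, Saxon leaves the resolver alone.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class LocalDtdDemo {

    // Hypothetical local copy of the DTD: it just defines one entity.
    static final String LOCAL_DTD = "<!ENTITY greeting \"hello\">";

    /** Parses the document using the local DTD and returns its character content. */
    static String textOf(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Answer every external entity request from memory, ignoring the identifiers.
            reader.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(LOCAL_DTD)));

            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });

            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\">"
                   + "<doc>&greeting;</doc>";
        System.out.println(textOf(xml));
    }
}
```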
967

    
968
    <p>Saxon never asks the XML parser to perform schema validation. If schema validation is
969
      required it should be requested using the command line options <code>-val:strict</code> or
970
        <code>-val:lax</code>, or their API equivalents. Saxon will then use its own schema
971
      processor to validate the document as it emerges from the XML parser. Schema processing is
972
      done in parallel with parsing, by use of a SAX-like pipeline.</p>
  </section>
979
  <section id="xml11" title="Saxon and XML 1.1">
980
    <h1>Saxon and XML 1.1</h1>
981

    
982

    
983
    <p>XML 1.1 (with XML Namespaces 1.1) originally extended XML 1.0 in three ways:</p>
984
    <ul>
985
      <li>
986
        <p>the set of valid characters is increased</p>
987
      </li>
988
      <li>
989
        <p>the set of characters allowed in XML Names is increased</p>
990
      </li>
991
      <li>
992
        <p>namespace undeclarations are permitted</p>
993
      </li>
994
    </ul>
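    <p>The first of these differences is easy to observe with an XML 1.1-capable parser. The sketch below assumes the parser bundled
    with the JDK (which supports XML 1.1) and checks whether a character reference to U+0001 is accepted: it is permitted in an XML 1.1
    document but not in XML 1.0. The class and method names are invented for the example.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class Xml11CharacterDemo {

    /** Returns true if the parser accepts the document, false if it reports an error. */
    static boolean isWellFormed(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler rethrows fatal errors instead of printing them.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same content, differing only in the declared XML version.
        System.out.println(isWellFormed("<?xml version=\"1.1\"?><a>&#x1;</a>"));
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><a>&#x1;</a>"));
    }
}
```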
995

    
996
    <p>The second change has subsequently been retrofitted to XML 1.0 Fifth Edition (XML 1.0e5).
997
      Saxon now uses the XML 1.1 and XML 1.0e5 rules unconditionally for all validation of XML
998
      names.</p>
999

    
1000
    <p>Saxon is capable of working with XML 1.1 input documents. If you want to use Saxon with XML
1001
      1.1, you should set the option <code>-xmlversion:1.1</code> on the Saxon command line, or call
1002
      the method <a class="javalink" href="net.sf.saxon.Configuration#setXMLVersion"
1003
        >configuration.setXMLVersion(Configuration.XML11)</a> or, in the case of XSLT,
1004
        <code>transformerFactory.setAttribute(FeatureKeys.XML_VERSION, "1.1")</code>.</p>
1005

    
1006
    <p>This configuration setting affects:</p>
1007
    <ul>
1008
      <li>
1009
        <p>the characters considered valid in the source of an XQuery query</p>
1010
      </li>
1011
      <li>
1012
        <p>the characters considered valid in the result of the functions
1013
            <code>codepoints-to-string()</code> and <code>unparsed-text()</code></p>
1014
      </li>
1015
      <li>
1016
        <p>the characters considered valid in the result of certain Saxon extension functions</p>
1017
      </li>
1018
      <li>
1019
        <p>the way in which line endings in XQuery queries are normalized</p>
1020
      </li>
1021
      <li>
1022
        <p>the default version used by the serializer (with output method XML)</p>
1023
      </li>
1024
    </ul>
1025

    
1026
    <p>Since Saxon 9.4, the configuration setting no longer affects:</p>
1027
    <ul>
1028
      <li>
1029
        <p>validation of names used in XQuery and XPath expressions, including names of elements,
1030
          attributes, functions, variables, and types</p>
1031
      </li>
1032
      <li>
1033
        <p>validation of names of constructed elements, attributes, and processing instructions in
1034
          XQuery and XSLT</p>
1035
      </li>
1036
      <li>
1037
        <p>schema validation of values of type <code>xs:NCName</code>, <code>xs:QName</code>,
1038
            <code>xs:NOTATION</code>, and <code>xs:ID</code></p>
1039
      </li>
1040
      <li>
1041
        <p>the permitted names of stylesheet objects such as keys, templates, decimal-formats,
1042
          output declarations, and output methods</p>
1043
      </li>
1044
    </ul>
1045

    
1046

    
1047
    <p>Note that if you use the default setting of "1.0", then supplying an XML 1.1 source document
1048
      as input may cause unpredictable errors.</p>
1049

    
1050
    <p>It is advisable to use an XML parser that supports XML 1.1 when the configuration is set to
1051
      "1.1", and an XML parser that does not support XML 1.1 when the configuration is set to "1.0".
1052
      However, Saxon does not enforce this.</p>
1053

    
1054
    <p>You can set the configuration to allow XML 1.1, but still serialize result documents as XML
1055
      1.0 by specifying the output property <code>version="1.0"</code>. In this case Saxon will
1056
      check while serializing the document that it conforms to the XML 1.0 constraints (note that
1057
      this check can be expensive). These checks are not performed if the configuration default is
1058
      set to XML 1.0.</p>
1059

    
1060
    <p>If you want the serializer to output namespace undeclarations, use the output property
1061
        <code>undeclare-namespaces="yes"</code> as well as <code>version="1.1"</code>.</p>
1062
  </section>
1063
  <section id="jaxpsources" title="JAXP Source Types">
1064
    <h1>JAXP Source Types</h1>
1065

    
1066

    
1067
    <p>
1068
      <i>This section is relevant to the Java platform only.</i>
1069
    </p>
1070

    
1071
    <p>When a user application invokes Saxon via the Java API, then a source document is supplied as
1072
      an instance of the JAXP <code>Source</code> interface. This is true whether invoking an XSLT
      transformation, an XQuery query, or a free-standing XPath expression. The <code>Source</code>
      interface is essentially a marker interface. The <code>Source</code> that is supplied must be a
1075
      kind of <code>Source</code> that Saxon recognizes.</p>

    <p>Saxon recognizes all three kinds of <code>Source</code> defined in JAXP: a
        <code>StreamSource</code>, a <code>SAXSource</code>, and a <code>DOMSource</code>. </p>

    <ul>
      <li>
        <p>When using a <code>StreamSource</code>, note:</p>
        <ul>
          <li>A <code>StreamSource</code> that wraps an <code>InputStream</code> or <code>Reader</code>
            can only be used once: it is consumed by use. However, a <code>StreamSource</code> that wraps
          a <code>File</code> or URI can be used multiple times.</li>
          <li>Whoever creates an <code>InputStream</code> or <code>Reader</code> is responsible for closing
          it after use. This means that if Saxon creates an <code>InputStream</code> from a supplied <code>File</code>
            or URI, it will close that <code>InputStream</code> after use; but if the <code>InputStream</code> is created
          by the calling application, then the calling application is responsible for closing it. (On some operating systems
          it is important not to leave unclosed streams lying around.)</li>
          <li>If the <code>StreamSource</code> wraps an <code>InputStream</code> or <code>Reader</code>, then the base URI
          of the document is taken from the <code>SystemID</code> property of the <code>StreamSource</code>. If this is not set,
          then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
        </ul>
        <aside>There are cases where it is difficult for the application to take responsibility for closing a stream after it has been read to completion.
        For example, if a <code>URIResolver</code> returns a <code>StreamSource</code>, there is no callback from Saxon
        to the application at the time the stream has been exhausted. Saxon therefore allows the <code>StreamSource</code>
        to be wrapped in an <code>AugmentedSource</code>, whose <code>setPleaseCloseAfterUse()</code> method can be used
        to request that Saxon closes the stream.</aside>
      </li>
      <li>
        <p>When using a <code>SAXSource</code>, note:</p>
        <ul>
          <li>If no <code>XMLReader</code> is supplied, Saxon will allocate one, based on settings in the <code>Configuration</code>.</li>
          <li>Processing of the contained <code>InputSource</code> is entirely the responsibility of the XML parser; Saxon is not involved
          in this.</li>
          <li>Saxon will modify properties of the supplied <code>XMLReader</code>: it will set the <code>ContentHandler</code>
          and <code>LexicalHandler</code> so that it can receive the output of parsing, and it will set the <code>ErrorHandler</code>
          so it can handle parsing errors.</li>
          <li>Saxon makes no attempt to ensure that processing of a <code>SAXSource</code> or its underlying <code>XMLReader</code>
          is thread-safe. The same <code>XMLReader</code> should not be used concurrently in multiple threads.</li>
        </ul>
      </li>
      <li>
        <p>When using a <code>DOMSource</code>, note:</p>
        <ul>
          <li>The DOM is not thread-safe, even when used in read-only mode. Saxon therefore synchronizes all its access to DOM methods.
          However, that's no protection if there are application threads accessing the DOM that aren't using Saxon.</li>
          <li>The base URI
            of the document is taken from the <code>SystemID</code> property of the <code>DOMSource</code>. If this is not set,
            then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
          <li>From Saxon 9.8, Saxon-EE uses a new mechanism for processing DOM trees, called the Domino model. This involves creating
          an index of all the nodes in the DOM, providing for faster navigation. Saxon-PE and Saxon-HE continue to use the DOM <code>NodeWrapper</code>
          model, where DOM methods are used to navigate the tree. A transformation using the Domino model typically takes twice as long as one using Saxon's native <code>TinyTree</code>,
          while the <code>NodeWrapper</code> model can take 5 to 10 times as long. An alternative approach is to convert the DOM tree to a <code>TinyTree</code> before the
          transformation starts. Even better: don't use DOM in the first place.</li>
        </ul>
      </li>
    </ul>
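    <p>The <code>StreamSource</code> notes above can be sketched using only standard JAXP classes. This is a
      minimal illustration rather than Saxon's own code: it runs an identity transformation with whichever
      <code>TransformerFactory</code> is found on the classpath, and the names <code>StreamSourceSketch</code>
      and <code>copy</code>, as well as the system ID passed in, are hypothetical. The application creates the
      <code>Reader</code>, so the application closes it.</p>

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamSourceSketch {

    /** Copies the XML text through an identity transformation and returns the result. */
    static String copy(String xml, String systemId) throws TransformerException, IOException {
        // The application creates the Reader, so the application closes it
        // (here via try-with-resources) once the transformation has finished.
        try (Reader reader = new StringReader(xml)) {
            // A StreamSource wrapping a Reader is consumed by use: build a new one per run.
            StreamSource source = new StreamSource(reader);
            // Without a system ID the base URI of the document would be unknown.
            source.setSystemId(systemId);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        }
    }
}
```

    <p>A call such as <code>StreamSourceSketch.copy(xml, "file:///tmp/example.xml")</code> (an illustrative
      URI) gives the document a known base URI even though the content arrives from a <code>Reader</code>.</p>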

        <p>Other kinds of <code>Source</code> that are recognized by most Saxon interfaces are:</p>

        <ul>
          <li><code>TreeInfo</code>: Saxon's <code>TreeInfo</code> holds information about a document (or more generally any tree of nodes), 
            and can be used directly as a <code>Source</code> of a transformation.</li>
          <li><code>NodeInfo</code>: Saxon's <code>NodeInfo</code> represents a node in a tree, 
            and can be used directly as a <code>Source</code> of a transformation.</li>
          <li><code>StaxSource</code>: allows a pull parser to be used.</li>
          <li><code>PullSource</code>: Saxon's internal pull interface.</li>
          <li><code>EventSource</code>: Similar to an <code>XMLReader</code>, but with a much simpler interface, an <code>EventSource</code>
          has a <code>send()</code> method that sends a stream of events to a Saxon <code>Receiver</code>.</li>
          <li><code>SaplingDocument</code>: a sapling tree constructed using the sapling construction interface can be used anywhere
          (within Saxon) that a <code>Source</code> is expected.</li>
        </ul>

    <p>Saxon also accepts input from an <code>XMLStreamReader</code>
        (<code>javax.xml.stream.XMLStreamReader</code>), that is a StAX pull parser as defined in
      JSR 173. This is achieved by creating an instance of <a class="javalink"
        href="net.sf.saxon.pull.StaxBridge">net.sf.saxon.pull.StaxBridge</a>, supplying the
        <code>XMLStreamReader</code> using the <code>setXMLStreamReader()</code> method, and
      wrapping the <code>StaxBridge</code> object in an instance of <a class="javalink"
        href="net.sf.saxon.pull.PullSource">net.sf.saxon.pull.PullSource</a>, which implements the
      JAXP <code>Source</code> interface and can be used in any Saxon method that expects a
        <code>Source</code>. Saxon has been validated with two StAX parsers: the Zephyr parser from
      Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from
      Tatu Saloranta. In Saxonica's experience, Woodstox is the more reliable of the two. However, there is
      no immediate benefit in using a pull parser to supply Saxon input rather than a push parser;
      the main use case for using an <code>XMLStreamReader</code> is when the data is supplied from
      some source other than parsing of lexical XML.</p>
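    <p>As a rough sketch of supplying input from an <code>XMLStreamReader</code>, the following uses only
      JDK classes. It does not use <code>StaxBridge</code> or <code>PullSource</code>: as a stand-in, it wraps
      the reader in the standard JAXP <code>javax.xml.transform.stax.StAXSource</code> and passes it to the
      default identity transformer, so that it runs without Saxon on the classpath. The names
      <code>PullInputSketch</code> and <code>copyFromPullParser</code> are hypothetical.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

final class PullInputSketch {

    /** Reads XML through a StAX pull parser and copies it to a string. */
    static String copyFromPullParser(String xml) throws XMLStreamException, TransformerException {
        // A freshly created reader is positioned at START_DOCUMENT, as StAXSource requires.
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // StAXSource implements the JAXP Source interface.
            identity.transform(new StAXSource(reader), new StreamResult(out));
            return out.toString();
        } finally {
            reader.close();
        }
    }
}
```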

    <p>Nodes in Saxon's implementation of the XPath data model are represented by the interface <a
        class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a>. A <code>NodeInfo</code> is
      itself a <code>Source</code>, which means that any method in the API that requires a source
      object will accept any implementation of <code>NodeInfo</code>. As discussed in the next
      section, implementations of <code>NodeInfo</code> are available to wrap Axiom, DOM, DOM4J,
      JDOM2, or XOM nodes, and in all cases these wrapper objects can be used wherever a
        <code>Source</code> is required.</p>

    <p>Saxon also provides a class <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
        >net.sf.saxon.lib.AugmentedSource</a> which implements the <code>Source</code> interface.
      This class encapsulates one of the standard <code>Source</code> objects, and allows additional
      processing options to be specified. These options include whitespace handling, schema and DTD
      validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon
      tree model.</p>

    <p>Saxon allows additional <code>Source</code> types to be supported by registering a <a
        class="javalink" href="net.sf.saxon.lib.SourceResolver">SourceResolver</a> with the <a
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object. The task of a
        <code>SourceResolver</code> is to convert a <code>Source</code> that Saxon does not
      recognize into a <code>Source</code> that it does recognize. For example, this may be done by
      building the document tree in memory and returning the <a class="javalink"
        href="net.sf.saxon.om.NodeInfo">NodeInfo</a> object representing the root of the tree.</p>
  </section>
  <section id="thirdparty"
    title="Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J">
    <h1>Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J</h1>

    <p>
      <i>This section is relevant to the Java platform only.</i>
    </p>

    <p>In the case of DOM, all Saxon editions support DOM access "out of the box", and no special
      configuration action is necessary. See also <a class="bodylink" href="/sourcedocs/domino">The Domino Tree Model</a>.</p>

    <p>Support for Axiom, JDOM2, XOM, and DOM4J is not available "out of the box" with
      Saxon-HE, but the source code is open source (in sub-packages of
        <code>net.sf.saxon.option</code>) and can be compiled for use with Saxon-HE if required.</p>

    <aside>In general, use of a third party tree implementation is much less efficient than using
      Saxon's native <code>TinyTree</code>. These models should only be used if your application
      needs to construct them for other reasons. Transforming a DOM can take up to 10 times longer
      than transforming the equivalent <code>TinyTree</code>.</aside>

    <p>The support code for Axiom, DOM4J, JDOM2, and XOM is integrated into the main JAR files
      for Saxon-PE and Saxon-EE, but (unlike the case of DOM) it is not activated unless the object
      model is registered with the <a class="javalink" href="net.sf.saxon.Configuration"
        >Configuration</a>. To activate support for one of these models, the implementation must either be included 
      in the relevant section of the
      configuration file, or it must be nominated to the configuration using the method <a class="javalink"
        href="net.sf.saxon.Configuration#registerExternalObjectModel"
        >registerExternalObjectModel()</a>. </p>

    <aside>Support for JDOM version 1 is dropped with effect from Saxon 10.0. Applications should migrate
    to JDOM2.</aside>

    <p>Each supported object model is represented in Saxon by a <a class="javalink"
        href="net.sf.saxon.om.TreeModel">TreeModel</a> object, which in the case of external object
      models will also be an instance of <a class="javalink"
        href="net.sf.saxon.lib.ExternalObjectModel">ExternalObjectModel</a>. The
        <code>TreeModel</code> can be used to get a <code>Builder</code>, which can then be used to
      construct an instance of the model from SAX input. The <code>Builder</code> can also be
      inserted into a pipeline to capture the output of a transformation or query.</p>

    <p>For DOM input, the source can be supplied by wrapping a <code>DOMSource</code> around the DOM
      Document node. For Axiom, JDOM2, XOM, and DOM4J the approach is similar, except that the
      wrapper classes are supplied by Saxon itself: they are <a class="javalink"
        href="net.sf.saxon.option.axiom.AxiomDocument"
        >net.sf.saxon.option.axiom.AxiomDocument</a>,  <a class="javalink"
        href="net.sf.saxon.option.jdom2.JDOM2DocumentWrapper"
        >net.sf.saxon.option.jdom2.JDOM2DocumentWrapper</a>, <a class="javalink"
        href="net.sf.saxon.option.xom.XOMDocumentWrapper"
        >net.sf.saxon.option.xom.XOMDocumentWrapper</a>, and <a class="javalink"
        href="net.sf.saxon.option.dom4j.DOM4JDocumentWrapper"
        >net.sf.saxon.option.dom4j.DOM4JDocumentWrapper</a> respectively. These wrapper classes
      implement the Saxon <a class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a> interface
      (which means that they also implement <code>Source</code>).</p>

    <aside>Note that the Xerces DOM implementation is not thread-safe, even for read-only access.
      Saxon's wrapper classes for the DOM therefore synchronize all access to the DOM. This provides
      thread-safety, but only if the application takes care to avoid creating more than one wrapper
      for the same DOM Document.</aside>

    <p>Saxon supports these models by wrapping each external node in a wrapper that implements the
      Saxon <code>NodeInfo</code> interface. When nodes are returned by the XQuery or XPath API,
      these wrappers are removed and the original node is returned. Similarly, the wrappers are
      generally removed when extension functions expecting a node are called.</p>

    <p>Saxon does not support wrapping of an external tree that contains entity reference nodes.
      Most parsers provide an option to avoid constructing a tree that contains such nodes. For
      example, with the JDK Xerces DOM parser, use <code>DOMParser dp = new DOMParser();
        dp.setFeature("http://apache.org/xml/features/dom/create-entity-ref-nodes",
        false);</code>. If there is a need to process a tree that does contain entity
      references, it should be copied to a Saxon tree. (Note, this only affects entities explicitly
      declared in a DTD. It does not affect character references or built-in entity references such
      as <code>&amp;lt;</code>, which never appear as entity reference nodes in the tree.)</p>
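    <p>When the DOM is built through the standard JAXP <code>DocumentBuilderFactory</code> rather than by
      instantiating the Xerces <code>DOMParser</code> directly, the corresponding switch is
      <code>setExpandEntityReferences(true)</code>. The following is a minimal sketch under that assumption;
      the class <code>EntityFreeDomSketch</code> and its methods are hypothetical names, and the helper that
      walks the tree is there only to check that no entity reference nodes remain.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EntityFreeDomSketch {

    /** Parses XML into a DOM with entity references expanded. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Ask the parser to expand entity references rather than
        // representing them as EntityReference nodes in the tree.
        factory.setExpandEntityReferences(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns true if the subtree rooted at node contains an EntityReference node. */
    static boolean containsEntityReference(Node node) {
        if (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
            return true;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (containsEntityReference(child)) {
                return true;
            }
        }
        return false;
    }
}
```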

    <p>In the case of DOM only, Saxon also supports wrapping the other way around: an object
      implementing the DOM interface may be wrapped around a Saxon <code>NodeInfo</code>. This is
      done when Java methods expecting a DOM <code>Node</code> are called as extension functions, if
      the <code>NodeInfo</code> is not itself a wrapper for a DOM <code>Node</code>.</p>

    <p>You can also send output to a DOM by using a <code>DOMResult</code>, or to a JDOM2 tree by
      using a <code>JDOM2Result</code>, or to a XOM document by using a <code>XOMWriter</code>. In
      such cases it is a good idea to set <code>saxon:require-well-formed="yes"</code> on
        <code>xsl:output</code> to ensure that the transformation or query result is a well-formed
      document (for example, that it does not contain several elements at the top level).</p>
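    <p>Sending output to a DOM can be sketched with the standard JAXP <code>DOMResult</code>. The example below
      is illustrative only: it uses the default identity transformer found on the classpath instead of a Saxon
      stylesheet, it does not set <code>saxon:require-well-formed</code>, and the names
      <code>DomResultSketch</code>, <code>toDom</code>, and <code>countTopLevelElements</code> are hypothetical.
      The second method shows the kind of check that well-formedness implies: exactly one element at the top level.</p>

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DomResultSketch {

    /** Runs an identity transformation and captures the result as a DOM Document. */
    static Document toDom(String xml) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // No node is supplied, so the transformer creates the Document itself.
        DOMResult result = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    /** Counts element children of the document node; a well-formed document has exactly one. */
    static int countTopLevelElements(Document doc) {
        int count = 0;
        for (Node child = doc.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }
}
```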

    <p>External object models do not in all cases fully support the XDM (XPath data model). In
      particular, many of them have restrictions concerning the recognition of <code>ID</code> and
        <code>IDREF</code> attributes. In most cases they do not allow "namespace undeclarations" (so
      a prefix that is in-scope for a parent element will always be in-scope for its child elements).
      None of the external object models support typed
      (schema-validated) data, and none support in-situ update using XQuery updates.</p>
  </section>
  <section id="choosingmodel" title="Choosing a Tree Model">
    <h1>Choosing a Tree Model</h1>

    <p>Saxon provides several implementations of the internal tree data structure (or tree model).
      The tree model can be chosen by an option on the command line (<code>-tree:tiny</code> for the
      tiny tree, <code>-tree:linked</code> for the linked tree). There is also a variant of the tiny
      tree called a "condensed tiny tree" which saves space (at the expense of build time) by
      recognizing text nodes and attribute nodes whose values appear more than once in the input
      document. The tree model can also be selected from the Java API. The default is to use the
      tiny tree model. The choice should make no difference to the results of a transformation
      (except the order of attributes and namespace declarations) but only affects performance.</p>

    <p>
      <i>The "linked tree" is the only model to support in-situ updates, so if you are using XQuery
        Update you must choose this model.</i>
    </p>

    <p>Generally speaking, the tiny tree model is both faster to build and faster to navigate. It
      also uses less space.</p>

    <p>The tiny tree model gives most benefit when you are processing a large document. It uses a
      lot less memory, so it can prevent thrashing when the size of document is such that the linked
      tree doesn't fit in real memory. Use the "condensed" variant if you need to save memory, and
      if your source data contains many text or attribute nodes with repeated values.</p>

    <p>Saxon also offers the option <code>-tree:condensed</code>. This delivers a TinyTree with
    additional compression. Specifically, when a document contains multiple text nodes or
    attribute nodes with the same string value, the condensed tree will "common up" the storage
    for these nodes. This option gives a further reduction in memory usage, at the cost of slower
    tree construction.</p>

    <p>The linked tree is used internally to represent stylesheet and schema modules because of the
      programming convenience it offers: it allows element nodes on the tree to be represented by
      custom classes for each kind of element. The linked tree is also needed when you want to use
      XQuery Update, because unlike the tiny tree, it is mutable.</p>

    <p>
      <i>If in doubt, stick with the default.</i>
    </p>
  </section>
  <section id="domino" title="The Domino Tree Model">
    <h1>The Domino Tree Model</h1>
    <p>The Domino tree model was introduced in Saxon 9.8 and is available in Saxon-EE only. It is a new approach
    to the handling of DOM source trees.</p>
    <p>The Domino data structure is essentially a combination of the DOM and parts of the TinyTree. It takes the
    unchanged DOM tree, and indexes it with vectors containing information (for each DOM node) about the node kind,
    node name, and level in the document. These vectors are exactly the same as those used in the TinyTree; the difference
    is that there is no text content, or attributes; these are replaced by references to the DOM nodes. 
    All navigation around the tree is done purely using the index vectors,
    while retrieval of the string value of text and attribute nodes is done by reference to the DOM structure. The effect
    is that navigation is almost as fast as using the TinyTree, but queries are still able to return the original DOM Nodes.</p>
    <p>Overall, queries and transformations using the Domino model take about double the time of the same query using the
    TinyTree, compared with 5 to 10 times longer using the DOM Wrapper model. There is an initial overhead in building
    the indexes, but this is incurred once only.</p>
    <p>The Domino model must not be used with a DOM tree that is subject to update, other than changes to the values of
    attribute or text nodes, which might work (but are still best avoided). Saxon has no way of preventing or detecting
    updates, so these will generally cause catastrophic failure.</p>
  </section>
  <section id="ptree" title="The PTree File Format">
    <h1>The PTree File Format</h1>

    <p>The PTree (persistent tree) was a binary XML serialization supported by earlier Saxon
    releases. It has been dropped from the product with effect from Saxon 10.0. Third-party
    offerings such as EXI do the same job better.</p>
  </section>
  <section id="validation" title="Validation of Source Documents">
    <h1>Validation of Source Documents</h1>

    <p>With Saxon-EE, source documents may be validated against a schema. Not only does this perform
      a check that the document is valid, it also adds type information to each element and
      attribute node in the document to identify the schema type against which it was validated. It
      may also expand the source document by adding default values of elements and attributes.</p>

    <p>If the option <code>-val:strict</code> is specified on the command line for
        <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code>, then the principal
      source document to the query or transformation is schema-validated, as is every document
      loaded using the <code>doc()</code> or <code>document()</code> function. Saxon will look among
      all the loaded schemas for an element declaration that matches the outermost element of the
      document, and will then check that the document is valid against that element declaration,
      reporting a fatal error if it is not. The loaded schemas include schemas imported statically
      into the query or stylesheet using <code>import schema</code> or
        <code>xsl:import-schema</code>, schemas referenced in the <code>xsi:schemaLocation</code> or
        <code>xsi:noNamespaceSchemaLocation</code> attributes of the source document itself, and
      schemas loaded by the application using the <code>addSchema</code> method of the <a
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.</p>

    <p>As an alternative to <code>-val:strict</code>, the option <code>-val:lax</code> may be
      specified. This validates the document if and only if an element declaration can be found. If
      there is no declaration of the outermost element in any loaded schema, then it is left as an
      untyped document.</p>

    <p>When invoking transformations or queries from the Java API, the equivalent of the
        <code>-val:strict</code> option is to call the method
        <code>setSchemaValidation(Validation.STRICT)</code> on the <code>Configuration</code>
      object. The equivalent of <code>-val:lax</code> is
        <code>setSchemaValidation(Validation.LAX)</code>.</p>

    <p>When documents are built using the <a class="javalink"
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> in the s9api interface, or the
        <a class="javalink" href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> in the Saxon.Api
      interface on .NET, validation may be controlled by setting the appropriate options on the
        <code>DocumentBuilder</code>.</p>

    <p>On Java interfaces that expect a JAXP <code>Source</code> object it is possible to request
      validation by supplying an <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
        >AugmentedSource</a>. This consists of a <code>Source</code> and a set of options, including
      validation options; since <code>AugmentedSource</code> implements the JAXP <code>Source</code>
      interface it is possible to use it anywhere that a <code>Source</code> is expected, including
      as the object returned by a user-written <code>URIResolver</code>.</p>

    <p>Saxon's standard <code>URIResolver</code> uses this technique if it has been enabled (for
      example by using <code>-p</code> on the command line). With this option, any URI containing
      the query parameter <code>?validation=strict</code> (for example,
      <code>doc('source.xml?validation=strict')</code>) causes strict validation to be requested for that
      document, while <code>?validation=lax</code> requests lax validation, and <code>?validation=strip</code>
      requests no validation.</p>

    <p>XSLT 3.0 provides a standard way of requesting validation for individual source documents,
      using the <code>validation</code> and <code>type</code> attributes of the <a class="bodylink code"
        href="/xsl-elements/source-document">xsl:source-document</a> instruction.</p>
  </section>
  <section id="whitespace" title="Whitespace Stripping in Source Documents">
    <h1>Whitespace Stripping in Source Documents</h1>

    <p>A number of factors combine to determine whether whitespace-only text nodes in the source
      document are visible to the user-written XSLT or XQuery code.</p>

    <p>By default, if there is a DTD or schema, then <i>ignorable whitespace</i> is stripped from
      any source document loaded from a <code>StreamSource</code> or <code>SAXSource</code>.
      Ignorable whitespace is defined as the whitespace that appears separating the child elements
      in elements declared to have element-only content. This whitespace is removed regardless of
      any <code>xml:space</code> attributes in the source document.</p>
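    <p>To make the notion of ignorable whitespace concrete, the sketch below parses a small document whose DTD
      declares element-only content for <code>doc</code>. It uses the JDK's DOM parser and its
      <code>setIgnoringElementContentWhitespace</code> switch purely as an illustration of the concept; it is not
      how Saxon performs stripping, and the names <code>IgnorableWhitespaceSketch</code>, <code>SAMPLE</code>, and
      <code>countChildren</code> are hypothetical.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class IgnorableWhitespaceSketch {

    /** A document whose DTD declares that doc has element-only content. */
    static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc (item*)>\n"
          + "  <!ELEMENT item (#PCDATA)>\n"
          + "]>\n"
          + "<doc>\n  <item>a</item>\n  <item>b</item>\n</doc>";

    /** Counts the child nodes of the outermost element after parsing. */
    static int countChildren(String xml, boolean stripIgnorable) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The JAXP documentation says this setting relies on the content model,
        // so the parser is put into validating mode.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(stripIgnorable);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }
}
```

    <p>With stripping enabled, only the two <code>item</code> elements remain as children of <code>doc</code>;
      with it disabled, the whitespace between them is also present as text nodes.</p>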

    <p>It is possible to change this default behavior in several ways.</p>
    <ul>
      <li>
        <p>From the <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code> command
          line, options are available: <code>-strip:all</code> strips all whitespace text nodes,
            <code>-strip:none</code> strips no whitespace text nodes, and
            <code>-strip:ignorable</code> strips ignorable whitespace text nodes only (this is the
          default).</p>
      </li>
      <li>
        <p>If the <code>-p</code> option is used on the command line, then query parameters are
          recognized in the URI passed to the <code>document()</code> or <code>doc()</code>
          function. The parameter <code>strip-space=yes</code> strips all whitespace text nodes,
            <code>strip-space=no</code> strips no whitespace text nodes, and
            <code>strip-space=ignorable</code> strips ignorable whitespace text nodes only. This
          overrides anything specified on the command line.</p>
      </li>
      <li>
        <p>Options corresponding to the above can also be set on the <code>TransformerFactory</code>
          object or on the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>.
          These settings are global.</p>
      </li>
    </ul>

    <p>Whitespace stripping that is specified in any of the above ways occurs not only when the
      source document is parsed under Saxon's control (that is, when it is supplied as a JAXP
        <code>StreamSource</code> or <code>SAXSource</code>). It also applies where the input is
      supplied in the form of a tree (for example, a DOM). In this case Saxon wraps the supplied
      tree in a virtual tree that provides a view of the original tree with whitespace text nodes
      omitted.</p>

    <p>This whitespace stripping is additional (and prior) to any stripping carried out as a result
      of the <code>xsl:strip-space</code> declaration in the stylesheet.</p>

    <p>Saxon never modifies a supplied tree <i>in situ</i>: if a tree is supplied as input, and the stylesheet
      requests space stripping, then a virtual tree is created and whitespace is stripped on the fly as
      it is navigated. This is expensive (it can add 25% to processing time); it is therefore best to
      supply a <code>SAXSource</code> or <code>StreamSource</code> as input to a transformation, so
      that Saxon can strip unwanted whitespace while the tree is being parsed and built.
    </p>
  </section>
  <section id="streaming" title="Streaming of Large Documents">
    <h1>Streaming of Large Documents</h1>

    <aside>Streaming is available only in Saxon-EE.</aside>

    <p>Sometimes source documents are too large to hold in memory. Saxon-EE provides a range of
      facilities for processing such documents in <i>streaming mode</i>: that is, processing data as
      it is read by the XML parser, without building a complete tree representation of the document
      in memory.</p>

    <p>These facilities are closely aligned with the XSLT 3.0 Recommendation. Some facilities
      are specific to Saxon, and a few facilities are also available in XQuery.</p>

    <p>Inevitably there are things that cannot be done in streaming mode - sorting is an obvious
      example. Sometimes, achieving a streaming transformation means rethinking the design of how it
      works - for example, splitting it into multiple phases. So streaming is rarely a case of
      simply taking your existing code and setting a simple switch to request streamed
      implementation.</p>

    <p>For more information, see the following sections:</p>

    <nav>
      <ul/>
    </nav>

    <section id="xslt-streaming" title="Streaming using XSLT 3.0">
      <h1>Streaming using XSLT 3.0</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Saxon-EE (from Saxon 9.8) is fully conformant to the final XSLT 3.0 Recommendation in terms of the
        streaming facilities it supports. A few gaps in coverage that were found after release were fixed for Saxon 9.9. 
        There are also some extensions.</p>

      <p>There are two main ways to initiate a streaming transformation:</p>

      <ol>
        <li><p>Using the <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
          instruction, with the attribute <code>streamable="yes"</code>. 
          Here the source document is identified within the stylesheet itself.
          Typically such a stylesheet will have a named template as its entry point, and will not
          have any principal source document supplied externally.</p></li>
        <li><p>By supplying a source document as input to a stylesheet whose initial mode is declared
          with <code>streamable="yes"</code> in an <a href="/xsl-elements/mode"
            class="bodylink code">xsl:mode</a> declaration. In this case the source document must be
          supplied as a <code>StreamSource</code> or <code>SAXSource</code>, and not as an in-memory
          tree. The details depend on which API is being used:</p>
          <ul>
            <li><p>With the Java s9api API, compile the stylesheet to create an <code>XsltExecutable</code>,
            and then use the <code>load30</code> method to create an <code>Xslt30Transformer</code>.
            Invoke the streamed transformation using the <code>applyTemplates</code> method of
            the <code>Xslt30Transformer</code>, supplying the input as a <code>StreamSource</code>
            or <code>SAXSource</code>.</p></li> 
            <li><p>Similarly with the Saxon.Api interface on .NET, use the method
            <code>Xslt30Transformer.ApplyTemplates()</code>, supplying a <code>Stream</code> 
            as input.</p></li>
1519
            <li><p>With the JAXP API, start by instantiating a <code>com.saxonica.config.StreamingTransformerFactory</code>.
1520
            Invoke the transformation in the usual way by creating a <code>Transformer</code> (optionally via a
1521
            <code>Templates</code> object). When the <code>transform()</code> method is called with a
1522
            <code>StreamSource</code> or <code>SAXSource</code> as input, and when the initial mode
1523
            is a streamable mode, the input will be streamed. In consequence, this approach breaks the
1524
            normal JAXP convention whereby the document supplied as the <code>Source</code> argument to
1525
            the <code>transform()</code> method also becomes the global context item (the value of "." when
1526
            accessed within the initializer of a global variable). Instead such a reference fails with 
1527
            an XPDY0002 dynamic error.</p>
1528
            <p>The <code>StreamingTransformerFactory</code> can also be used to create an <code>XMLFilter</code>
1529
            which takes streamed input and produces streamed output, and a pipeline can be built from a
1530
            sequence of such filters connected end-to-end in the usual JAXP way.</p></li>
1531
          </ul>
1532
        </li>
1533
      </ol>
1534
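
      <p>As a minimal sketch of these two approaches, the stylesheets below each count the
        <code>order</code> elements in a streamed document. The file name <code>orders.xml</code>,
        the element names and the template name <code>main</code> are assumptions made for
        illustration. In the first form the stylesheet identifies its own input and is entered
        through a named template:</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: no principal source document is supplied externally -->
  <xsl:template name="main">
    <xsl:source-document streamable="yes" href="orders.xml">
      <!-- A single downward selection, reduced to an atomic value -->
      <order-count><xsl:value-of select="count(orders/order)"/></order-count>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
]]></samp>
      <p>In the second form the document is supplied by the caller (as a <code>StreamSource</code>
        or <code>SAXSource</code>), and the unnamed initial mode is declared streamable:</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every template rule in the unnamed mode must now be streamable -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <order-count><xsl:value-of select="count(orders/order)"/></order-count>
  </xsl:template>

</xsl:stylesheet>
]]></samp>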

    
      <p>The <a class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
        function used in previous releases is still supported for the time being. In Saxon 9.8 and later a
        call on <code>saxon:stream</code> is translated at compile time into a call on the XSLT 3.0
          <code>&lt;xsl:source-document&gt;</code> instruction. The original Saxon mechanism for streaming,
        namely the <code>saxon:read-once</code> attribute on <code>xsl:copy-of</code>, was dropped
        in Saxon 9.6.</p>

      <p>The rules for whether a construct is streamable or not are largely the same in Saxon as in
        the XSLT 3.0 specification. Saxon applies these rules after doing any optimization
        re-writes, so some constructs end up being streamable in Saxon even though they are not
        guaranteed streamable in the W3C spec, because the Saxon optimizer rewrites the expression
        into a streamable form. An example of this effect is where variables or functions are
        inlined before doing the streamability analysis. In contrast, when streaming is requested,
        the optimizer takes care to avoid rewriting streamable constructs into a non-streamable
        form.</p>

      <p>This documentation does not attempt to provide a tutorial introduction to the streaming
        capabilities of XSLT 3.0. The specification itself is not easy to read, especially the
        detailed rules on which constructs are deemed streamable. However, for the most part it is
        not necessary to be familiar with the detailed rules. The main things to remember are:</p>

    
      <ul>
        <li>A construct is "consuming" if it reads a subtree of the source document, that is, if it
          makes a downwards selection from the context item. In general, constructs are not allowed
          to have two operands that are both consuming. Some exceptions to this are: the <a
            class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction; conditional
          expressions such as <a class="bodylink code" href="/xsl-elements/choose">xsl:choose</a> if
          each branch only contains one consuming expression; the map expression
            <code>map{...}</code> in XPath and the <a class="bodylink code" href="/xsl-elements/map"
            >xsl:map</a> instruction in XSLT.</li>
        <li>During a streaming pass, the XSLT processor remembers the ancestors of the context item
          and all the attributes of ancestors. Path expressions that access the ancestors and their
          attributes are therefore allowed. However, such expressions should generally return atomic
          values (for example the values of attributes) rather than returning nodes in the streamed
          document, because if nodes are returned, the system often can't be sure that there is no
          disallowed navigation from those nodes (for example, you can't get all the descendants of
          an ancestor node).</li>
        <li>It's not permitted to bind a streamed node to a variable or parameter, or to pass it to
          a function.</li>
        <li>An expression such as <code>//section</code> is referred to as a crawling expression.
          Crawling expressions potentially contain nodes which overlap each other, which creates
          problems if you want to make further downward selections from such nodes. The XSLT 3.0
          specification allows this in some circumstances, for example you can pass such an
          expression to a function that atomizes the result, but other cases (for example, using
          such an expression in <a class="bodylink code" href="/xsl-elements/for-each"
            >xsl:for-each</a> or <a class="bodylink code" href="/xsl-elements/apply-templates"
            >xsl:apply-templates</a>) are forbidden. If you know that the expression will never
          select overlapping nodes (for example, if you know that <code>//title</code> will never
          select one title appearing within another title), then you can rewrite the expression as
            <code>outermost(//title)</code> to avoid the restrictions. Saxon also allows overlapping
          nodes in some contexts where the W3C specification does not, provided streamability
          extensions are enabled.</li>
        <li>When you hit these restrictions, you can often work around them by making a copy of a
          subtree of the streamed document, for example by using the new <a class="bodylink code"
            href="/functions/fn/copy-of">copy-of()</a> or <a class="bodylink code"
            href="/functions/fn/snapshot">snapshot()</a> functions. These are consuming expressions,
          but the result is "grounded" (that is, an ordinary in-memory tree) so it can be used
          without any restrictions. Clearly this only works if the subtrees that you copy are small
          enough to fit in memory.</li>
      </ul>
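
      <p>Two of these points can be illustrated with short fragments. They are sketches only: the
        file names and the element and attribute names (<code>orders</code>, <code>order</code>,
        <code>price</code>, <code>discount</code>, <code>id</code>, <code>title</code>) are
        assumed for the purpose of the example. The first shows the "two consuming operands"
        restriction and the usual workaround of copying a small subtree:</p>
      <samp><![CDATA[<xsl:source-document streamable="yes" href="orders.xml">
  <!-- Selecting orders/order/(price - discount) directly would be rejected:
       price and discount are both downward selections from the same order,
       so the subtraction has two consuming operands.
       Copying each order first gives a grounded (in-memory) node, and the
       body of the loop can then navigate within the copy freely. -->
  <xsl:for-each select="orders/order/copy-of(.)">
    <net id="{@id}"><xsl:value-of select="price - discount"/></net>
  </xsl:for-each>
</xsl:source-document>
]]></samp>
      <p>The second shows a crawling expression made usable in <code>xsl:for-each</code> by
        wrapping it in <code>outermost()</code>, on the assumption that titles are never
        nested:</p>
      <samp><![CDATA[<xsl:source-document streamable="yes" href="book.xml">
  <!-- //title alone is crawling: the selected nodes might overlap.
       outermost() promises the analysis that they do not. -->
  <xsl:for-each select="outermost(//title)">
    <heading><xsl:value-of select="."/></heading>
  </xsl:for-each>
</xsl:source-document>
]]></samp>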

    
      <p>The XSLT 3.0 constructs most relevant to streaming are:</p>

      <ul>
        <li><strong>Streamable template rules</strong>. XSLT 3.0 has a new <a class="bodylink code"
            href="/xsl-elements/mode">xsl:mode</a> declaration, and this allows all the template
          rules in a particular mode to be declared streamable (<code>&lt;xsl:mode
            streamable="yes"/&gt;</code>). If a mode is declared streamable, then Saxon checks
          whether all the template rules in that mode are actually streamable, and reports a
          compile-time error if not.</li>
        <li>The <a class="bodylink code"
          href="/xsl-elements/source-document">xsl:source-document</a> instruction.
          This has an <code>href</code> attribute which defines the URI of a streamed input
          document, and the instructions within <code>xsl:source-document</code> are evaluated with this
          document as the context node. When streamed processing is requested using the attribute
          <code>streamable="yes"</code>, the body of the <code>xsl:source-document</code> instruction must
          satisfy the streamability rules; again, any violation is detected at compile time.</li>
        <li>The <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a> instruction.
          This is like an <a class="bodylink code" href="/xsl-elements/for-each">xsl:for-each</a>
          instruction except that it guarantees to process the selected nodes in order, and the
          results of processing one node can be passed as a parameter to the next iteration, so the
          action applied to one node can influence the way in which subsequent nodes are processed.
          This often provides a solution to the problem that when streaming, you can never "look
          backwards" at preceding nodes. Instead of looking backwards, the information that will be
          needed when processing subsequent nodes can be retained in parameters and "passed
          forwards". Note that streamed nodes themselves cannot be contained in parameters, but data
          derived from those nodes (or copies made using the <code>copy-of()</code> function) can.</li>
        <li>The <a class="bodylink code" href="/xsl-elements/merge">xsl:merge</a> instruction allows
          several input sequences to be merged, based on the value of a sort key. Any or all of the
          input sequences can be streamed documents, provided that they are already correctly sorted
          on the sort key value.</li>
        <li><strong>Accumulators</strong> allow values to be computed "in the background" while a
          streamed document is being read; the final value of the <a class="bodylink code"
            href="/xsl-elements/accumulator">accumulator</a> is available by calling the <a
            class="bodylink code" href="/functions/fn/accumulator-after">accumulator-after()</a>
          function at the end of processing, and intermediate values are also available.
          Accumulators are useful if you want to compute several values during a single processing
          pass of a streamed document (for example, a minimum and maximum of some value). When the
          information to be maintained in the accumulator is complex, it can be useful to hold it in
          a map, which is a new data structure introduced in XSLT 3.0.</li>
        <li>Saxon (from 9.9) supports an additional capability: <em>capturing accumulators</em>.
          By adding the attribute <code>saxon:capture="yes"</code> to an accumulator rule with
          <code>phase="end"</code>, you can tell Saxon to make a snapshot copy of the matched
          element (as if by calling the <code>fn:snapshot</code> function). The code for computing
          the next value of the accumulator then has full access to this snapshot, so it is
          no longer constrained to be motionless. You can even keep the snapshot copy directly
          as the value of the accumulator (just write <code>select="."</code>), or you can retain
          all the matched elements (write <code>select="($value, .)"</code>). One way of writing a
          streamed transformation is now to capture all the data you need in accumulators, and
          to process it only when you hit the end of the document.
        </li>
        <li>The <a class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction
          effectively computes several instructions in parallel. In the Saxon implementation, they
          are not actually evaluated in different threads, but they are all executed during a single
          scan of the streamed input document. The outputs produced by each "prong" of the
            <code>xsl:fork</code> instruction are buffered in memory until all prongs have
          completed, and are then assembled in the correct order to form the final result.</li>
        <li><strong>Streamed grouping</strong> is possible using the <a class="bodylink code"
            href="/xsl-elements/for-each-group">xsl:for-each-group</a> instruction, provided that
          one of the options <code>group-adjacent</code>, <code>group-starting-with</code>, or
            <code>group-ending-with</code> is used. There are restrictions on the use of the <a
            class="bodylink code" href="/functions/fn/current-group">current-group()</a> function
          within such an instruction: essentially, it can only be used once, because it is a
          consuming construct.</li>
      </ul>
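
      <p>The fragments below sketch three of these constructs. They are illustrations under stated
        assumptions rather than tested recipes: the element name <code>transaction</code> within
        <code>account</code>, and the <code>value</code> and <code>date</code> attributes, follow
        the examples used elsewhere in this section, while the accumulator names and result
        element names are invented. The first is a complete stylesheet that computes a minimum
        and a maximum in a single pass using two accumulators:</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Accumulators apply to the streamed input only if they are listed here -->
  <xsl:mode streamable="yes" use-accumulators="min-value max-value"/>

  <xsl:accumulator name="min-value" as="xs:decimal?" initial-value="()" streamable="yes">
    <!-- Motionless rule: it reads only an attribute of the matched element -->
    <xsl:accumulator-rule match="transaction"
        select="min(($value, xs:decimal(@value)))"/>
  </xsl:accumulator>

  <xsl:accumulator name="max-value" as="xs:decimal?" initial-value="()" streamable="yes">
    <xsl:accumulator-rule match="transaction"
        select="max(($value, xs:decimal(@value)))"/>
  </xsl:accumulator>

  <xsl:template match="/">
    <summary>
      <!-- The one consuming instruction comes first; the accumulator values
           are read afterwards, once the document has been scanned -->
      <xsl:attribute name="count" select="count(account/transaction)"/>
      <xsl:attribute name="min" select="accumulator-after('min-value')"/>
      <xsl:attribute name="max" select="accumulator-after('max-value')"/>
    </summary>
  </xsl:template>

</xsl:stylesheet>
]]></samp>
      <p>The second fragment uses <code>xsl:fork</code> to evaluate two consuming expressions
        during the same scan of the input:</p>
      <samp><![CDATA[<xsl:source-document streamable="yes" href="transactions.xml">
  <xsl:fork>
    <!-- Each xsl:sequence is one prong; its output is buffered until all are done -->
    <xsl:sequence>
      <count><xsl:value-of select="count(account/transaction)"/></count>
    </xsl:sequence>
    <xsl:sequence>
      <total><xsl:value-of select="sum(account/transaction/@value)"/></total>
    </xsl:sequence>
  </xsl:fork>
</xsl:source-document>
]]></samp>
      <p>The third groups adjacent transactions by date, calling <code>current-group()</code>
        exactly once in the body:</p>
      <samp><![CDATA[<xsl:source-document streamable="yes" href="transactions.xml">
  <xsl:for-each-group select="account/transaction" group-adjacent="@date">
    <daily-transactions date="{current-grouping-key()}">
      <!-- The single permitted (consuming) use of current-group() -->
      <xsl:copy-of select="current-group()"/>
    </daily-transactions>
  </xsl:for-each-group>
</xsl:source-document>
]]></samp>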

    
      <p>All these facilities are available in Saxon-EE only.</p>

    </section>

    
    <section id="streamed-query" title="Streaming in XQuery">
      <h1>Streaming in XQuery</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>The XQuery specification says nothing on the subject of streamed evaluation; it is left
        entirely to implementations. Saxon-EE supports streaming of XQuery for simple queries, using
        rules similar to those that apply to XSLT.</p>

      <p>Simple queries can be streamed by specifying <code>-stream:on</code> on the Saxon-EE
        command line. There is no need to specify anything in the query itself; however, the <a
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> and <a
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions (defined in
        the XSLT 3.0 specification) may be used if streaming is not otherwise possible.</p>

      <p>When running a query using the s9api interface, streaming must be requested both when
        compiling the query (<a class="javalink" href="net.sf.saxon.s9api.XQueryCompiler"
          >XQueryCompiler.setStreaming(true)</a>), and when executing it (<a class="javalink"
          href="net.sf.saxon.s9api.XQueryEvaluator">XQueryEvaluator.runStreamed(Source,
          Destination)</a>).</p>

      <p>The query should access the streamed input document via the context item, not via the <a
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
          href="/functions/fn/collection">collection()</a> function, nor using external variables.
        The source document should be supplied in the form of a <code>SAXSource</code> or
          <code>StreamSource</code> object.</p>

      <p>If the query is not streamable, this will be reported as a compile-time error.</p>

    
      <p>The conditions for streamability are essentially the same as the rules for the body of the
        <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
        instruction when streamed processing is requested using the attribute
        <code>streamable="yes"</code>, as in the XSLT 3.0 specification. For example:</p>

      <ol>
        <li>
          <p>Path expressions must use downward selection only.</p>
        </li>
        <li>
          <p>Predicates must be motionless, which means they can reference attributes but not child
            elements of the node being filtered.</p>
        </li>
        <li>
          <p>No construct may make two downward selections. For example, the expression <code>price
              - discount</code> fails because both operands use the child axis to select downwards.
            If necessary, use <a class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> to
            copy a subtree, after which arbitrary selections within the copied subtree become
            possible.</p>
        </li>
        <li>
          <p>A streamed node may not be bound to a variable. This rules out many uses of FLWOR
            expressions.</p>
        </li>
        <li>
          <p>A streamed node must not be passed as an argument to a function call, other than
            built-in function calls.</p>
        </li>
        <li>
          <p>Global variables in the query must not reference the context item.</p>
        </li>
      </ol>

      <p>As with XSLT, these restrictions can often be overcome by using the <a
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> or <a
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions, which Saxon
        makes available in XQuery as well as XSLT.</p>

    </section>

    
    <section id="configuration-streaming" title="Configuration options for streaming">
      <h1>Configuration options for streaming</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Saxon attempts streamed evaluation only if it is explicitly requested. Streaming may be
        requested in a number of ways:</p>

      <ul>
        <li>
          <p>By use of XSLT 3.0 language constructs that request streaming, for example the <a
            class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
            instruction with attribute <code>streamable="yes"</code>, or by
            specifying <code>streamable="yes"</code> on <a class="bodylink code"
              href="/xsl-elements/mode">xsl:mode</a> or <a class="bodylink code"
              href="/xsl-elements/accumulator">xsl:accumulator</a>.</p>
        </li>
        <li>
          <p>By use of a Saxon extension that requests streaming, for example <a
              class="bodylink code" href="/functions/saxon/stream">saxon:stream</a>.</p>
        </li>
        <li>
          <p>By setting the option <code>-stream:on</code> in the XQuery command line, or the
            equivalent API option (for example, in s9api, <a class="javalink"
              href="net.sf.saxon.s9api.XQueryCompiler">XQueryCompiler.setStreaming(true)</a>).</p>
        </li>
      </ul>

    
      <p>There are three configuration options that control how these requests for streaming
        are interpreted:</p>

      <ul>
        <li>The configuration option <a class="bodylink code" href="/configuration/config-features"
          >Feature.STREAMABILITY</a> may be set to one of the values "off" or "standard".
          (Releases prior to 9.8 supported a third option, "extended".) With a licensed Saxon-EE
        configuration, the default is "standard", which means that streaming will happen if it
        is requested and if it is feasible. Setting the value to "off" causes Saxon to behave
        as if there is no Saxon-EE license: that is, requests for streaming are effectively
        ignored, and the stylesheet is executed in a non-streaming manner (which means that processing
        of a large document may fail if there is insufficient memory).</li>

        <li>The configuration option <a class="bodylink code"
          href="/configuration/config-features">Feature.STREAMING_FALLBACK</a> determines what
          Saxon does when streaming is requested, and a construct is found that is deemed
          non-streamable. This is a boolean option. If it is set to <code>true</code>, Saxon attempts a
          non-streaming implementation of the relevant construct. If sufficient memory is available
          for a non-streaming evaluation, this should always give the same result as a streamed
          evaluation. When the option is set to <code>false</code> (the default), the presence of a
          construct that is deemed non-streamable causes a static (compile-time) error.</li>

        <li>The configuration option <a class="bodylink code"
          href="/configuration/config-features">Feature.STRICT_STREAMABILITY</a>
         determines how closely Saxon's streamability analysis follows the rules in the
        W3C specification. This is a boolean value (with the default <code>false</code>): the value <code>true</code> requests
        strict adherence to the W3C rules. In reality this option does not affect the rules
        that Saxon applies, rather it affects when they are applied. By default Saxon first performs
        all its usual compile-time optimizations to the expression tree, and then checks the final result
        for streamability. During the optimization process Saxon takes care to avoid replacing streamable
        constructs with non-streamable equivalents, but it may do the reverse. As a result, constructs
        that are not streamable according to the W3C rules may become streamable after optimization.
        (An example is the non-streamable expression <code>AUTHOR or EDITOR</code>, which Saxon rewrites
          in the streamable form <code>exists(AUTHOR | EDITOR)</code>.)
        For interoperability, the W3C specification requires processors to provide a mode of operation in
        which the W3C streamability rules are enforced rigidly, and this is achieved by setting
        <code>STRICT_STREAMABILITY</code> to <code>true</code>. With this setting, Saxon checks the
        expression tree for streamability <em>before</em> doing any optimizations that change
        the tree.</li>
      </ul>
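
      <p>To make the effect of <code>STRICT_STREAMABILITY</code> concrete, the fragment below is a
        sketch of a template rule that should be acceptable under either setting, because the
        condition is already written in the form that the text above describes as streamable. The
        element name <code>BOOK</code> and the result element <code>credited</code> are
        assumptions made for illustration.</p>
      <samp><![CDATA[<xsl:mode streamable="yes" on-no-match="shallow-skip"/>

<xsl:template match="BOOK">
  <!-- Written as test="AUTHOR or EDITOR", the condition has two consuming
       operands. With STRICT_STREAMABILITY set to true it is analysed as
       written and rejected; with the default setting it is accepted only
       because the optimizer rewrites it before the analysis takes place. -->
  <xsl:if test="exists(AUTHOR | EDITOR)">
    <credited/>
  </xsl:if>
</xsl:template>
]]></samp>
      <p>Writing the streamable form explicitly is one way to keep a stylesheet portable to
        processors that apply the W3C rules without Saxon's optimizations.</p>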
        
      <p>When running from the command line, these options can be set using, for example,
          <code>--streamability:off</code> or <code>--streamingFallback:on</code>.</p>
    </section>

    
    <section id="burst-mode-streaming" title="Burst-mode streaming">
      <h1>Burst-mode streaming</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Burst-mode streaming takes a streamed document as input, and generates a sequence of small
        subtrees containing the parts of the document that need to be processed. This can be
        achieved using XSLT 3.0 syntax like this:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="employees.xml">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:source-document>
]]></samp>

      <p>The code that processes an individual <code>employee</code> element does not need to be
        streamable; it can use any XSLT constructs. The only constraint is that it cannot navigate
        outside the <code>employee</code> element: because the <code>employee</code> element is a
        copy of a subtree from the original document, it has no parent or siblings.</p>
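
      <p>As a sketch of how this fits together in a complete stylesheet, the example below
        processes each copied <code>employee</code> in an ordinary, non-streamable mode, where
        sorting and multiple downward selections are unrestricted. The mode name
        <code>employee</code>, the child elements <code>name</code> and <code>skill</code>, the
        result element <code>staff</code> and the template name <code>main</code> are all
        hypothetical.</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- An ordinary mode: its template rules need not be streamable -->
  <xsl:mode name="employee" streamable="no"/>

  <xsl:template name="main">
    <staff>
      <xsl:source-document streamable="yes" href="employees.xml">
        <!-- Each employee is copied into memory, then handed to the ordinary mode -->
        <xsl:apply-templates select="*/employee/copy-of(.)" mode="employee"/>
      </xsl:source-document>
    </staff>
  </xsl:template>

  <!-- Works on an in-memory copy, so it may sort and select downwards repeatedly -->
  <xsl:template match="employee" mode="employee">
    <employee name="{name}" skills="{count(skill)}">
      <xsl:for-each select="skill">
        <xsl:sort select="."/>
        <skill><xsl:value-of select="."/></skill>
      </xsl:for-each>
    </employee>
  </xsl:template>

</xsl:stylesheet>
]]></samp>
      <p>Only one <code>employee</code> subtree needs to be held in memory at a time, so memory
        use depends on the size of the largest employee record rather than on the size of the
        whole document.</p>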

    
      <p>Burst-mode streaming can also be applied to the principal input of the transformation. This
        works if the transformation is run from the command line, and also if it is executed from a
        Java or .NET API provided that the document is supplied as a streamed source object, not as
        a pre-built tree (under Java, this means a <code>StreamSource</code> or
          <code>SAXSource</code>). For example:</p>

      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:template>
]]></samp>

      <p>The same effect can be achieved in XQuery if the document is supplied as the initial
        context item, again in the form of a streamed input source. Although the functions
          <code>copy-of()</code> and <code>snapshot()</code> are defined in the XSLT 3.0
        specification, Saxon also makes them available in XQuery, allowing for example:</p>

      <samp><![CDATA[*/employee ! copy-of(.)/(name, address)
]]></samp>

      <p>In XQuery there is no need for the query itself to indicate that streamed execution is
        required; rather this can be requested from the command line using the option
          <code>-stream:on</code>. </p>

      <p>The same effect can be achieved on external streamed documents using the <a
          class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
        function.</p>

    
      <h2 class="subtitle">Example: selective copying</h2>

      <p>A very simple way of using burst-mode streaming is when making a selective copy of parts of
        a document. For example, the following code creates an output document containing all the
          <code>footnote</code> elements from the source document that have the attribute
          <code>@type='endnote'</code>:</p>

      <p>
        <strong>XSLT example (named document)</strong>
      </p>
      <samp><![CDATA[<xsl:template name="main">
  <footnotes>
    <xsl:source-document streamable="yes" href="thesis.xml">
      <xsl:copy-of select=".//footnote[@type='endnote']"/>
    </xsl:source-document>
  </footnotes>
</xsl:template>
]]></samp>

    
      <p>
        <strong>XQuery example (named document)</strong>
      </p>
      <samp><![CDATA[  <footnotes>{
     saxon:stream(doc('thesis.xml')//footnote[@type='endnote'])
  }</footnotes>
]]></samp>

    
      <p>
        <strong>XSLT example (principal input document)</strong>
      </p>
      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <footnotes>
    <xsl:copy-of select=".//footnote[@type='endnote']"/>
  </footnotes>
</xsl:template>
]]></samp>

    
      <p>
        <strong>XQuery example (principal input document)</strong>
      </p>
      <samp><![CDATA[  <footnotes>{.//footnote[@type='endnote']}</footnotes>
]]></samp>

      <p>These examples work because the predicate (the expression in square brackets) is
          <i>motionless</i>: evaluating the predicate does not require the source document to be
        repositioned. If the predicate needs access to child elements rather than attributes, it's
        necessary to make a copy of each footnote and then test the copy. The last example then
        becomes:</p>

      <samp><![CDATA[  <footnotes>{.//footnote/copy-of(.)[type='endnote']}</footnotes>
]]></samp>
    </section>

    
    <section id="partial-reading" title="Reading source documents partially">
      <h1>Reading source documents partially</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>As well as allowing a source document to be processed in a single sequential pass, the
        streaming facility in many cases allows the source document to be read only partially. For
        example, the following query will return true as soon as it finds a transaction with a
        negative value, and will then immediately stop processing the input file:</p>
      <samp><![CDATA[some $t in saxon:stream(doc('big-transaction-file.xml')//transaction)
satisfies number($t/@value) lt 0
]]></samp>

      <p>This facility is particularly useful for extracting data that appears near the start of a
        large file. It does mean, however, that well-formedness or validity errors appearing later
        in the file will not necessarily be detected.</p>

      <p>To exit early from reading a streamed document using pure XSLT 3.0 constructs, use <a
          href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a> like this:</p>

    
      <samp><![CDATA[<xsl:variable name="contains-debit" as="xs:boolean">
  <xsl:source-document streamable="yes" href="big-transaction-file.xml">
    <xsl:iterate select=".//transaction">
      <!-- Result if the input is exhausted without finding a negative value -->
      <xsl:on-completion select="false()"/>
      <!-- number() converts the untyped attribute value before the comparison -->
      <xsl:if test="number(@value) lt 0">
        <xsl:break select="true()"/>
      </xsl:if>
    </xsl:iterate>
  </xsl:source-document>
</xsl:variable>
]]></samp>

    
    </section>

    <section id="stream-with-iterate" title="Streaming with xsl:iterate">
      <h1>Streaming with xsl:iterate</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>In the examples given above, streaming is used to select a sequence of element nodes from
        the source document, and each of these nodes is then processed independently. In cases where
        the processing of one node depends in some way on previous nodes, it is possible to use <a
          class="bodylink" href="../burst-mode-streaming">burst-mode streaming</a> in conjunction
        with the new <a href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a>
        instruction in XSLT 3.0.</p>

      <p>The following example takes a sequence of <code>&lt;transaction&gt;</code> elements in an
        input document, each one containing the value of a debit or credit from an account. As
        output it copies the transaction elements, adding a current balance.</p>
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <xsl:param name="balance" as="xs:decimal" select="0.00"/>
        <xsl:variable name="new-balance" as="xs:decimal" select="$balance + xs:decimal(@value)"/>
        <transaction balance="{$new-balance}">
           <xsl:copy-of select="@*"/>
        </transaction>
        <xsl:next-iteration>
          <xsl:with-param name="balance" select="$new-balance"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>

    
1985
      <p>The following example is similar: this time it copies the account number (contained in a
1986
        separate element at the start of the file) into each transaction element:</p>
1987
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">           
1988
      <xsl:iterate select="account/(account-number|transaction)">
1989
        <xsl:param name="accountNr"/>
1990
        <xsl:choose>
1991
           <xsl:when test="self::account-number">
1992
             <xsl:next-iteration>
1993
                <xsl:with-param name="accountNr" select="string(.)"/>
1994
             </xsl:next-iteration>
1995
           </xsl:when>
1996
           <xsl:otherwise>
1997
             <transaction account-number="{$accountNr}">
1998
               <xsl:copy-of select="@*"/>
1999
             </transaction>
2000
           </xsl:otherwise>
2001
        </xsl:choose>
2002
      </xsl:iterate>
2003
    </xsl:source-document>  
2004
]]></samp>

      <p>Here is a more complex example, one that groups adjacent transaction elements having the
        same date attribute. The two loop parameters are the group accumulated so far and the
        current date, which acts as the grouping key. The contents of a group are accumulated in
        the <code>group</code> parameter until the date changes.</p>
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <xsl:param name="group" as="element(transaction)*" select="()"/>
        <xsl:param name="currentDate" as="xs:date?" select="()"/>
        <!-- In XSLT 3.0, xsl:on-completion follows the xsl:param elements -->
        <xsl:on-completion>
          <final-daily-transactions date="{$currentDate}">
            <xsl:copy-of select="$group"/>
          </final-daily-transactions>
        </xsl:on-completion>
        <xsl:choose>
          <xsl:when test="xs:date(@date) eq $currentDate or empty($group)">
            <xsl:next-iteration>
              <xsl:with-param name="currentDate" select="@date"/>
              <!-- copy-of() makes an in-memory copy of the streamed node so it can be retained -->
              <xsl:with-param name="group" select="($group, copy-of(.))"/>
            </xsl:next-iteration>
          </xsl:when>
          <xsl:otherwise>
            <daily-transactions date="{$currentDate}">
              <xsl:copy-of select="$group"/>
            </daily-transactions>
            <xsl:next-iteration>
              <xsl:with-param name="group" select="copy-of(.)"/>
              <xsl:with-param name="currentDate" select="@date"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>

      <p>Note that when an <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a>
        loop is terminated using <a class="bodylink code" href="/xsl-elements/break">xsl:break</a>,
        parsing of the source document will be abandoned. This provides a convenient way to read
        data near the start of a large file without incurring the cost of reading the entire
        file.</p>
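
      <p>The following sketch illustrates this kind of early exit: it copies only the first ten
        transactions and then leaves the loop. The file name and element names are carried over
        from the examples above; the limit of ten and the <code>count</code> parameter are
        assumptions made purely for illustration.</p>
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <!-- hypothetical counter: position of the transaction being processed -->
        <xsl:param name="count" as="xs:integer" select="1"/>
        <xsl:copy-of select="."/>
        <xsl:choose>
          <xsl:when test="$count ge 10">
            <!-- leave the loop; the rest of the input need not be read -->
            <xsl:break/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:next-iteration>
              <xsl:with-param name="count" select="$count + 1"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
      <p>Both <code>xsl:break</code> and <code>xsl:next-iteration</code> are written here as the
        last instruction of their branch, because XSLT 3.0 requires them to appear in tail
        position within the loop body.</p>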
    </section>

    <section id="stream-with-merge" title="Streaming with xsl:merge">
      <h1>Streaming with xsl:merge</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Saxon (since 9.6) allows several streamed inputs to be merged using the new XSLT 3.0 <a
          href="/xsl-elements/merge" class="bodylink code">xsl:merge</a> instruction. For this to
        work, there are a number of rules to follow:</p>

      <ol>
        <li>
          <p>Streaming must be requested by specifying <code>streamable="yes"</code> on the <a
              class="bodylink code" href="/xsl-elements/merge-source">xsl:merge-source</a>
            element.</p>
        </li>
        <li>
          <p>When streaming is requested, the <code>for-each-source</code> attribute of
              <code>xsl:merge-source</code> must be present, and must be a single string.</p>
        </li>
        <li>
          <p>The <code>select</code> attribute on the <code>xsl:merge-source</code> element must
            take the form of a motionless pattern.</p>
        </li>
      </ol>

      <p>For each node selected by the <code>select</code> expression, Saxon takes an implicit
        snapshot (in the sense of the XSLT 3.0 <a class="bodylink code"
          href="/functions/fn/snapshot">fn:snapshot()</a> function). The merge keys are evaluated in
        relation to this snapshot, and it is this snapshot that is presented within the
          <code>xsl:merge-action</code> construct as the result of the <a class="bodylink code"
          href="/functions/fn/current-merge-group">fn:current-merge-group()</a> function.</p>

      <p>Here is an example of streamed merging of two log files:</p>

      <samp><![CDATA[<xsl:merge>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-1.xml'" select="events/event">
    <xsl:merge-key select="xs:dateTime(@timestamp)"/>
  </xsl:merge-source>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-2.xml'" select="log/day/record">
    <xsl:merge-key select="dateTime(../@date, time)"/>
  </xsl:merge-source>
  <xsl:merge-action>
    <group>
      <xsl:copy-of select="current-merge-group()" />
    </group>
  </xsl:merge-action>
</xsl:merge>]]></samp>
    </section>

    <section id="streaming-templates" title="Streaming Templates">
      <h1>Streaming Templates</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Streaming templates allow a document to be processed hierarchically in the classical XSLT
        style, applying template rules to each element (or other node) in a top-down manner, while
        scanning the source document in a pure streaming fashion, without building the source tree
        in memory. Saxon-EE allows streamed processing of a document using template rules, provided
        the templates conform to a set of strict guidelines.</p>

      <p>Streaming in this way is a property of a <strong>mode</strong>; a mode can be declared to
        be streamable, and if it is so declared, then all template rules using that mode must obey
        the rules for streamability. A mode is declared to be streamable using the top-level
        stylesheet declaration:</p>

      <samp><![CDATA[<xsl:mode name="s" streamable="yes"/>]]></samp>

      <p>The <code>name</code> attribute is optional; if omitted, the declaration applies to the
        default (unnamed) mode.</p>

      <p>Streamed processing of a source document can be applied either to the principal source
        document of the transformation, or to a secondary source document read using the <a
          class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
        instruction.</p>

      <p>To use streaming on the principal source document, the input to the transformation must be
        supplied in the form of a <code>StreamSource</code> or <code>SAXSource</code>, and the
        initial mode selected on entry to the transformation must be a streamable mode. In this case
        there must be no references to the context item in the initializer of any global
        variable.</p>

      <p>Streamed processing of a secondary document is initiated using the instruction:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="abc.xml">
  <xsl:apply-templates mode="s"/>
</xsl:source-document>]]></samp>

      <p>Saxon will also recognize an instruction of the form:</p>

      <samp><![CDATA[<xsl:apply-templates select="doc('abc.xml')" mode="s"/>]]></samp>

      <p>Here the <code>select</code> attribute must contain a simple call on the <a
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
          href="/functions/fn/document">document()</a> function, and the mode (explicit or implicit)
        must be declared as streamable. The call on <code>doc()</code> or <code>document()</code>
        can be extended with a streamable selection path, for example
          <code>select="doc('employee.xml')/*/employee"</code>.</p>

      <p>If a mode is declared as streamable, then it can be used only with streamed input; it is
        not possible to apply templates in a streamable mode if the selected nodes are ordinary
        non-streamed nodes.</p>

      <p>Every template rule within a streamable mode must follow strict rules to ensure it can be
        processed in a streaming manner. The essence of these rules is:</p>
      <ol>
        <li>
          <p>The match pattern for the template rule must be a simple pattern that can be evaluated
            when positioned at the start tag of an element, without repositioning the stream (but
            information about the ancestors of the element and their attributes is available,
            together with some limited information about their position relative to their siblings).
            Examples of acceptable patterns are <code>*</code>, <code>para</code>,
              <code>para[1]</code>, or <code>para/*</code>.</p>
          <p>If the match pattern includes a boolean predicate, then the predicate must be
            "motionless", which means that it can be evaluated while the input stream is positioned
            at the start tag. This means it can reference properties such as <code>name()</code> and
              <code>base-uri()</code>, and can reference attributes of the element, but cannot
            reference its children or content.</p>
          <p>If the match pattern includes a numeric predicate, then it must be possible to evaluate
            this by counting either the total number of preceding-sibling elements, or the number of
            preceding siblings with a given name. Examples of permitted patterns include
              <code>*[1]</code>, <code>p[3]</code>, and <code>*:p[2][@class='bold']</code>;
            disallowed patterns include <code>(descendant::fig)[1]</code>,
              <code>p[@class='bold'][2]</code>, and <code>p[last()]</code>.</p>
        </li>
        <li>
          <p>The body of the template rule must contain at most one expression or instruction that
            reads the contents below the matched element (that is, children or descendants), and it
            must process the contents in document order. This expression or instruction will often
            be one of the following:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>string(.)</code>
              </p>
            </li>
            <li>
              <p><code>data(.)</code> (explicitly or implicitly)</p>
            </li>
          </ul>
          <p>but this list is not exhaustive. It is possible to process the contents selectively by
            using a streamable path expression, for example:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates select="foo"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="a/b/c"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="x/y"/&gt;</code>
              </p>
            </li>
          </ul>
          <p>but this effectively means that the content not selected by this path is skipped
            entirely; the transformation ignores it.</p>
          <p>The template can access attributes of the context item without restriction, as well as
            properties such as its <code>name()</code>, <code>local-name()</code>, and
              <code>base-uri()</code>. It can also access the ancestors of the context item, the
            attributes of the ancestors, and properties such as the name of an ancestor; but having
            navigated to an ancestor, it cannot then navigate downwards or sideways, since the
            siblings and the other descendants of the ancestor are not available while
            streaming.</p>
          <p>The restriction that only one downwards access is allowed makes it an error to use an
            expression such as <code>price - discount</code> in a streamable template. This problem
            can often be circumvented by making a copy of the context item. This can be done using
            the <code>copy-of()</code> function: for example <code>&lt;xsl:value-of
              select="copy-of(.)/(price - discount)"/&gt;</code>. Taking a copy of the context node
            requires memory, of course, and should be avoided unless the contents of the node are
            small.</p>

          <p>Certain constructs using positional filters can be evaluated in streaming mode. For
            example, it is possible to use <code>&lt;xsl:apply-templates select="*[1]"/&gt;</code>.
            The filter must be on a node test that uses the child axis and selects element nodes.
            The forms accepted are expressions that can be expressed as <code>x[position() op
              N]</code> where <code>N</code> is an expression that is independent of the focus and
            is statically known to evaluate to a number, <code>x</code> is a node test using the
            child axis, and <code>op</code> is one of the operators <code>eq</code>,
            <code>le</code>, <code>lt</code>, <code>gt</code>, or <code>ge</code>. Alternative forms
            of this construct such as <code>x[N]</code>, <code>remove(x, 1)</code>,
              <code>head(x)</code>, <code>tail(x)</code>, and <code>subsequence(x, 1, N)</code> are
            also accepted.</p>
        </li>
      </ol>
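
      <p>As a rough sketch of how these rules combine, the following stylesheet fragment streams a
        hypothetical <code>orders.xml</code> file. The document structure (<code>order</code>
        elements containing <code>item</code> elements, each with <code>price</code> and
        <code>discount</code> children), the attribute names, the output vocabulary and the named
        template <code>main</code> are all assumptions made for illustration.</p>
      <samp><![CDATA[<xsl:mode name="s" streamable="yes" on-no-match="shallow-skip"/>

<!-- A single downward selection: the item children, in document order -->
<xsl:template match="order" mode="s">
  <order id="{@id}">
    <xsl:apply-templates select="item" mode="s"/>
  </order>
</xsl:template>

<!-- price and discount are both children, so the item is copied first -->
<xsl:template match="item" mode="s">
  <item code="{@code}" net="{copy-of(.)/(price - discount)}"/>
</xsl:template>

<!-- Hypothetical entry point: starts streamed processing of the secondary document -->
<xsl:template name="main">
  <orders>
    <xsl:source-document streamable="yes" href="orders.xml">
      <xsl:apply-templates mode="s"/>
    </xsl:source-document>
  </orders>
</xsl:template>]]></samp>
      <p>The <code>copy-of(.)</code> call is used here on the assumption that each
        <code>item</code> element is small; for large elements the memory cost noted above
        applies.</p>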

    </section>
  </section>
  <section id="projection" title="Document Projection">
    <h1>Document Projection</h1>

    <aside>Document projection is available only in Saxon-EE.</aside>

    <p>Document Projection is a mechanism that analyzes a query to determine what parts of a
      document it can potentially access, and then while building a tree to represent the document,
      leaves out those parts of the tree that cannot make any difference to the result of the
      query.</p>

    <p>Document projection can be enabled as an option on the XQuery command line interface: set
        <code>-projection:on</code>. It is only used if requested. The command line option affects
      both the primary source document supplied on the command line, and any calls on the
        <code>doc()</code> function within the body of the query that use a literal string argument
      for the document URI.</p>

    <p>For feedback on the impact of document projection in terms of reducing the size of the source
      document in memory, use the <code>-t</code> option on the command line, which shows for each
      document loaded how many nodes from the input document were retained and how many
      discarded.</p>

    <p>From the s9api API, document projection can be invoked as an option on the <a
        class="javalink" href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a>. The call
        <code>setDocumentProjectionQuery()</code> supplies as its argument a compiled query (an
        <code>XQueryExecutable</code>), and the document built by the document builder is then
      projected to retain only the parts of the document that are accessed by this query, when it
      operates on this document as the initial context item. For example, if the supplied query is
        <code>count(//ITEM)</code>, then only the <code>ITEM</code> elements will be retained.</p>

    <p>It is also possible to request that a query should perform document projection on documents
      that it reads using the <code>doc()</code> function, provided this has a string-literal
      argument. This can be requested using the option <code>setAllowDocumentProjection(true)</code>
      on the <code>XQueryExpression</code> object. This is not available directly in the s9api
      interface, but the <code>XQueryExpression</code> is reachable from the
        <code>XQueryExecutable</code> using the accessor method
        <code>getUnderlyingCompiledQuery()</code>.</p>
    <aside>It is best to avoid supplying a query that actually returns nodes from the document
      supplied as the context item, since the analysis cannot know what the invoker of the query
      will want to do with these nodes. For example, the query
        <code>&lt;out&gt;{//ITEM}&lt;/out&gt;</code> works better than <code>//ITEM</code>, since it
      is clear that all descendants of the <code>ITEM</code> elements must be retained, but not
      their ancestors. If the supplied query selects nodes from the input document, then Saxon
      assumes that the application will need access to the entire subtree rooted at these nodes, but
      that it will not attempt to navigate upwards or outwards from these nodes. On the other hand,
      nodes that are atomized (for example in a filter) will be retained without their descendants,
      except as needed to compute the filter.</aside>

    <p>The more complex the query, the less likely it is that Saxon will be able to analyze it to
      determine the subset of the document required. If precise analysis is not possible, document
      projection has no effect. Currently Saxon makes no attempt to analyze accesses made within
      user-defined functions. Also, of course, Saxon cannot analyze the expectations of external
      (Java) functions called from the query.</p>

    <p>Document projection is supported only for XQuery, and it works only when a document
      is parsed and loaded for the purpose of executing a single query. It is possible, however, to
      use the mechanism to create a manual filter for source documents if the required subset of the
      document is known. To achieve this, create a query that selects the required parts of the
      document supplied as the context item, and compile it to an s9api
      <code>XQueryExecutable</code>. The query does not have to do anything useful: the only
      requirement is that the result of the query on the subset document must be the same as the
      result on the original document. Then supply this <code>XQueryExecutable</code> to the s9api
        <code>DocumentBuilder</code> used to build the document.</p>

    <p>Of course, when document projection is used manually like this, it is entirely the user's
      responsibility to ensure that the selected part of the document contains all the nodes
      required.</p>
  </section>
  <section id="w3c-dtds" title="References to W3C DTDs">
    <h1>References to W3C DTDs</h1>

    <p>During 2010-11, W3C took steps to reduce the burden of meeting requests for
      commonly-referenced documents such as the DTD for XHTML. The W3C web server routinely
      adds an artificial 30-second time delay for such requests. In response to this, Saxon now includes
      copies of these documents within the issued JAR file, and recognizes requests for these
      documents, satisfying the request using the local copy.</p>

    <p>This is done only in cases where Saxon itself instantiates the XML parser. In cases where the
      user application instantiates an XML parser, the same effect can be achieved by setting the <a
        class="javalink" href="net.sf.saxon.lib.StandardEntityResolver">StandardEntityResolver</a>
      as a property of the <code>XMLReader</code> (parser).</p>

    <p>The documents recognized by the <code>StandardEntityResolver</code> are:</p>

    <table>
      <thead>
        <tr>
          <td>
            <p>Public ID</p>
          </td>
          <td>
            <p>System ID</p>
          </td>
          <td>
            <p>Saxon resource name</p>
          </td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Latin 1 for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-lat1.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Symbols for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-symbol.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Special for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-special.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Transitional//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-transitional.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Strict//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-strict.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Frameset//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-frameset.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Basic 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml-basic10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Basic 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Access Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-access-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Access Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-access-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Java Applets 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-applet-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-applet-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-arch-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-arch-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Common Attributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-attribs-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Base Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-base-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Basic Forms 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-form-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic-form-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Basic Tables 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-table-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic-table-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Basic 1.0 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic10-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic10-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Basic 1.1 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic11-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML BDO Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-bdo-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkphras-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkpres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Structural 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkstruct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Character Entities 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-charent-1.mod</p>
          </td>
        </tr>
2620
        <tr>
2621
          <td>
2622
            <p>-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN</p>
2623
          </td>
2624
          <td>
2625
            <p>http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod</p>
2626
          </td>
2627
          <td>
2628
            <p>w3c/xhtml11/xhtml-csismap-1.mod</p>
2629
          </td>
2630
        </tr>
2631
        <tr>
2632
          <td>
2633
            <p>-//W3C//ENTITIES XHTML Datatypes 1.0//EN</p>
2634
          </td>
2635
          <td>
2636
            <p>http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod</p>
2637
          </td>
2638
          <td>
2639
            <p>w3c/xhtml11/xhtml-datatypes-1.mod</p>
2640
          </td>
2641
        </tr>
2642
        <tr>
2643
          <td>
2644
            <p>-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN</p>
2645
          </td>
2646
          <td>
2647
            <p>http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod</p>
2648
          </td>
2649
          <td>
2650
            <p>w3c/xhtml11/xhtml-edit-1.mod</p>
2651
          </td>
2652
        </tr>
2653
        <tr>
2654
          <td>
2655
            <p>-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN</p>
2656
          </td>
2657
          <td>
2658
            <p>http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod</p>
2659
          </td>
2660
          <td>
2661
            <p>w3c/xhtml11/xhtml-events-1.mod</p>
2662
          </td>
2663
        </tr>
2664
        <tr>
2665
          <td>
2666
            <p>-//W3C//ELEMENTS XHTML Forms 1.0//EN</p>
2667
          </td>
2668
          <td>
2669
            <p>http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod</p>
2670
          </td>
2671
          <td>
2672
            <p>w3c/xhtml11/xhtml-form-1.mod</p>
2673
          </td>
2674
        </tr>
2675
        <tr>
2676
          <td>
2677
            <p>-//W3C//ELEMENTS XHTML Frames 1.0//EN</p>
2678
          </td>
2679
          <td>
2680
            <p>http://www.w3.org/MarkUp/DTD/xhtml-frames-1.mod</p>
2681
          </td>
2682
          <td>
2683
            <p>w3c/xhtml11/xhtml-frames-1.mod</p>
2684
          </td>
2685
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Modular Framework 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-framework-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML HyperAttributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hyperAttributes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-hyperAttributes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Hypertext 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-hypertext-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Frame Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-iframe-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-iframe-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Images 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-image-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlphras-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlpres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstruct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Inline Style 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstyle-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inputmode 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inputmode-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inputmode-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Redeclarations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-redecl-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-redecl-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Link Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-link-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Lists 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-list-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-2.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-2.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML MetaAttributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-metaAttributes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-metaAttributes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Name Identifier 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-nameident-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-nameident-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//NOTATIONS XHTML Notations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-notations-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-notations-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-object-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Param Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-param-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-pres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML-Print 1.0 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-print10-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-print10-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Qualified Names 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML+RDFa Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML RDFa Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Ruby 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/ruby/xhtml-ruby-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ruby-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Scripting 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-script-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ssismap-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Document Structure 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-struct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Style Sheets 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-style-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Tables 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-table-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Target 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-target-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-target-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Text 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-text-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//MathML 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml1/mathml.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml1/mathml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml2/mathml2.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml2/mathml2.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 3.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml3/mathml3.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml3/mathml3.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd</p>
          </td>
          <td>
            <p>w3c/svg10/svg10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Tiny//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-tiny.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Basic//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-basic.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/xhtml-rddl-model-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/xhtml-rddl-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//DTD XHTML RDDL 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-xhtml.dtd</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-xhtml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL QName Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Resource Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-resource-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-resource-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD Specification V2.10//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd</p>
          </td>
          <td>
            <p>w3c/xmlspec/xmlspec.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XMLSCHEMA 200102//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2001/XMLSchema.dtd</p>
          </td>
          <td>
            <p>w3c/xmlschema/XMLSchema.dtd</p>
          </td>
        </tr>
      </tbody>
    </table>

    <p>This Saxon feature can be disabled by setting the configuration property <a
        class="bodylink code" href="/configuration/config-features"
        >Feature.ENTITY_RESOLVER_CLASS</a> to null; it is also possible to set it to a different
        <code>EntityResolver</code> class (perhaps a subclass of Saxon's
        <code>StandardEntityResolver</code>) that varies the behavior. If an
        <code>EntityResolver</code> is set in the relevant <code>ParseOptions</code> or in an
        <code>AugmentedSource</code> then this will override any <code>EntityResolver</code> set at
      the configuration level.</p>
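    <p>As a rough illustration of supplying a different <code>EntityResolver</code>, the
      following is a minimal sketch, not Saxon's own implementation. The class name
      <code>LocalDtdResolver</code> and its helper <code>localPath</code> are hypothetical, as is
      the assumption that the application ships its own copies of two of the DTDs on the classpath
      under the relative file names shown in the table. It uses only the standard SAX interfaces;
      returning <code>null</code> leaves resolution of any other entity to the parser's default
      behavior.</p>

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves two of the DTDs listed above from the
// application's own classpath and leaves everything else to the parser.
class LocalDtdResolver implements EntityResolver {

    // Maps a public identifier to a classpath-relative file name, or null if unknown.
    static String localPath(String publicId) {
        if ("-//W3C//DTD SVG 1.1//EN".equals(publicId)) {
            return "w3c/svg11/svg11.dtd";
        }
        if ("-//W3C//DTD MathML 2.0//EN".equals(publicId)) {
            return "w3c/mathml/mathml2/mathml2.dtd";
        }
        return null;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String path = localPath(publicId);
        if (path == null) {
            return null; // unknown identifier: the parser falls back to the system identifier
        }
        InputStream in = LocalDtdResolver.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            return null; // no local copy on the classpath (the layout is an assumption)
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

    <p>An instance of such a class could be supplied in the <code>ParseOptions</code> or
      <code>AugmentedSource</code> for an individual document. To name it at the configuration
      level instead, it would presumably need to be declared public with a public no-argument
      constructor.</p>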
  </section>
</article>