he / src / userdoc / sourcedocs.xml @ 51cfbdf3

<?xml version="1.0" encoding="utf-8"?>
<article id="sourcedocs" title="Handling XML Documents">
  <h1>Handling XML Documents</h1>

  <p>This section discusses the various options in Saxon for handling XML documents.
    These might form the input or output of a query or stylesheet, or they might be
  used directly by application code written (say) in Java.</p>

  <p>See the topics below for further information:</p>

  <nav>
    <ul/>
  </nav>

  <section id="command-line" title="Source Documents on the Command Line">
    <h1>Source Documents on the Command Line</h1>

    <p>When Saxon (either XSLT or XQuery) is invoked from the command line, the source document will
      normally be an XML 1.0 document. Supplying an XML 1.1 document will also work, provided that
      (a) the selected parser is an XML 1.1 parser, and (b) the command line option
        <code>-xmlversion:1.1</code> is set.</p>

    <p>If a custom parser is specified using the <code>-x</code> option on the command line, then
      the source document can be in any format accepted by this custom parser. The only constraint
      is that the parser must behave as a SAX2 parser, delivering a stream of events that define a
      virtual XML document. For example, the TagSoup parser from John Cowan can be used to feed an
      HTML document as input to Saxon.</p>
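
    <p>As an illustration of what "behaving as a SAX2 parser" involves, the following is a minimal
      sketch of a reader that turns a non-XML format (one item per line) into SAX events. It uses
      only the SAX classes shipped with the JDK. The class name <code>LineListReader</code>, the
      element names, and the <code>trace</code> helper are hypothetical, and the sketch assumes
      that the input is supplied as a character stream.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX2 reader: reports each non-empty input line as one <item> element. */
class LineListReader extends XMLFilterImpl {

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        // Assumption: the caller supplies a character stream, not just a system ID.
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        AttributesImpl noAttributes = new AttributesImpl();
        // The inherited ContentHandler methods forward each event to the registered handler.
        startDocument();
        startElement("", "items", "items", noAttributes);
        String line;
        while ((line = lines.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            startElement("", "item", "item", noAttributes);
            characters(line.toCharArray(), 0, line.length());
            endElement("", "item", "item");
        }
        endElement("", "items", "items");
        endDocument();
    }

    /** Feeds text through the reader and records the events a consumer would receive. */
    static String trace(String text) {
        StringBuilder events = new StringBuilder();
        LineListReader reader = new LineListReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                events.append('<').append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                events.append("</").append(qName).append('>');
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(text)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return events.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("alpha\nbeta\n"));
    }
}
```

    <p>The consumer only ever sees the event stream, so it cannot tell that the input was not
      lexical XML. A class of this kind is what the <code>-x</code> option expects to be given by
      name.</p>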

    <p>Non-standard input formats can also be handled by specifying a user-written
        <code>URIResolver</code>. If the <code>-u</code> option is used on the command line, or if
      the source file name begins with <code>http:</code>, <code>https:</code>,
        <code>file:</code>, or <code>classpath:</code>, then the source file name is resolved to a
      JAXP <code>Source</code> object using the <code>URIResolver</code>; if a user-written
        <code>URIResolver</code> is nominated (using the <code>-r</code> option) then this may
      translate the file name into a <code>Source</code> object any way that it wishes.</p>

    <aside>Saxon (from 9.7) supports the <code>classpath</code> URI scheme to locate resources
      using the Java classpath. This URI scheme is defined by the Spring framework, but Saxon's
      implementation is free-standing. For example, <code>classpath:utility.xsl</code> will locate
      a file called <code>utility.xsl</code> as a resource on the classpath.</aside>
    <aside>Saxon (from 9.9) also supports the <code>data</code> URI scheme, which allows
      a small resource to be contained within the URI itself, suitably encoded.</aside>
  </section>
  <section id="collections" title="Collections">
    <h1>Collections</h1>

    <p>Saxon implements the <a class="bodylink code" href="/functions/fn/collection"
        >collection()</a> and <a class="bodylink code" href="/functions/fn/uri-collection"
        >uri-collection()</a> functions by passing the given collection URI (or null, if the default
      collection is requested) to a user-provided <a class="javalink"
        href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>. This section describes how
      the standard (default) collection finder behaves, if no user-written collection finder is
      supplied. (For information on supplying a user-written <code>CollectionFinder</code>, see <a
        class="bodylink" href="user-collections">Writing your own Collection Finder</a>.)</p>

    <p>In XSLT 3.0 and XQuery 3.1, collections can contain resources other than XML documents: for
    example, JSON documents, plain text documents, and binary files.</p>

    <p>The default collection can be registered with the <code>Configuration</code> in the form of a
      collection URI. When the <code>collection()</code> function is called with no arguments, this
      is exactly the same as supplying this default collection URI. If no default collection URI has
      been registered, an empty collection is returned.</p>

    <p>The standard collection finder supports four different kinds of collection: registered collections,
      catalog-based collections, directory-based collections, and ZIP-based collections:</p>

    <ul>
      <li><p>A registered collection is one that has been explicitly registered with the Configuration, by calling
      <code>Configuration.registerCollection()</code>.</p></li>
      <li><p>If the collection URI
        corresponds to a directory name, then a directory-based collection is used: the collection contains
      selected files from the named directory.</p></li>
      <li><p>If the collection URI identifies a
        ZIP or JAR file (more specifically, if it uses the <code>jar</code> URI scheme, or has a file extension of
        ".zip" or ".jar") then a ZIP-based collection is used.</p></li>
      <li><p>Otherwise, the collection URI must be
        the URI of an XML file which acts as a catalog, that is, it contains a list of the resources
        in the collection.</p></li>
    </ul>

    <aside>
      <p>To recognize additional kinds of ZIP file, for example Open Office documents, set the
      configuration property <code>ZIP_URI_PATTERN</code>. The value is a regular
        expression; for example, you could set it to <code>"\.(zip|jar|docx)$"</code> to recognize
        URIs with file extensions ".zip", ".jar", or ".docx".</p>
    </aside>
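
    <p>The selection rules listed above, together with a configurable ZIP pattern, can be sketched
      as follows. This is an illustration only, not Saxon's implementation: the class
      <code>CollectionKindSketch</code> and its method are hypothetical, the order of the tests is
      assumed to follow the order of the list, and whether the URI names a directory is passed in
      as a flag so that the sketch does not touch the file system.</p>

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch of how a collection URI might be classified. */
class CollectionKindSketch {

    enum Kind { REGISTERED, DIRECTORY, ZIP, CATALOG }

    /** Pattern covering the ".zip" and ".jar" extensions named in the text. */
    static final String DEFAULT_ZIP_PATTERN = "\\.(zip|jar)$";

    static Kind classify(String uri, Set<String> registeredUris,
                         boolean isDirectory, String zipPattern) {
        if (registeredUris.contains(uri)) {
            return Kind.REGISTERED;
        }
        if (isDirectory) {
            return Kind.DIRECTORY;
        }
        // Ignore any query parameters when testing the file extension.
        int q = uri.indexOf('?');
        String path = (q < 0) ? uri : uri.substring(0, q);
        if (uri.startsWith("jar:") || Pattern.compile(zipPattern).matcher(path).find()) {
            return Kind.ZIP;
        }
        // Anything else is assumed to be the URI of a catalog file.
        return Kind.CATALOG;
    }

    public static void main(String[] args) {
        System.out.println(classify("file:///docs/report.docx", Set.of(), false,
                "\\.(zip|jar|docx)$"));
    }
}
```

    <p>With the default pattern a <code>.docx</code> URI falls through to the catalog case;
      widening the pattern, as the note above suggests, moves it into the ZIP case.</p>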

    <p>Saxon by default recognizes four kinds of resource: XML documents,
      JSON documents, unparsed text documents, and binary files. The standard collection finder
      attempts to identify which kind of resource to use based on the content type (media type),
      which in turn may be inferred from HTTP headers, from sniffing the initial bytes of the
      content, or from file extensions.</p>

    <p>In the case of directory-based and ZIP-based collections, query parameters may be added to
      the collection URI to further control how it is to be processed.</p>

    <aside><p>Saxon cannot assume that the nodes returned by the <code>collection()</code> function
    are in document order. It is therefore best to avoid expressions like <code>collection()/doc/section</code>
    which force the collection to be sorted (and therefore force all the nodes in the collection to
    be in memory at the same time). To iterate over a collection, it's better to use constructs that
    don't sort into document order: for example <code>collection() ! doc/section</code>,
    or <code>xsl:for-each</code>, or <code>for $x in collection() return ...</code>.</p>

      <p>See also <a class="bodylink code"
        href="/functions/saxon/discard-document">saxon:discard-document()</a>.</p></aside>

    <h2 class="subtitle">Defining a collection using a catalog file</h2>

    <p>If the collection URI identifies a file (other than a ZIP or JAR file), Saxon treats this as
      a catalog file. This is a file in XML format that lists the documents comprising the
      collection. Here is an example of such a catalog file:</p>
    <samp><![CDATA[<collection stable="true">
  <doc href="dir/chap1.xml"/>
  <doc href="dir/chap2.xml"/>
  <doc href="dir/chap3.xml"/>
  <doc href="dir/chap4.xml"/>
</collection>]]></samp>

    <p>The <code>stable</code> attribute indicates whether the collection is stable or not. The
      default value is <code>true</code>. If a collection is stable, then the URIs listed in the
        <code>doc</code> elements are treated like URIs passed to the <code>doc()</code> function.
      Each URI is first looked up in the document pool to see if it is already loaded; if it is,
      then the document node is returned. Otherwise the URI is passed to the registered
        <code>URIResolver</code>, and the resulting document is added to the document pool. The
      effect of this process is, firstly, that two calls on the <code>collection()</code> function
      passing the same collection URI will return the same nodes each time, and secondly, that these
      results are consistent with the results of the <code>doc()</code> function: if the
        <code>document-uri()</code> of a node returned by the <code>collection()</code> function is
      passed to the <code>doc()</code> function, the original node will be returned. If
        <code>stable="false"</code> is specified, however, the URI is dereferenced directly, and the
      document is not added to the document pool, which means that a subsequent retrieval of the
      same document will not return the same node.</p>
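
    <p>The difference between stable and non-stable retrieval can be pictured with a small cache
      sketch. It does not use Saxon's classes: <code>DocumentPoolSketch</code> and its
      <code>fetch</code> method are hypothetical names, and the loader function stands in for the
      registered <code>URIResolver</code> plus parsing.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a document pool keyed by document URI. */
class DocumentPoolSketch<D> {

    private final Map<String, D> pool = new HashMap<>();
    private final Function<String, D> loader;

    DocumentPoolSketch(Function<String, D> loader) {
        this.loader = loader;
    }

    /** Returns the document for a URI, consulting the pool only for stable collections. */
    D fetch(String uri, boolean stable) {
        if (!stable) {
            // Dereference directly; the result is not remembered.
            return loader.apply(uri);
        }
        D doc = pool.get(uri);
        if (doc == null) {
            doc = loader.apply(uri);
            pool.put(uri, doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        DocumentPoolSketch<Object> pool = new DocumentPoolSketch<>(uri -> new Object());
        System.out.println(pool.fetch("dir/chap1.xml", true) == pool.fetch("dir/chap1.xml", true));
        System.out.println(pool.fetch("dir/chap1.xml", false) == pool.fetch("dir/chap1.xml", false));
    }
}
```

    <p>In the stable case both calls yield the identical object, which is what makes node identity
      consistent across calls; in the non-stable case each call produces a fresh object.</p>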

    <h2 class="subtitle">Processing directories</h2>

    <p>If the URI passed to the <code>collection()</code> function (still assuming a default
        <code>CollectionFinder</code>) identifies a directory, then the contents of the
      directory are returned. Such a URI may have a number of query parameters, written in the form
        <code>file:///a/b/c/d?keyword=value;keyword=value;...</code>. The recognized keywords and
      their values are as follows:</p>
    <table>
      <thead class="params">
        <tr>
          <td>
            <p> keyword </p>
          </td>
          <td>
            <p> values </p>
          </td>
          <td>
            <p> effect </p>
          </td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="keyword">
            <p> recurse </p>
          </td>
          <td>
            <p>
              <span class="value">yes | no</span> (default <span class="value">no</span>) </p>
          </td>
          <td>
            <p> Determines whether subdirectories are searched recursively. </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> strip-space </p>
          </td>
          <td>
            <p class="value"> yes | ignorable | no </p>
          </td>
          <td>
            <p> Determines whether whitespace text nodes are to be stripped. The default depends on
              the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> settings.
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> validation </p>
          </td>
          <td>
            <p class="value"> strip | preserve | lax | strict </p>
          </td>
          <td>
            <p> Determines whether and how schema validation is applied to each document. The
              default depends on the <a class="javalink" href="net.sf.saxon.Configuration"
                >Configuration</a> settings. </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> select </p>
          </td>
          <td>
            <p> file name pattern ("glob")</p>
          </td>
          <td>
            <p> Determines which files are selected (see below). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> match </p>
          </td>
          <td>
            <p> regular expression</p>
          </td>
          <td>
            <p> Determines which files are selected (see below). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> content-type </p>
          </td>
          <td>
            <p> media type (for example <code>application/xml</code> or <code>text/plain</code>)</p>
          </td>
          <td>
            <p> Determines how the resource is processed. For example, if the media type is
            <code>application/xml</code> then it will be parsed as XML and returned as a document node;
            if it is <code>text/plain</code> then it is returned as an atomic value of type
            <code>xs:string</code>; if it is <code>application/binary</code> then it is returned
            as an atomic value of type <code>xs:base64Binary</code>.</p>
            <p>If this parameter is absent, then the <code
              java="net.sf.saxon.lib.CollectionFinder">CollectionFinder</code> attempts to discern the
            content type first by looking at the file extension, and then, if necessary, by
            examining the initial bytes of the content itself.</p>
            <p>The set of content types that are recognized, and their mapping to implementations of the
            class <code java="net.sf.saxon.lib.ResourceFactory">ResourceFactory</code>, is defined in the
            <code java="net.sf.saxon.Configuration">Configuration</code>, and can be changed using the
            method <code>Configuration.registerMediaType()</code>. The set of file extensions that are
              recognized, and their mapping to media types, is also held in the <code>Configuration</code>, and can be changed using the
              method <code>Configuration.registerFileExtension()</code>.</p>
            <p>Available from Saxon 10.1.</p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> metadata </p>
          </td>
          <td>
            <p class="value"> yes | no</p>
          </td>
          <td>
            <p> If set to yes, the item returned by the <code>collection()</code> function will be a
              map containing properties of the selected resource as well as its content. The keys of
              the map will be strings. Two entries with names "name" and "fetch" will always be
              available.</p>
            <p>The value of the "fetch" entry is a function that can be called to retrieve the
              content (it returns the same item that would have been returned with the default
              setting of <code>metadata=no</code>: for example a node representing an XML document,
              or a map representing the content of a JSON file). This allows you to decide which
              items in the collection to fetch based on their properties, for example:</p>

            <p>
              <code>for $m in collection('/data/folder?metadata=yes') return if
                ($m?content-type='application/xml') then $m?fetch() else ()</code>
            </p>

            <p>Failures in parsing a resource can be trapped by using try/catch around the call on
              the <code>fetch</code> function.</p>
            <p>Other entries in the returned map represent properties of the file obtained from the
              operating system: for example <code>last-modified</code>, <code>can-execute</code>,
                <code>length</code>, or <code>is-hidden</code>.</p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> on-error </p>
          </td>
          <td>
            <p class="value"> fail | warning | ignore </p>
          </td>
          <td>
            <p> Determines the action to be taken if one of the files cannot be successfully parsed.
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> parser </p>
          </td>
          <td>
            <p> Java class name </p>
          </td>
          <td>
            <p> Class name of the Java <code>XMLReader</code> to be used. For example, John Cowan's
                <code>TagSoup</code> parser may be selected by specifying
                <code>parser=org.ccil.cowan.tagsoup.Parser</code> (this parses arbitrary ill-formed
              HTML and presents it to Saxon as well-formed XML). </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> xinclude </p>
          </td>
          <td>
            <p class="value"> yes | no </p>
          </td>
          <td>
            <p> Determines whether XInclude processing should be applied to the selected documents.
              This overrides any setting in the <a class="javalink"
                href="net.sf.saxon.Configuration">Configuration</a> (or any command line option).
            </p>
          </td>
        </tr>
        <tr>
          <td class="keyword">
            <p> stable </p>
          </td>
          <td>
            <p class="value"> yes | no </p>
          </td>
          <td>
            <p> Determines whether the collection is to be stable. </p>
          </td>
        </tr>

      </tbody>
    </table>
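
    <p>The <code>keyword=value;keyword=value</code> form of the query part can be illustrated by a
      short parsing sketch. This is not Saxon's parser: the class <code>CollectionQuery</code> is
      hypothetical, only the semicolon separator shown above is handled, and values are returned
      without any %HH decoding.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: splits the query part of a collection URI into keyword/value pairs. */
class CollectionQuery {

    static Map<String, String> parse(String collectionUri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = collectionUri.indexOf('?');
        if (q < 0) {
            return params; // no query part, so no parameters
        }
        for (String pair : collectionUri.substring(q + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("file:///a/b/c/d?recurse=yes;select=*.xml;on-error=warning"));
    }
}
```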

    <p>The pattern used in the <code>select</code> parameter can use glob-like syntax, for example
        <code>*.xml</code> selects all files with extension "xml". More generally, the pattern is
      converted to a regular expression by prepending "<code>^</code>", appending "<code>$</code>",
      replacing "<code>.</code>" by "<code>\.</code>", "<code>*</code>" by
      "<code>.*</code>", and "<code>?</code>" by
      "<code>.?</code>", and it is then used to match the file names appearing in the directory
      using the Java regular expression rules. So, for example, you can write
        <code>?select=*.(xml|xhtml)</code> to match files with either of these two file extensions.
      Note, however, that special characters used in the URL (that is, characters such as backslash
      and curly braces that are not allowed in the query part of a URI) must be escaped using
      the %HH convention. For example,
      vertical bar needs to be written as <code>%7C</code>. This escaping can be achieved using the
        <code>encode-for-uri()</code> function.</p>
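
    <p>The conversion rule described above can be written out as a short sketch. The class
      <code>GlobSelect</code> and its methods are hypothetical and are not part of Saxon; the code
      simply applies the stated substitutions in a single pass and then matches with
      <code>java.util.regex</code>.</p>

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the glob-to-regex rule described for the select parameter. */
class GlobSelect {

    /** Prepends ^, appends $, and rewrites ".", "*" and "?" as described in the text. */
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder("^");
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '.': regex.append("\\."); break;
                case '*': regex.append(".*"); break;
                case '?': regex.append(".?"); break;
                default:  regex.append(c);
            }
        }
        return regex.append('$').toString();
    }

    /** Tests a file name against a glob using Java regular expression rules. */
    static boolean selects(String glob, String fileName) {
        return Pattern.compile(toRegex(glob)).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.(xml|xhtml)"));
        System.out.println(selects("*.(xml|xhtml)", "index.xhtml"));
    }
}
```

    <p>Working character by character avoids rewriting the dot that the <code>*</code> rule has
      just produced. Characters other than the three listed pass through unchanged, which is why
      regular expression syntax such as <code>(xml|xhtml)</code> can be used inside the pattern.</p>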

    <p>As an alternative to the <code>select</code> parameter, the <code>match</code> parameter
    can be used. This accepts a standard XPath 3.1 regular expression as its value. For example,
    <code>.+\.xml</code> selects all files with extension "xml". Again, characters that are not allowed
    in the query part of a URI, such as backslash, curly braces, and vertical bar, must be escaped
    using the %HH convention, which can be achieved using the <code>encode-for-uri()</code> function.</p>
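
    <p>When the collection URI is assembled in Java rather than in XPath, the same %HH escaping can
      be done by hand. The sketch below follows the rule used by <code>encode-for-uri()</code>,
      namely that every character other than letters, digits, hyphen, underscore, period, and tilde
      is escaped using its UTF-8 bytes. The class <code>UriEscape</code> is a hypothetical helper,
      not a Saxon API.</p>

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that percent-encodes a parameter value for use in a collection URI. */
class UriEscape {

    static String encodeForUri(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                // Escape as %HH with upper-case hexadecimal digits.
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a directory collection URI whose match parameter contains a backslash.
        System.out.println("file:///a/b/c/d?match=" + encodeForUri(".+\\.xml"));
    }
}
```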

    <p> A collection read in this way is not stable by default. (Stability can be expensive, and is
      rarely required, so the default setting is recommended.) Making a collection stable has the
      effect that the entire result of the <code>collection()</code> function is retained in a cache
      for the duration of the query or transformation, and any further calls on
        <code>collection()</code> with the same absolute URI return the saved collection
      from this cache. </p>

    <h2 class="subtitle">Processing ZIP and JAR files</h2>

    <p>If the collection URI identifies a ZIP or JAR file then it is processed in essentially the same
      way as a directory. URI query parameters can be used in the same way, and have much the same
      effect.</p>

    <p>A URI is recognized as a ZIP or JAR file URI if the scheme name is "jar", or if the file
      extension is "zip" or "jar".</p>

    <p>The value of the <code>recurse</code> option is ignored in this case, and
        <code>recurse=yes</code> is assumed.</p>

    <p>The option <code>metadata=yes</code> is available for ZIP-based collections as well as for
      directory-based collections. The set of properties returned in the resulting map is slightly
      different; for example, it includes any <code>comment</code> field associated with the ZIP file
      entry. Note that no items are returned in respect of directory nodes within the ZIP file; only
      leaf nodes are represented.</p>

    <h2 class="subtitle">Registered Collections</h2>

    <p>On the .NET product there is another way to use a collection URI (provided that you use the
      API rather than the command line): you can register a collection using the
      <code>Processor.RegisterCollection</code> method on the <a class="javalink"
        href="Saxon.Api.Processor">Saxon.Api.Processor</a> class.</p>

    <section id="user-collections" title="Writing your own Collection Finder">
      <h1>Writing your own Collection Finder</h1>

      <p>Since Saxon 9.7, the <a class="javalink" href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>
        interface replaces the <code>CollectionURIResolver</code> interface found in previous
        releases. It has much more flexibility, in particular the ability to deliver non-XML
        resources. The old <code>CollectionURIResolver</code> interface was dropped in Saxon 10.</p>

      <p>Details of the interface can be found in the Javadoc. The basic steps are:</p>

      <ol>
        <li>
          <p>Write a class that implements <code>CollectionFinder</code>. It has a single method,
            <code>findCollection()</code>, which accepts the dynamic context and an absolute
            collection URI, and returns an object that implements
            <code>ResourceCollection</code>. Register an instance of your
            <code>CollectionFinder</code> with the Saxon <code>Configuration</code>.</p>
          <p>For example, a <code>CollectionFinder</code> written to handle collection URIs using the
            scheme name "sql" might be supplied as:</p>
          <samp><![CDATA[config.setCollectionFinder((context, uri) ->
   uri.startsWith("sql:")
      ? sqlCollection(uri)
      : config.getStandardCollectionFinder().findCollection(context, uri)
);]]></samp>
          <p>where <code>sqlCollection(uri)</code> returns some user-defined implementation
            of <code>ResourceCollection</code>, perhaps one that retrieves XML documents from
            a relational database.</p>
        </li>
        <li>
          <p>You can either reuse the existing implementations of <a class="javalink"
            href="net.sf.saxon.lib.ResourceCollection">ResourceCollection</a>, namely
            <code>CatalogCollection</code>, <code>DirectoryCollection</code>, and
            <code>JarCollection</code>, or you can write your own. You can also of course subclass
            the existing collection classes. The <code>ResourceCollection</code> object provides two
            key methods that you need to implement: <code>getResources()</code>, which returns a
            sequence of <code>Resource</code> objects, and <code>getResourceURIs()</code>, which
            returns a sequence of URIs. These are invoked by the <a class="bodylink code"
              href="/functions/fn/collection" >fn:collection()</a> and <a class="bodylink code"
                href="/functions/fn/uri-collection" >fn:uri-collection()</a> functions respectively.</p>
        </li>
        <li>
          <p>Again, you can either reuse existing implementations of <a class="javalink"
            href="net.sf.saxon.lib.Resource">Resource</a> (such as <code>XmlResource</code>,
            <code>JSONResource</code>, <code>UnparsedTextResource</code>,
            <code>BinaryResource</code>, and <code>MetadataResource</code>), or you can create your
            own, perhaps by subclassing. The key method that the <code>Resource</code> object must
            provide is <code>getItem()</code> which returns the resource in the form of an XDM item.
            It is good practice to delay any extensive work such as parsing until the
            <code>getItem()</code> method is called: this reduces the memory footprint, and enables
            parallel evaluation across multiple threads (Saxon-EE only).</p>
        </li>
      </ol>
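
      <p>The advice in the last step, deferring expensive work until <code>getItem()</code> is
        called, can be illustrated with a sketch that is independent of Saxon. The class
        <code>LazyResource</code> below is hypothetical and does not implement Saxon's
        <code>Resource</code> interface; it only shows the deferral pattern, with a
        <code>Supplier</code> standing in for the parsing step. Caching the result after the first
        call is a design choice of the sketch, not something the interface is known to require.</p>

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a resource that postpones parsing until its content is requested. */
class LazyResource<T> {

    private final String uri;
    private final Supplier<T> parser;
    private T item;
    private boolean loaded;

    LazyResource(String uri, Supplier<T> parser) {
        // Construction is cheap: nothing is read or parsed here.
        this.uri = uri;
        this.parser = parser;
    }

    String getResourceURI() {
        return uri;
    }

    /** Performs the expensive work on first use; synchronized so threads can share it safely. */
    synchronized T getItem() {
        if (!loaded) {
            item = parser.get();
            loaded = true;
        }
        return item;
    }

    public static void main(String[] args) {
        LazyResource<String> resource =
                new LazyResource<>("dir/chap1.xml", () -> "parsed content of chap1");
        System.out.println(resource.getResourceURI());
        System.out.println(resource.getItem());
    }
}
```

      <p>A collection holding thousands of such objects costs little until individual items are
        actually requested.</p>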
434
    </section>
435

    
436
  </section>
437
  <section id="builder-api" title="Building a Source Document from lexical XML">
438
    <h1>Building a Source Document from lexical XML</h1>
439

    
440
    <p>The conversion of lexical XML to a tree in memory is called <i>parsing</i>, and is performed
441
    by a software component called an <i>XML Parser</i>. Saxon does not include its own XML parser,
442
    rather it provides interfaces that invoke XML parsers supplied by third parties. Platforms
443
    such as Java and .NET typically include a built-in XML parser that Saxon uses by default.</p>
444

    
445
    <p>With the Java s9api interface, a source document can be built using the <a class="javalink"
446
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> class, which is created using
447
      the factory method <code>newDocumentBuilder</code> on the <a class="javalink"
448
        href="net.sf.saxon.s9api.Processor">Processor</a> object. Various options for document
449
      building are available as methods on the <code>DocumentBuilder</code>, for example options to
450
      perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and also
451
      to choose the tree implementation model to be used.</p>

    <p>These methods create a document from a <code>Source</code> object. This is a JAXP interface designed
    as an abstraction of various kinds of XML source, including <code>StreamSource</code>, which represents lexical XML
    held in a file or input stream; <code>SAXSource</code>, which represents a source of SAX events; <code>DOMSource</code>,
    representing an already-parsed XML document held in a DOM tree; and <code>StAXSource</code>, which represents a
      class that responds to requests for StAX (pull-parser) events. In addition, Saxon's <code
        java="net.sf.saxon.om.NodeInfo">NodeInfo</code> and <code
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code> classes
      implement the JAXP <code>Source</code> interface, and the s9api <a class="javalink"
        href="net.sf.saxon.s9api.XdmNode">XdmNode</a> class has an <code>asSource()</code> method,
      so it is always possible to supply an existing Saxon tree as
    the source for any of these interfaces.</p>
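
    <p>To make the four standard kinds of JAXP <code>Source</code> concrete, the sketch below wraps
      the same small document in each of them, using only classes from the JDK. It serializes each
      one with the JDK's identity transformer purely to show that they describe the same content;
      it does not call Saxon. The class name <code>SourceKinds</code> and its helper methods are
      hypothetical.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

/** Hypothetical demo: one document presented as each standard kind of JAXP Source. */
class SourceKinds {

    static final String XML = "<doc><p>Hello</p></doc>";

    /** Lexical XML held in a character stream. */
    static Source fromStream(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    /** A source of SAX events; the consumer chooses the XMLReader. */
    static Source fromSax(String xml) {
        return new SAXSource(new InputSource(new StringReader(xml)));
    }

    /** An already-parsed DOM tree. */
    static Source fromDom(String xml) {
        try {
            return new DOMSource(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** A StAX pull parser positioned at the start of the document. */
    static Source fromStax(String xml) {
        try {
            return new StAXSource(XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies any Source to a string using an identity transformation. */
    static String serialize(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(fromStream(XML)));
        System.out.println(serialize(fromSax(XML)));
        System.out.println(serialize(fromDom(XML)));
        System.out.println(serialize(fromStax(XML)));
    }
}
```

    <p>Because each of these objects is a <code>Source</code>, any of them could be passed to an
      interface that accepts a <code>Source</code>, such as the document-building methods described
      above.</p>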

    <p>Similarly in the .NET API, there is a <a class="javalink" href="Saxon.Api.DocumentBuilder"
        >DocumentBuilder</a> object that can be created from the <a class="javalink"
        href="Saxon.Api.Processor">Processor</a>. This allows options to be set controlling the way
      documents are built, and provides an overloaded <code>Build</code> method allowing a tree to
      be built from various kinds of source.</p>

    <p>It is also possible to build a Saxon tree in memory by using the <code>buildDocumentTree()</code>
      method of the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.
      (When using the JAXP Transformation API, the <code>Configuration</code> can be obtained from
      the <code>TransformerFactory</code> as the value of the attribute named <a class="javalink"
        href="net.sf.saxon.lib.Feature#CONFIGURATION">Feature.CONFIGURATION.name</a>.)</p>

    <p>The <a class="javalink" href="net.sf.saxon.Configuration#buildDocumentTree">buildDocumentTree()</a>
      method takes a single argument, a JAXP <code>Source</code>. This can be any of the standard
      kinds of JAXP <code>Source</code>. See <a class="bodylink" href="../jaxpsources">JAXP
        Sources</a> for more information. The method returns a <code
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code>, which provides information about the constructed tree;
      in particular, its <code>getRootNode()</code> method returns the root node of the tree,
      which in most cases will be a document node.
    </p>

    <p>All the documents processed in a single transformation or query must be loaded using the same
        <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>. However, it is
      possible to copy a document from one <code>Configuration</code> into another by supplying the
        <a class="javalink" href="net.sf.saxon.om.TreeInfo">TreeInfo</a> at the root of the
      existing document as the <code>Source</code> supplied to the <code>buildDocumentTree()</code>
      method of the new <code>Configuration</code>. </p>
  </section>
  <section id="building-programmatically" title="Building XML Trees Programmatically">
    <h1>Building XML Trees Programmatically</h1>
    <p>There are various ways in Saxon to build an XDM tree programmatically
      (that is, incrementally one node at a time).</p>

    <h2 class="subtitle">The Sapling Tree API</h2>
    <p>A new API offered from Saxon 10 is the Sapling Tree API. This provides a collection of methods to create
    nodes; for example, to create a document containing a <code>body</code> element with two paragraphs, the expression</p>
    <samp><![CDATA[doc(
  elem("body")
    .child(elem("p").text("Hello"),
           elem("p").text("World"))
      )]]></samp>
    <p>might be used. These methods are found in package <code>net.sf.saxon.sapling</code>, specifically in the
      class <code java="net.sf.saxon.sapling.Saplings">net.sf.saxon.sapling.Saplings</code>.</p>
    <p>The "Sapling" nodes created by these methods are transient nodes used only during tree construction; when the Sapling
    tree has been completely built, it can be converted to a regular XDM tree offering full query access using the methods
      <code java="net.sf.saxon.sapling.SaplingDocument#toXdmNode">SaplingDocument.toXdmNode()</code>
      or <code  java="net.sf.saxon.sapling.SaplingDocument#toNodeInfo">SaplingDocument.toNodeInfo()</code>. It is also possible to send the tree
      directly to a <code java="net.sf.saxon.s9api.Destination">Destination</code> such as a
      <code java="net.sf.saxon.s9api.Serializer">Serializer</code>, a
      <code java="net.sf.saxon.s9api.SchemaValidator">SchemaValidator</code>, or an
      <code java="net.sf.saxon.s9api.Xslt30Transformer">Xslt30Transformer</code>.</p>

    <p>Sapling nodes are immutable objects, so operations like adding children or adding attributes always create a new object,
    without modifying the input objects. This means that adding a child element to a new parent can be done without an expensive
    copy operation. Nodes do not have references to their parents in the tree, so a subtree can be shared by multiple trees
    without copying.</p>
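
    <p>The consequence of immutability and parent-free nodes can be shown with a toy model. The
      class <code>ImmutableElem</code> below is hypothetical and is not the Sapling API; it only
      demonstrates why a subtree without parent pointers can be attached to several parents
      without being copied.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model of an immutable element node with no reference to its parent. */
final class ImmutableElem {

    final String name;
    final List<ImmutableElem> children;

    ImmutableElem(String name) {
        this(name, Collections.emptyList());
    }

    private ImmutableElem(String name, List<ImmutableElem> children) {
        this.name = name;
        this.children = children;
    }

    /** Returns a new element with the extra child; this element is left unchanged. */
    ImmutableElem withChild(ImmutableElem child) {
        List<ImmutableElem> extended = new ArrayList<>(children);
        extended.add(child); // the child object itself is shared, not copied
        return new ImmutableElem(name, Collections.unmodifiableList(extended));
    }

    public static void main(String[] args) {
        ImmutableElem shared = new ImmutableElem("p");
        ImmutableElem first = new ImmutableElem("body").withChild(shared);
        ImmutableElem second = new ImmutableElem("div").withChild(shared);
        // Both parents refer to the very same child object.
        System.out.println(first.children.get(0) == second.children.get(0));
    }
}
```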
521
    
522
    <p>The Sapling Tree API is described in the JavaDoc for class <code java="net.sf.saxon.sapling.SaplingNode">SaplingNode</code>.</p>
523
    
524
    <h2 class="subtitle">Event APIs</h2>
525
    <p>Saxon 10 introduces a new event-based API (called simply "Push") designed explicitly for convenient use by 
526
      user-written applications.</p>
527
    
528
    <p>A <code>Push</code> instance is always created using the factory method <code>Processor.newPush(destination)</code>;
529
      the <code>destination</code> argument indicates what happens to the constructed document. 
530
      This will commonly be an <code>XdmDestination</code> to build an in-memory <code>XdmNode</code>,
531
      or a <code>Serializer</code> to create lexical XML,
532
      but it could also be, for example, an <code>XsltTransformer</code> or a <code>SchemaValidator</code>.</p>
533
    
534
    <p>Conventional event-based APIs such as the SAX <code>ContentHandler</code> and StAX <code>XMLStreamWriter</code>
535
    and <code>XMLEventWriter</code> rely on the application to issue a properly-nested
536
    sequence of calls to methods such as <code>startElement()</code> and <code>endElement()</code>. This can make
537
      it very difficult to diagnose errors if the calls are not properly matched. The Saxon <code
538
        java="net.sf.saxon.s9api.Push">Push</code> API differs in that
539
    a call to start a new element node returns an <code>Element</code> object representing that element, and methods to create attributes
540
      and children for the element, and to end the element, are defined as methods on that <code>Element</code> object.
541
      Furthermore, these methods return the element to which they are applied, allowing method chaining.
542
    So a typical sequence of calls might be:</p>
543
    
    <samp><![CDATA[   out.element("employee")
      .attribute("ssn", "123456")
      .attribute("location", "Berlin")
      .text("Helmut Schmidt")
      .close();
]]></samp>
550
    
551
    <p>This example constructs a slightly more complex tree:</p>
552
    
    <samp><![CDATA[   Processor processor = new Processor(false);
   Serializer destination = processor.newSerializer(new File("out.xml"));
   destination.setOutputProperty(Serializer.Property.INDENT, "no");
   Push.Document doc = processor.newPush(destination).document(true);
   doc.setDefaultNamespace("http://www.example.org/ns");
   Push.Element top = doc.element("root");
   top.attribute("version", "1.5");
   for (Employee emp : getData()) {
      top.element("emp")
         .attribute("ssn", emp.ssn)
         .text(emp.name);
   }
   doc.close();
]]></samp>
567
    
568
    <p>Note that there are no explicit <code>endElement</code> events here; an end tag is written automatically when
569
    the next sibling is written to the parent element, or when the parent element is closed. The <code>close()</code>
570
    method is available, however, to close an element explicitly, which can be useful to avoid errors when the writing
571
    of elements is distributed across many classes and methods.</p>
572
    
573
    <p>Saxon also allows trees to be communicated using other event-based APIs. In Java there are three such APIs worth considering:</p>
574
    <ul>
575
      <li>Saxon's <code>Receiver</code> API</li>
576
      <li>The SAX <code>ContentHandler</code> API</li>
577
      <li>The StAX <code>XMLStreamWriter</code> API</li>
578
    </ul>
579
    <p>The <code java="net.sf.saxon.event.Receiver">Receiver</code> is efficient, but it is proprietary to Saxon, is prone to minor changes from one release to another,
580
    and is designed primarily for internal use rather than for direct use from applications.</p>
581
    <p>The SAX <code>ContentHandler</code> API was designed primarily for communication from an XML parser to an application; it can be
582
    clumsy to use when the originator of events is something other than an XML parser.</p>
583
    <p>The StAX <code>XMLStreamWriter</code> is probably the best of the three interfaces for most
584
      applications. Saxon's <code java="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</code> class
585
      offers a method <code java="net.sf.saxon.s9api.DocumentBuilder#newBuildingStreamWriter">newBuildingStreamWriter()</code> which returns an <code>XMLStreamWriter</code>; the calling application can
586
    then use methods such as <code>XMLStreamWriter.writeStartElement()</code> and <code>XMLStreamWriter.writeEndElement()</code>
587
    to build the tree.</p>
588
    <p>The trickiest part of this interface is probably the handling of namespaces. Saxon's implementation of the StAX interfaces takes
589
    into account not only the official Javadoc specifications (which in some respects are woefully inadequate), but also the unofficial
590
    interpretation of the specifications found at <a
591
      href="http://veithen.github.io/2009/11/01/understanding-stax.html" class="bodylink">Understanding StAX:
592
    How to Correctly Use XMLStreamWriter</a>.</p>
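    <p>As a rough illustration of the namespace handling, the following sketch names each element with an explicit
      prefix, local name, and namespace URI, and writes the default namespace declaration itself rather than relying on
      the writer to add it. To stay self-contained it obtains the <code>XMLStreamWriter</code> from the JDK's
      <code>XMLOutputFactory</code>; a Saxon application would instead use the writer returned by
      <code>newBuildingStreamWriter()</code>. The class name <code>StaxNamespaceSketch</code> and the namespace URI are
      illustrative assumptions.</p>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: not part of Saxon or of the original documentation.
class StaxNamespaceSketch {
    static final String NS = "http://www.example.org/ns";

    static String build() throws XMLStreamException {
        StringWriter out = new StringWriter();
        // Stand-in for the writer returned by DocumentBuilder.newBuildingStreamWriter()
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        // Identify the element by prefix (empty = default namespace), local name and URI...
        writer.writeStartElement("", "root", NS);
        // ...and emit the matching namespace declaration explicitly.
        writer.writeDefaultNamespace(NS);
        writer.writeStartElement("", "emp", NS);
        writer.writeAttribute("ssn", "123456");
        writer.writeCharacters("Helmut Schmidt");
        writer.writeEndElement(); // closes emp
        writer.writeEndElement(); // closes root
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }
}
```

    <p>Every <code>writeStartElement()</code> call here is paired by hand with a <code>writeEndElement()</code> call,
      which is the bookkeeping that the Push API described earlier is designed to reduce.</p>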
593
  </section>
594
  <section id="preloading" title="Preloading shared reference documents">
595
    <h1>Preloading shared reference documents</h1>
596
    <p>An option is available (<a class="bodylink code" href="/configuration/config-features"
597
        >Feature.PRE_EVALUATE_DOC_FUNCTION</a>) to indicate that calls to the <code>doc()</code>
598
      or <code>document()</code> functions with constant string arguments should be evaluated when a
599
      query or stylesheet is compiled, rather than at run-time. This option is intended for use when
600
      a reference or lookup document is used by all queries and transformations. Using this option
601
      has a number of effects:</p>
602
    <ol>
603
      <li>
604
        <p>The URI is resolved using the compile-time <code>URIResolver</code> rather than the
605
          run-time <code>URIResolver</code>.</p>
606
      </li>
607
      <li>
608
        <p>The document is loaded into a document pool held by the <a class="javalink"
609
            href="net.sf.saxon.Configuration">Configuration</a>, whose memory is released only when
610
          the <code>Configuration</code> itself ceases to exist.</p>
611
      </li>
612
      <li>
613
        <p>All queries and transformations using this document share the same copy.</p>
614
      </li>
615
      <li>
616
        <p>Any updates to the document that occur between compile-time and run-time have no
617
          effect.</p>
618
      </li>
619
    </ol>
620
    <p>The option is selected by using <code>Configuration.setConfigurationProperty()</code> or
621
        <code>TransformerFactory.setAttribute()</code> with the property name
622
        <code>Feature.PRE_EVALUATE_DOC_FUNCTION.name</code>. This option is not available from the
623
      command line because it has no useful effect with a single-shot compile-and-run interface.</p>
624
    <p>This option has no effect if the URI supplied to the <code>doc()</code> or
625
        <code>document()</code> function includes a fragment identifier.</p>
626
    <p>It is also possible to preload a specific document into the shared document pool from the
627
      Java application by using the call <code>config.getGlobalDocumentPool().add(doc, uri)</code>.
628
      When the <code>doc()</code> or <code>document()</code> function is called, the shared document
629
      pool is first checked to see if the requested document is already present. The <a
630
        class="javalink" href="net.sf.saxon.om.DocumentPool">DocumentPool</a> object also has a
631
        <code>discard()</code> method which causes the document to be released from the pool.</p>
632
    
633
    <aside>It is not advisable to use this option when a compiled stylesheet is exported to a SEF
634
    file. Data files are best deployed separately, rather than by embedding them in the SEF.</aside>
635
  </section>
636
  <section id="xml-catalogs" title="Using XML Catalogs">
637
    <h1>Using XML Catalogs</h1>
638

    
639

    
640
    <p>XML Catalogs (<a
641
        href="http://xml.apache.org/commons/components/resolver/resolver-article.html"
642
        class="bodylink">defined by OASIS</a>) provide a way to avoid hard-coding the locations of
643
      XML documents and other resources in your application. Instead, the application refers to the
644
      resource using a conventional system identifier (URI) or public identifier, and a local
645
      catalog is used to map the system and public identifiers to an actual location.</p>
646

    
647
    <p>When using Saxon from the command line, it is possible to specify a catalog to be used using
648
      the option <code>-catalog:<i>files</i></code>. Here <code><i>files</i></code> is the catalog
649
      file to be searched, or a list of filenames separated by semicolons. This catalog will be used
650
      to locate DTDs and external entities required by the XML parser, XSLT stylesheet modules
651
      requested using <code>xsl:import</code> and <code>xsl:include</code>, documents requested
652
      using the <code>document()</code> and <code>doc()</code> functions, and also schema documents,
653
      however they are referenced.</p>
654

    
655
    <p>
656
      <i>The catalog is NOT currently used for non-XML resources, including JSON documents, 
657
        query modules, unparsed text files, collations, and collections.</i>
658
    </p>
659

    
660
    <p>With Saxon on the Java platform, if the <code>-catalog</code> option is used on the command
661
      line, then the open-source Apache library <code>resolver.jar</code> must be present on the
662
      classpath. With Saxon on .NET, this module (cross-compiled to IL) is included within the Saxon
663
      DLL.</p>
664

    
665
    <p>Setting the <code>-catalog</code> option is equivalent to setting the following options:</p>
666

    
    <table>
      <tr>
        <td>
          <p>
            <code>-r</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.CatalogResolver</code>
          </p>
        </td>
      </tr>
      <tr>
        <td>
          <p>
            <code>-x</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
          </p>
        </td>
      </tr>
      <tr>
        <td>
          <p>
            <code>-y</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
          </p>
        </td>
      </tr>
    </table>
705

    
706
    <p>In addition, the system property <code>xml.catalog.files</code> is set to the value of the
707
      supplied <code><i>files</i></code> value. And if the <code>-t</code> option is also set, Saxon
708
      sets the verbosity level of the catalog manager to 2, causing it to report messages for each
709
      resolved URI. Saxon customizes the Apache resolver library to integrate these messages with
710
      the other output from the <code>-t</code> option: that is, by default it is sent to the
711
      standard error output.</p>
712

    
713
    <p>
714
      <i>This mechanism means that it is not possible to use any of the options <code>-r</code>,
715
          <code>-x</code>, or <code>-y</code> when the <code>-catalog</code> option is used.</i>
716
    </p>
717

    
718
    <p>When the <code>-catalog</code> option is used on the command line, this overrides the
719
      internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the
720
      XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on
721
      setting the XML parser's <code>EntityResolver</code>, it is not possible to use them in
722
      conjunction.</p>
723

    
724
    <p>This support for OASIS catalogs is implemented only in the Saxon command line. To use
725
      catalogs from a Saxon application, it is necessary to configure the various options
726
      individually. For example:</p>
727

    
728
    <ul>
729
      <li>
730
        <p>To use catalogs to resolve references to DTDs and external entities, choose
731
            <code>ResolvingXMLReader</code> as your XML parser, or set
732
            <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
733
            <code>EntityResolver</code> used by your chosen XML parser.</p>
734
      </li>
735

    
736
      <li>
737
        <p>To use catalogs to resolve <code>xsl:include</code> and <code>xsl:import</code>
738
          references, choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
739
            <code>URIResolver</code> used by Saxon when compiling the stylesheet.</p>
740
      </li>
741

    
742
      <li>
743
        <p>To use catalogs to resolve calls on <code>doc()</code> or <code>document()</code>
744
          references, choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
745
            <code>URIResolver</code> used by Saxon when running the stylesheet (for example, using
746
            <code>Transformer.setURIResolver()</code>).</p>
747
      </li>
748
    </ul>
749

    
750
    <p>Here is an example of a very simple catalog file. The <code>publicId</code> and
751
        <code>systemId</code> attributes give the public or system identifier as used in the source
752
      document; the <code>uri</code> attribute gives the location (in this case a relative location)
753
      where the actual resource will be found.</p>
    <samp><![CDATA[<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <group prefer="public" xml:base="file:///usr/share/xml/">

      <public
         publicId="-//OASIS//DTD DocBook XML V4.5//EN"
         uri="docbook45/docbookx.dtd"/>

      <system
         systemId="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
         uri="docbook45/docbookx.dtd"/>

   </group>
</catalog>]]></samp>
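    <p>The mapping expressed by a catalog like this can be checked outside Saxon. The sketch below is an
      assumption-laden illustration: it uses the JDK's own <code>javax.xml.catalog</code> API (Java 9 or later), not
      the Apache resolver library that the <code>-catalog</code> option relies on, and the class name
      <code>CatalogLookupSketch</code> is hypothetical. It writes the catalog to a temporary file and asks which
      location a given system or public identifier maps to.</p>

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper using the JDK catalog API (Java 9+), not the Apache resolver.
class CatalogLookupSketch {
    static final String CATALOG =
          "<?xml version=\"1.0\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <group prefer=\"public\" xml:base=\"file:///usr/share/xml/\">\n"
        + "    <public publicId=\"-//OASIS//DTD DocBook XML V4.5//EN\"\n"
        + "            uri=\"docbook45/docbookx.dtd\"/>\n"
        + "    <system systemId=\"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd\"\n"
        + "            uri=\"docbook45/docbookx.dtd\"/>\n"
        + "  </group>\n"
        + "</catalog>\n";

    // Writes the catalog to a temporary file and loads it.
    static Catalog load() throws Exception {
        Path file = Files.createTempFile("catalog", ".xml");
        file.toFile().deleteOnExit();
        Files.write(file, CATALOG.getBytes(StandardCharsets.UTF_8));
        return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
    }

    // Returns the mapped location for a system identifier, or null if there is no match.
    static String lookupSystem(String systemId) throws Exception {
        return load().matchSystem(systemId);
    }

    // Returns the mapped location for a public identifier, or null if there is no match.
    static String lookupPublic(String publicId) throws Exception {
        return load().matchPublic(publicId);
    }
}
```

    <p>Both lookups are expected to resolve to the <code>docbook45/docbookx.dtd</code> location relative to the
      <code>xml:base</code> of the group; an identifier that is not listed in the catalog yields no match.</p>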
771

    
772
    <p>There are many tutorials for XML catalogs available on the web, including some that have
773
      information specific to Saxon, though this may well relate to earlier releases.</p>
774
  </section>
775
  <section id="input-filters" title="Writing input filters">
776
    <h1>Writing input filters</h1>
777

    
778

    
779
    <p>Saxon can take its input from a JAXP <code>SAXSource</code> object, which essentially
780
      represents a sequence of SAX events representing the output of an XML parser. A very useful
781
      technique is to interpose a <i>filter</i> between the parser and Saxon. The filter will
782
      typically be an instance of the SAX2 <strong>XMLFilter</strong> class. </p>
783

    
784
    <p>There are a number of ways of using a Saxon XSLT transformation as part of a pipeline of
785
      filters. Some of these techniques also work with XQuery. The techniques include:</p>
786
    <ul>
787
      <li>
788
        <p>Generate the transformation as an <code>XMLFilter</code> using the
789
            <code>newXMLFilter()</code> method of the <code>TransformerFactory</code>. This works
790
          with XSLT only. A drawback of this approach is that it is not possible to supply
791
          parameters to the transformation using standard JAXP facilities. It is possible, however,
792
          by casting the <code>XMLFilter</code> to a <a class="javalink" href="net.sf.saxon.jaxp.FilterImpl"
793
            >net.sf.saxon.jaxp.FilterImpl</a>, and calling its <code>getTransformer()</code> method, which
794
          returns a <code>Transformer</code> object offering the usual <code>setParameter()</code>
795
          method.</p>
796
      </li>
797
      <li>
798
        <p>Generate the transformation as a SAX <code>ContentHandler</code> using the
799
            <code>newTransformerHandler()</code> method. The pipeline stages after the
800
          transformation can be added by giving the transformation a <code>SAXResult</code> as its
801
          destination. This again is XSLT only.</p>
802
      </li>
803
      <li>
804
        <p>Implement the pipeline step before the transformation or query as an
805
            <code>XMLFilter</code>, and use this as the <code>XMLReader</code> part of a
806
            <code>SAXSource</code>, pretending to be an XML parser. This technique works with both
807
          XSLT and XQuery, and it can even be used from the command line, by nominating the
808
            <code>XMLFilter</code> as the source parser using the <code>-x</code> option on the
809
          command line.</p>
810
      </li>
811
    </ul>
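    <p>The third technique can be sketched as follows. This is a minimal example under stated assumptions: the filter
      class <code>UpperCaseTextFilter</code> and the driver class <code>InputFilterSketch</code> are hypothetical, and
      the <code>SAXSource</code> is consumed by a JAXP identity <code>Transformer</code> so that the example does not
      depend on a particular stylesheet or query. The same <code>SAXSource</code> could be supplied to a Saxon
      transformation or query instead.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: passes all events through, upper-casing character data on the way.
class UpperCaseTextFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length).toUpperCase().toCharArray();
        super.characters(upper, 0, upper.length);
    }
}

class InputFilterSketch {
    static String run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // Interpose the filter between the real parser and the consumer.
        UpperCaseTextFilter filter = new UpperCaseTextFilter();
        filter.setParent(parser);

        // The filter plays the role of the XMLReader in the SAXSource.
        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));

        // Identity transformation used here only as a stand-in consumer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

    <p>Because the filter extends <code>XMLFilterImpl</code>, it implements <code>XMLReader</code> and only the
      events it chooses to override need any code.</p>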
812

    
813
    <p>The <code>-x</code> option on the Saxon command line specifies the parser that Saxon will use
814
      to process the source files. This class must implement the SAX2 <code>XMLReader</code>
815
      interface, but it is not required to be a real XML parser; it can take the input from any kind
816
      of source file, so long as it presents it in the form of a stream of SAX events. When using
817
      the JAXP API, the equivalent to the <code>-x</code> option is to call
        <code>transformerFactory.setAttribute(net.sf.saxon.lib.Feature.SOURCE_PARSER_CLASS.name,
        "com.example.package.Parser")</code>.</p>
820
  </section>
821
  <section id="XInclude" title="XInclude processing">
822
    <h1>XInclude processing</h1>
823

    
824

    
825
    <p>If you are using Xerces as your XML parser, you can have Xerces expand any XInclude
826
      directives.</p>
827

    
828
    <p>The <code>-xi</code> option on the command line causes XInclude processing to be applied to
829
      all input XML documents. This includes source documents, stylesheets, and schema documents
830
      listed on the command line, and also those loaded indirectly, for example by calls on the
831
        <code>doc()</code> function or by mechanisms such as <code>xsl:include</code> and
832
        <code>xs:include</code>.</p>
833

    
834
    <p>From the Java API, the equivalent is to call <code>setXInclude()</code> on the
835
        <code>Configuration</code> object, or to set the attribute denoted by <a
836
        class="bodylink code" href="/configuration/config-features">Feature.XINCLUDE.name</a> to
837
        <code>Boolean.TRUE</code> on the <code>TransformerFactory</code>.</p>
838

    
    <p>XInclude processing can be requested at a per-document level by creating an <a
        class="javalink" href="net.sf.saxon.lib.AugmentedSource">AugmentedSource</a> and calling its
        <code>setXIncludeAware()</code> method. The corresponding method is also recognized on
      Saxon's implementation of the JAXP <code>DocumentBuilderFactory</code>.</p>
846
    
847
    <p>It is possible to request XInclude processing for the documents in a collection by including
848
    the query parameter <code>xinclude=yes</code> in the collection URI. Similarly, for a document
849
    read using the <code>doc()</code> or <code>document()</code> functions, XInclude processing can
850
      be requested using <code>xinclude=yes</code> in the document URI -- but only if the
851
    <code>StandardURIResolver</code> is used, and the feature is enabled by calling
852
      <code>Configuration.setParameterizedURIResolver()</code> or by setting <code>-p:on</code>
853
    on the <code>Query</code> or <code>Transform</code> command lines.</p>
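    <p>What XInclude expansion by the parser looks like can be illustrated with plain JAXP. The sketch below makes
      several assumptions: it relies on the parser bundled with the JDK supporting XInclude, it enables the feature
      through <code>SAXParserFactory.setXIncludeAware()</code> rather than through any Saxon API, and the class name
      <code>XIncludeSketch</code> and the file names are hypothetical.</p>

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demonstration of parser-level XInclude expansion.
class XIncludeSketch {
    static String textWithXInclude() throws Exception {
        // Create a main document and an included part side by side in a temporary directory.
        Path dir = Files.createTempDirectory("xinclude");
        dir.toFile().deleteOnExit();
        Path part = dir.resolve("part.xml");
        Path main = dir.resolve("main.xml");
        part.toFile().deleteOnExit();
        main.toFile().deleteOnExit();
        Files.write(part, "<part>included text</part>".getBytes(StandardCharsets.UTF_8));
        Files.write(main, ("<doc xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                + "<xi:include href=\"part.xml\"/></doc>").getBytes(StandardCharsets.UTF_8));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude elements are recognized by namespace
        factory.setXIncludeAware(true);

        // Collect all character data delivered by the parser.
        StringBuilder text = new StringBuilder();
        factory.newSAXParser().parse(main.toFile(), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }
}
```

    <p>The only character data in either file is inside the included part, so any text the handler receives shows
      that the parser expanded the <code>xi:include</code> element before passing events on.</p>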
854
    
855
    <p>The <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
856
      instruction can enable XInclude processing using
857
    the extension attribute <code>saxon:xinclude="yes"</code>.</p>
858

    
859
    <p>It is also possible to switch on XInclude processing (for all documents) by setting the
860
      system property:</p>
    <samp><![CDATA[-Dorg.apache.xerces.xni.parser.XMLParserConfiguration=
    org.apache.xerces.parsers.XIncludeParserConfiguration
]]></samp>
864

    
865
    <p>An alternative approach is to incorporate an XInclude processor as a SAX filter in the input
866
      pipeline. You can find a suitable SAX filter at <a href="http://xincluder.sourceforge.net/"
867
        class="bodylink">http://xincluder.sourceforge.net/</a>, and you can incorporate it into your
868
      application as described in <a class="bodylink" href="../input-filters">Writing Input
869
        Filters</a>.</p>
870

    
871
    <p>On the .NET platform, there is a customized <code>XmlReader</code> that performs XInclude
872
      processing available at <a href="http://mvpxml.codeplex.com" class="bodylink"
873
        >http://mvpxml.codeplex.com</a>. You can supply this as an argument to the method
874
        <code>Build(XmlReader parser)</code> in the <a class="javalink"
875
        href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> class of the .NET Saxon API.</p>
876

    
877
    <p>For further information on using XInclude, see <a
878
        href="http://www.sagehill.net/docbookxsl/Xinclude.html" class="bodylink"
879
        >http://www.sagehill.net/docbookxsl/Xinclude.html</a>.</p>
880
  </section>
881
  <section id="controlling-parsing" title="Controlling Parsing of Source Documents">
882
    <h1>Controlling Parsing of Source Documents</h1>
883

    
884

    
885
    <p>Saxon does not include its own XML parser. By default:</p>
886

    
887
    <ul>
888
      <li>
889
        <p>On the Java platform, the default SAX parser provided as part of the JDK is used. With
890
          the Sun/Oracle JDK, this is a variant of the Apache Xerces parser customized by Sun.</p>
891
      </li>
892
      <li>
893
        <p>On the .NET platform, Saxon includes a copy of the Apache Xerces parser cross-compiled to
894
          run on .NET.</p>
895
      </li>
896
    </ul>
897

    
898
    <p>An error reported by the XML parser is generally fatal. It is not possible to process
899
      ill-formed XML.</p>
900

    
901
    <p>There are several ways you can cause a different XML parser to be used:</p>
902

    
903
    <ul>
904
      <li>
905
        <p>The <code>-x</code> and <code>-y</code> options on the command line can be used to
906
          specify the class name of a SAX parser, which Saxon will load in preference to the default
907
          SAX parser. The <code>-x</code> option is used for source XML documents, the
908
            <code>-y</code> option for schemas and stylesheets. The equivalent options can be set
909
          programmatically or by using the <a class="bodylink"
910
            href="/configuration/configuration-file">configuration file</a>.</p>
911
      </li>
912
      <li>
913
        <p>By default Saxon uses the <code>SAXParserFactory</code> mechanism to load a parser. This
914
          can be configured by setting the system property
915
            <code>javax.xml.parsers.SAXParserFactory</code>, by means of the file
916
            <code>lib/jaxp.properties</code> in the JRE directory, or by adding another parser to
917
          the <code>lib/endorsed</code> directory.</p>
918
      </li>
919
      <li>
920
        <p>The source for parsing can be supplied in the form of a <code>SAXSource</code> object,
921
          which has an <code>XMLReader</code> property containing the parser instance to be
922
          used.</p>
923
      </li>
924
      <li>
925
        <p>On .NET, the configuration option <code>PREFER_JAXP_PARSER</code> can be set to false, in
926
          which case Saxon will use the Microsoft XML parser instead of the Apache parser. (This
927
          parser is not used by default because it does not notify <code>ID</code> attributes to the
928
          application, which means the XPath <code>id()</code> and <code>idref()</code> functions do
929
          not work.)</p>
930
      </li>
931
    </ul>
932

    
933
    <p>Saxonica traditionally recommended use of the Xerces parser from Apache in preference to the version bundled
934
      in the JDK, which was known to have some serious bugs. However, there is some evidence that the version bundled
935
    in Java 8 is more reliable.</p>
936

    
    <p>By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD
938
      validation). Note however, that the parser still needs to read the DTD if one is present,
939
      because it may contain entity definitions that need to be expanded. DTD validation can be
940
      requested using <code>-dtd:on</code> on the command line, or equivalent API or configuration
941
      options.</p>
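    <p>The point that a non-validating parser still reads the DTD can be seen with a small JAXP example. This is a
      sketch only: it uses the JDK's default SAX parser directly rather than going through Saxon, and the class name
      <code>NonValidatingParseSketch</code> is hypothetical.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demonstration: entity definitions are honoured even without DTD validation.
class NonValidatingParseSketch {
    static String textOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // the JAXP default, stated explicitly for clarity

        // Collect the character data that the parser reports.
        StringBuilder text = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }
}
```

    <p>Given a document whose internal DTD subset declares an entity, for example
      <code>&lt;!DOCTYPE doc [&lt;!ENTITY product "Saxon"&gt;]&gt;&lt;doc&gt;&amp;product;&lt;/doc&gt;</code>, the
      text reported to the application is the expanded entity value, even though no validation takes place.</p>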
942

    
943
    <p>Saxon is issued with local copies of commonly-used W3C DTDs such as the XHTML, SVG, and
944
      MathML DTDs. When Saxon itself instantiates the XML parser, it will use an
945
        <code>EntityResolver</code> that causes these local copies of DTDs to be used rather than
946
      fetching public copies from the web (the W3C servers are increasingly failing to serve these
947
      requests as the volume of traffic is too high). It is possible to override this using the
948
      configuration setting <code>ENTITY_RESOLVER_CLASS</code>, which can be set to the name of a
949
      user-supplied <code>EntityResolver</code>, or to the empty string to indicate that no
950
        <code>EntityResolver</code> should be used. Saxon will not add this
951
        <code>EntityResolver</code> in cases where the XML parser instance is supplied by the caller
952
      as part of a <code>SAXSource</code> object. It will add it to a parser obtained as an instance
953
      of the class specified using the <code>-x</code> and <code>-y</code> command line options,
954
      unless either the use of the <code>EntityResolver</code> is suppressed using the
955
        <code>ENTITY_RESOLVER_CLASS</code> configuration option, or the instantiated parser already
956
      has an <code>EntityResolver</code> registered.</p>
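    <p>A user-supplied <code>EntityResolver</code> of the kind referred to above might look like the following
      sketch. The public identifier, the DTD content, and the class names <code>LocalDtdResolver</code> and
      <code>EntityResolverSketch</code> are all hypothetical; the example registers the resolver on a JDK parser
      directly, and the same configured <code>XMLReader</code> could equally be passed to Saxon inside a
      <code>SAXSource</code>.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: serves a local copy of one DTD, identified by its public identifier.
class LocalDtdResolver implements EntityResolver {
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Demo//EN";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (PUBLIC_ID.equals(publicId)) {
            // In a real application this would be a file or classpath resource.
            InputSource local = new InputSource(new StringReader("<!ENTITY greeting \"Hello\">"));
            local.setSystemId(systemId);
            return local;
        }
        return null; // let the parser fall back to its default behaviour
    }
}

class EntityResolverSketch {
    static String textOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new LocalDtdResolver());

        // Collect the character data so the effect of the resolved DTD is visible.
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```

    <p>Because the resolver answers the request for the external DTD itself, the parser has no need to fetch the
      system identifier from the network.</p>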
957

    
958
    <p>Saxon never asks the XML parser to perform schema validation. If schema validation is
959
      required it should be requested using the command line options <code>-val:strict</code> or
960
        <code>-val:lax</code>, or their API equivalents. Saxon will then use its own schema
961
      processor to validate the document as it emerges from the XML parser. Schema processing is
962
      done in parallel with parsing, by use of a SAX-like pipeline.</p>
  </section>
969
  <section id="xml11" title="Saxon and XML 1.1">
970
    <h1>Saxon and XML 1.1</h1>
971

    
972

    
973
    <p>XML 1.1 (with XML Namespaces 1.1) originally extended XML 1.0 in three ways:</p>
974
    <ul>
975
      <li>
976
        <p>the set of valid characters is increased</p>
977
      </li>
978
      <li>
979
        <p>the set of characters allowed in XML Names is increased</p>
980
      </li>
981
      <li>
982
        <p>namespace undeclarations are permitted</p>
983
      </li>
984
    </ul>
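    <p>The first of these differences can be observed with any parser that supports XML 1.1. The sketch below
      assumes that the JDK's bundled parser does so, and uses a hypothetical class name
      <code>Xml11CharacterCheck</code>; it simply reports whether a document is accepted by the parser.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check: does the default JAXP parser accept the supplied document?
class Xml11CharacterCheck {
    static boolean parses(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            // DefaultHandler rethrows fatal errors, so a rejected document surfaces as an exception.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```

    <p>A character reference such as <code>&amp;#1;</code> is permitted in a document that declares
      <code>version="1.1"</code> but is not a legal character in XML 1.0, so the same content is accepted under one
      declaration and rejected under the other.</p>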
985

    
986
    <p>The second change has subsequently been retrofitted to XML 1.0 Fifth Edition (XML 1.0e5).
987
      Saxon now uses the XML 1.1 and XML 1.0e5 rules unconditionally for all validation of XML
988
      names.</p>
989

    
990
    <p>Saxon is capable of working with XML 1.1 input documents. If you want to use Saxon with XML
991
      1.1, you should set the option <code>-xmlversion:1.1</code> on the Saxon command line, or call
992
      the method <a class="javalink" href="net.sf.saxon.Configuration#setXMLVersion"
993
        >configuration.setXMLVersion(Configuration.XML11)</a> or, in the case of XSLT,
        <code>transformerFactory.setAttribute(FeatureKeys.XML_VERSION, "1.1")</code>.</p>
995

    
996
    <p>This configuration setting affects:</p>
997
    <ul>
998
      <li>
999
        <p>the characters considered valid in the source of an XQuery query</p>
1000
      </li>
1001
      <li>
1002
        <p>the characters considered valid in the result of the functions
1003
            <code>codepoints-to-string()</code> and <code>unparsed-text()</code></p>
1004
      </li>
1005
      <li>
1006
        <p>the characters considered valid in the result of certain Saxon extension functions</p>
1007
      </li>
1008
      <li>
1009
        <p>the way in which line endings in XQuery queries are normalized</p>
1010
      </li>
1011
      <li>
1012
        <p>the default version used by the serializer (with output method XML)</p>
1013
      </li>
1014
    </ul>
1015

    
1016
    <p>Since Saxon 9.4, the configuration setting no longer affects:</p>
1017
    <ul>
1018
      <li>
1019
        <p>validation of names used in XQuery and XPath expressions, including names of elements,
1020
          attributes, functions, variables, and types</p>
1021
      </li>
1022
      <li>
1023
        <p>validation of names of constructed elements, attributes, and processing instructions in
1024
          XQuery and XSLT</p>
1025
      </li>
1026
      <li>
1027
        <p>schema validation of values of type <code>xs:NCName</code>, <code>xs:QName</code>,
1028
            <code>xs:NOTATION</code>, and <code>xs:ID</code></p>
1029
      </li>
1030
      <li>
1031
        <p>the permitted names of stylesheet objects such as keys, templates, decimal-formats,
1032
          output declarations, and output methods</p>
1033
      </li>
1034
    </ul>
1035

    
1036

    
    <p>Note that if you use the default setting of "1.0", then supplying an XML 1.1 source document
      as input may cause unpredictable errors.</p>
1039

    
1040
    <p>It is advisable to use an XML parser that supports XML 1.1 when the configuration is set to
1041
      "1.1", and an XML parser that does not support XML 1.1 when the configuration is set to "1.0".
1042
      However, Saxon does not enforce this.</p>
1043

    
1044
    <p>You can set the configuration to allow XML 1.1, but still serialize result documents as XML
1045
      1.0 by specifying the output property <code>version="1.0"</code>. In this case Saxon will
1046
      check while serializing the document that it conforms to the XML 1.0 constraints (note that
1047
      this check can be expensive). These checks are not performed if the configuration default is
1048
      set to XML 1.0.</p>
1049

    
1050
    <p>If you want the serializer to output namespace undeclarations, use the output property
1051
        <code>undeclare-namespaces="yes"</code> as well as <code>version="1.1"</code>.</p>
1052
  </section>
1053
  <section id="jaxpsources" title="JAXP Source Types">
1054
    <h1>JAXP Source Types</h1>
1055

    
1056

    
1057
    <p>
1058
      <i>This section is relevant to the Java platform only.</i>
1059
    </p>
1060

    
1061
    <p>When a user application invokes Saxon via the Java API, then a source document is supplied as
1062
      an instance of the JAXP <code>Source</code> class. This is true whether invoking an XSLT
1063
      transformation, an XQuery query, or a free-standing XPath expression. The <code>Source</code>
1064
      class is essentially a marker interface. The <code>Source</code> that is supplied must be a
1065
      kind of <code>Source</code> that Saxon recognizes.</p>
1066

    
1067
    <p>Saxon recognizes all three kinds of <code>Source</code> defined in JAXP: a
1068
        <code>StreamSource</code>, a <code>SAXSource</code>, and a <code>DOMSource</code>. </p>
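    <p>For orientation, the three kinds of <code>Source</code> can be constructed as in the following sketch. It is
      illustrative only: the class name <code>JaxpSourceKinds</code>, the sample document, and the system identifier
      are assumptions, and a JAXP identity <code>Transformer</code> stands in for whatever Saxon operation would
      consume the source.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper showing one way to build each JAXP Source kind.
class JaxpSourceKinds {
    static final String XML = "<doc><p>Hello</p></doc>";

    // A StreamSource over a Reader; the system ID supplies the base URI (hypothetical value).
    static Source streamSource() {
        StreamSource source = new StreamSource(new StringReader(XML));
        source.setSystemId("http://www.example.org/doc.xml");
        return source;
    }

    // A SAXSource with no XMLReader: the consumer allocates a parser itself.
    static Source saxSource() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    // A DOMSource wrapping an already-parsed DOM Document.
    static Source domSource() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        return new DOMSource(dom);
    }

    // Stand-in consumer: copies the source to a string using an identity transformation.
    static String serialize(Source source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

    <p>Note that the <code>StreamSource</code> and <code>SAXSource</code> built here wrap a <code>Reader</code>, so
      each instance can be consumed only once.</p>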
1069
    
1070
    <ul>
1071
      <li>
1072
        <p>When using a <code>StreamSource</code>, note:</p>
1073
        <ul>
1074
          <li>A <code>StreamSource</code> that wraps an <code>InputStream</code> or <code>Reader</code>
1075
            can only be used once: it is consumed by use. However, a <code>StreamSource</code> that wraps
          a <code>File</code> or URI can be used multiple times.</li>
          <li>Whoever creates an <code>InputStream</code> or <code>Reader</code> is responsible for closing
          it after use. This means that if Saxon creates an <code>InputStream</code> from a supplied <code>File</code>
            or URI, it will close that <code>InputStream</code> after use; but if the <code>InputStream</code> is created
          by the calling application, then the calling application is responsible for closing it. (On some operating systems
          it is important not to leave unclosed streams lying around.)</li>
          <li>If the <code>StreamSource</code> wraps an <code>InputStream</code> or <code>Reader</code>, then the base URI
          of the document is taken from the <code>SystemID</code> property of the <code>StreamSource</code>. If this is not set,
          then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
        </ul>
        <aside>There are cases where it is difficult for the application to take responsibility for closing a stream after it has been read to completion.
        For example, if a <code>URIResolver</code> returns a <code>StreamSource</code>, there is no callback from Saxon
        to the application at the time the stream has been exhausted. Saxon therefore allows the <code>StreamSource</code>
        to be wrapped in an <code>AugmentedSource</code>, whose <code>setPleaseCloseAfterUse()</code> method can be used
        to request that Saxon closes the stream.</aside>
      </li>
      <li>
        <p>When using a <code>SAXSource</code>, note:</p>
        <ul>
          <li>If no <code>XMLReader</code> is supplied, Saxon will allocate one, based on settings in the <code>Configuration</code>.</li>
          <li>Processing of the contained <code>InputSource</code> is entirely the responsibility of the XML parser; Saxon is not involved
          in this.</li>
          <li>Saxon will modify properties of the supplied <code>XMLReader</code>: it will set the <code>ContentHandler</code>
          and <code>LexicalHandler</code> so that it can receive the output of parsing, and it will set the <code>ErrorHandler</code>
          so it can handle parsing errors.</li>
          <li>Saxon makes no attempt to ensure that processing of a <code>SAXSource</code> or its underlying <code>XMLReader</code>
          is thread-safe. The same <code>XMLReader</code> should not be used concurrently in multiple threads.</li>
        </ul>
      </li>
      <li>
        <p>When using a <code>DOMSource</code>, note:</p>
        <ul>
          <li>The DOM is not thread-safe, even when used in read-only mode. Saxon therefore synchronizes all its access to DOM methods.
          However, that's no protection if there are application threads accessing the DOM that aren't using Saxon.</li>
          <li>The base URI
            of the document is taken from the <code>SystemID</code> property of the <code>DOMSource</code>. If this is not set,
            then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
          <li>From Saxon 9.8, Saxon-EE uses a new mechanism for processing DOM trees, called the Domino model. This involves creating
          an index of all the nodes in the DOM, providing for faster navigation. Saxon-PE and Saxon-HE continue to use the DOM <code>NodeWrapper</code>
          model, where DOM methods are used to navigate the tree. A transformation using the Domino model typically takes twice as long as one using Saxon's native <code>TinyTree</code>,
          while the <code>NodeWrapper</code> model can take 5 to 10 times as long. An alternative approach is to convert the DOM tree to a <code>TinyTree</code> before the
          transformation starts. Even better: don't use DOM in the first place.</li>
        </ul>
      </li>
    </ul>
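    <p>The notes above can be illustrated with a minimal sketch that uses only the standard JAXP
      classes. It assumes that the <code>TransformerFactory</code> found on the classpath is the one
      you want (Saxon if its JAR is present, otherwise the JDK's built-in processor). The class name
      <code>JaxpSourceKinds</code>, the sample document, and the base URI are hypothetical and serve
      only as illustration.</p>
    <samp><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class JaxpSourceKinds {
    static final String XML = "<doc><item>a</item></doc>";
    // Hypothetical base URI; it only illustrates the SystemID property.
    static final String BASE_URI = "file:///tmp/example/doc.xml";

    /** Serializes a Source with the identity transformer of the JAXP factory in use. */
    static String copy(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaStreamSource() {
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
        // The application created this stream, so the application closes it.
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            // The second argument sets the SystemID, and hence the base URI.
            return copy(new StreamSource(in, BASE_URI));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaSaxSource() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // A fresh XMLReader per call: never share one across threads.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            InputSource input = new InputSource(new StringReader(XML));
            input.setSystemId(BASE_URI);
            return copy(new SAXSource(reader, input));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaDomSource() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return copy(new DOMSource(dom, BASE_URI));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaStreamSource());
        System.out.println(viaSaxSource());
        System.out.println(viaDomSource());
    }
}
```
]]></samp>
    <p>In each case the base URI is supplied explicitly, because none of the three inputs is read
      from a file or URI from which it could be inferred.</p>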
        <p>Other kinds of <code>Source</code> that are recognized by most Saxon interfaces are:</p>
        <ul>
          <li><code>TreeInfo</code>: Saxon's <code>TreeInfo</code> holds information about a document (or more generally any tree of nodes), 
            and can be used directly as a <code>Source</code> of a transformation.</li>
          <li><code>NodeInfo</code>: Saxon's <code>NodeInfo</code> represents a node in a tree, 
            and can be used directly as a <code>Source</code> of a transformation.</li>
          <li><code>StaxSource</code>: allows a pull parser to be used.</li>
          <li><code>PullSource</code>: Saxon's internal pull interface.</li>
          <li><code>EventSource</code>: Similar to an <code>XMLReader</code>, but with a much simpler interface, an <code>EventSource</code>
          has a <code>send()</code> method that sends a stream of events to a Saxon <code>Receiver</code>.</li>
          <li><code>SaplingDocument</code>: a sapling tree constructed using the sapling construction interface can be used anywhere
          (within Saxon) that a <code>Source</code> is expected.</li>
        </ul>
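        <p>As a rough illustration of pull-parser input, the sketch below wraps an
          <code>XMLStreamReader</code> in the standard JAXP class
          <code>javax.xml.transform.stax.StAXSource</code> and copies it through an identity
          transformer. It is an assumption that this is the class meant by <code>StaxSource</code>
          in the list above; the class name <code>StaxSourceDemo</code> and the sample document are
          hypothetical.</p>
        <samp><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

final class StaxSourceDemo {
    /** Reads the document through a StAX pull parser and serializes it unchanged. */
    static String copyFromPullParser(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                Transformer identity = TransformerFactory.newInstance().newTransformer();
                identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                // StAXSource requires a reader positioned at START_DOCUMENT or START_ELEMENT.
                identity.transform(new StAXSource(reader), new StreamResult(out));
                return out.toString();
            } finally {
                // The application created the reader, so the application closes it.
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyFromPullParser("<doc><item>a</item></doc>"));
    }
}
```
]]></samp>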
    <p>Saxon also accepts input from an <code>XMLStreamReader</code>
        (<code>javax.xml.stream.XMLStreamReader</code>), that is a StAX pull parser as defined in
      JSR 173. This is achieved by creating an instance of <a class="javalink"
        href="net.sf.saxon.pull.StaxBridge">net.sf.saxon.pull.StaxBridge</a>, supplying the
        <code>XMLStreamReader</code> using the <code>setXMLStreamReader()</code> method, and
      wrapping the <code>StaxBridge</code> object in an instance of <a class="javalink"
        href="net.sf.saxon.pull.PullSource">net.sf.saxon.pull.PullSource</a>, which implements the
      JAXP <code>Source</code> interface and can be used in any Saxon method that expects a
        <code>Source</code>. Saxon has been validated with two StAX parsers: the Zephyr parser from
      Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from
      Tatu Saloranta. In Saxonica's experience, Woodstox is the more reliable of the two. However, there is
      no immediate benefit in using a pull parser to supply Saxon input rather than a push parser;
      the main use case for using an <code>XMLStreamReader</code> is when the data is supplied from
      some source other than parsing of lexical XML.</p>
    <p>Nodes in Saxon's implementation of the XPath data model are represented by the interface <a
        class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a>. A <code>NodeInfo</code> is
      itself a <code>Source</code>, which means that any method in the API that requires a source
      object will accept any implementation of <code>NodeInfo</code>. As discussed in the next
      section, implementations of <code>NodeInfo</code> are available to wrap Axiom, DOM, DOM4J,
      JDOM2, or XOM nodes, and in all cases these wrapper objects can be used wherever a
        <code>Source</code> is required.</p>
    <p>Saxon also provides a class <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
        >net.sf.saxon.lib.AugmentedSource</a> which implements the <code>Source</code> interface.
      This class encapsulates one of the standard <code>Source</code> objects, and allows additional
      processing options to be specified. These options include whitespace handling, schema and DTD
      validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon
      tree model.</p>
    <p>Saxon allows additional <code>Source</code> types to be supported by registering a <a
        class="javalink" href="net.sf.saxon.lib.SourceResolver">SourceResolver</a> with the <a
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object. The task of a
        <code>SourceResolver</code> is to convert a <code>Source</code> that Saxon does not
      recognize into a <code>Source</code> that it does recognize. For example, this may be done by
      building the document tree in memory and returning the <a class="javalink"
        href="net.sf.saxon.om.NodeInfo">NodeInfo</a> object representing the root of the tree.</p>
  </section>
  <section id="thirdparty"
    title="Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J">
    <h1>Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J</h1>
    <p>
      <i>This section is relevant to the Java platform only.</i>
    </p>
    <p>In the case of DOM, all Saxon editions support DOM access "out of the box", and no special
      configuration action is necessary. See also <a class="bodylink" href="/sourcedocs/domino">The Domino Tree Model</a>.</p>
    <p>Support for Axiom, JDOM2, XOM, and DOM4J is not available "out of the box" with
      Saxon-HE, but the source code is open source (in sub-packages of
        <code>net.sf.saxon.option</code>) and can be compiled for use with Saxon-HE if required.</p>
    <aside>In general, use of a third party tree implementation is much less efficient than using
      Saxon's native <code>TinyTree</code>. These models should only be used if your application
      needs to construct them for other reasons. Transforming a DOM can take up to 10 times longer
      than transforming the equivalent <code>TinyTree</code>.</aside>
    <p>The support code for Axiom, DOM4J, JDOM2, and XOM is integrated into the main JAR files
      for Saxon-PE and Saxon-EE, but (unlike the case of DOM) it is not activated unless the object
      model is registered with the <a class="javalink" href="net.sf.saxon.Configuration"
        >Configuration</a>. To activate support for one of these models, the implementation must either be included 
      in the relevant section of the
      configuration file, or it must be nominated to the configuration using the method <a class="javalink"
        href="net.sf.saxon.Configuration#registerExternalObjectModel"
        >registerExternalObjectModel()</a>. </p>
    <aside>Support for JDOM version 1 is dropped with effect from Saxon 10.0. Applications should migrate
    to JDOM2.</aside>
    <p>Each supported object model is represented in Saxon by a <a class="javalink"
        href="net.sf.saxon.om.TreeModel">TreeModel</a> object, which in the case of external object
      models will also be an instance of <a class="javalink"
        href="net.sf.saxon.lib.ExternalObjectModel">ExternalObjectModel</a>. The
        <code>TreeModel</code> can be used to get a <code>Builder</code>, which can then be used to
      construct an instance of the model from SAX input. The <code>Builder</code> can also be
      inserted into a pipeline to capture the output of a transformation or query.</p>
    <p>For DOM input, the source can be supplied by wrapping a <code>DOMSource</code> around the DOM
      Document node. For Axiom, JDOM2, XOM, and DOM4J the approach is similar, except that the
      wrapper classes are supplied by Saxon itself: they are <a class="javalink"
        href="net.sf.saxon.option.axiom.AxiomDocument"
        >net.sf.saxon.option.axiom.AxiomDocument</a>,  <a class="javalink"
        href="net.sf.saxon.option.jdom2.JDOM2DocumentWrapper"
        >net.sf.saxon.option.jdom2.JDOM2DocumentWrapper</a>, <a class="javalink"
        href="net.sf.saxon.option.xom.XOMDocumentWrapper"
        >net.sf.saxon.option.xom.XOMDocumentWrapper</a>, and <a class="javalink"
        href="net.sf.saxon.option.dom4j.DOM4JDocumentWrapper"
        >net.sf.saxon.option.dom4j.DOM4JDocumentWrapper</a> respectively. These wrapper classes
      implement the Saxon <a class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a> interface
      (which means that they also implement <code>Source</code>).</p>
    <aside>Note that the Xerces DOM implementation is not thread-safe, even for read-only access.
      Saxon's wrapper classes for the DOM therefore synchronize all access to the DOM. This provides
      thread-safety, but only if the application takes care to avoid creating more than one wrapper
      for the same DOM Document.</aside>
    <p>Saxon supports these models by wrapping each external node in a wrapper that implements the
      Saxon <code>NodeInfo</code> interface. When nodes are returned by the XQuery or XPath API,
      these wrappers are removed and the original node is returned. Similarly, the wrappers are
      generally removed when extension functions expecting a node are called.</p>
    <p>Saxon does not support wrapping of an external tree that contains entity reference nodes.
      Most parsers provide an option to avoid constructing a tree that contains such nodes. For
      example, with the JDK Xerces DOM parser, use <code>DOMParser dp = new DOMParser();
        dp.setFeature("http://apache.org/xml/features/dom/create-entity-ref-nodes",
        false);</code>. If there is a need to process a tree that does contain entity
      references, it should be copied to a Saxon tree. (Note, this only affects entities explicitly
      declared in a DTD. It does not affect character references or built-in entity references such
      as <code>&amp;lt;</code>, which never appear as entity reference nodes in the tree.)</p>
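    <p>When the DOM is built through the standard JAXP factory rather than through the Xerces
      <code>DOMParser</code> class, the corresponding switch is
      <code>DocumentBuilderFactory.setExpandEntityReferences()</code>. The following is a minimal
      sketch, not taken from Saxon itself; the class name <code>EntityExpansionDemo</code> and the
      sample document are hypothetical. It parses a document that uses a DTD-declared entity and
      checks that no entity reference nodes remain in the tree.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EntityExpansionDemo {
    // An internal DTD subset declaring one general entity.
    static final String XML =
            "<!DOCTYPE doc [<!ENTITY greeting \"hello\">]><doc>&greeting; world</doc>";

    static Document parseExpanded(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Replace entity references by their expansion. This is the JAXP
            // default; it is set here only to make the intent explicit.
            factory.setExpandEntityReferences(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the subtree rooted at node contains an entity reference node. */
    static boolean containsEntityReference(Node node) {
        if (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
            return true;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (containsEntityReference(c)) {
                return true;
            }
        }
        return false;
    }

    static boolean sampleHasEntityReferences() {
        return containsEntityReference(parseExpanded(XML));
    }

    static String sampleText() {
        return parseExpanded(XML).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(sampleHasEntityReferences());
        System.out.println(sampleText());
    }
}
```
]]></samp>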
    <p>In the case of DOM only, Saxon also supports wrapping the other way around: an object
      implementing the DOM interface may be wrapped around a Saxon <code>NodeInfo</code>. This is
      done when Java methods expecting a DOM <code>Node</code> are called as extension functions, if
      the <code>NodeInfo</code> is not itself a wrapper for a DOM <code>Node</code>.</p>
    <p>You can also send output to a DOM by using a <code>DOMResult</code>, or to a JDOM2 tree by
      using a <code>JDOM2Result</code>, or to a XOM document by using a <code>XOMWriter</code>. In
      such cases it is a good idea to set <code>saxon:require-well-formed="yes"</code> on
        <code>xsl:output</code> to ensure that the transformation or query result is a well-formed
      document (for example, that it does not contain several elements at the top level).</p>
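    <p>A minimal sketch of sending output to a DOM follows. It uses only standard JAXP classes and
      an identity transformation, so it does not exercise the
      <code>saxon:require-well-formed</code> property; instead it checks after the event that the
      result has a single top-level element. The class name <code>DomResultDemo</code> and the
      sample document are hypothetical.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DomResultDemo {
    static final String XML = "<doc><item>a</item></doc>";

    static Document transformToDom(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // No target node is supplied, so a new DOM Document is created.
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the element children of the document node (1 for a well-formed document). */
    static int topLevelElementCount(Document doc) {
        int count = 0;
        for (Node c = doc.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    static String sampleRootName() {
        return transformToDom(XML).getDocumentElement().getNodeName();
    }

    static int sampleTopLevelElements() {
        return topLevelElementCount(transformToDom(XML));
    }

    public static void main(String[] args) {
        System.out.println(sampleRootName() + " " + sampleTopLevelElements());
    }
}
```
]]></samp>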
    <p>External object models do not in all cases fully support the XDM (XPath data model). In
      particular, many of them have restrictions concerning the recognition of <code>ID</code> and
        <code>IDREF</code> attributes. In most cases they do not allow "namespace undeclarations" (so
      a prefix that is in-scope for a parent element will always be in-scope for its child elements).
      None of the external object models support typed
      (schema-validated) data, and none support in-situ update using XQuery updates.</p>
  </section>
  <section id="choosingmodel" title="Choosing a Tree Model">
    <h1>Choosing a Tree Model</h1>
    <p>Saxon provides several implementations of the internal tree data structure (or tree model).
      The tree model can be chosen by an option on the command line (<code>-tree:tiny</code> for the
      tiny tree, <code>-tree:linked</code> for the linked tree). There is also a variant of the tiny
      tree called a "condensed tiny tree" which saves space (at the expense of build time) by
      recognizing text nodes and attribute nodes whose values appear more than once in the input
      document. The tree model can also be selected from the Java API. The default is to use the
      tiny tree model. The choice should make no difference to the results of a transformation
      (except for the order of attributes and namespace declarations); it affects only performance.</p>
    <p>
      <i>The "linked tree" is the only model to support in-situ updates, so if you are using XQuery
        Update you must choose this model.</i>
    </p>
    <p>Generally speaking, the tiny tree model is both faster to build and faster to navigate. It
      also uses less space.</p>
    <p>The tiny tree model gives most benefit when you are processing a large document. It uses a
      lot less memory, so it can prevent thrashing when the size of the document is such that the linked
      tree doesn't fit in real memory. Use the "condensed" variant if you need to save memory, and
      if your source data contains many text or attribute nodes with repeated values.</p>
    <p>Saxon also offers the option <code>-tree:condensed</code>. This delivers a TinyTree with
    additional compression. Specifically, when a document contains multiple text nodes or
    attribute nodes with the same string value, the condensed tree will "common up" the storage
    for these nodes. This option gives a further reduction in memory usage, at the cost of slower
    tree construction.</p>
    <p>The linked tree is used internally to represent stylesheet and schema modules because of the
      programming convenience it offers: it allows element nodes on the tree to be represented by
      custom classes for each kind of element. The linked tree is also needed when you want to use
      XQuery Update, because unlike the tiny tree, it is mutable.</p>
    <p>
      <i>If in doubt, stick with the default.</i>
    </p>
  </section>
  <section id="domino" title="The Domino Tree Model">
    <h1>The Domino Tree Model</h1>
    <p>The Domino tree model was introduced in Saxon 9.8 and is available in Saxon-EE only. It is a new approach
    to the handling of DOM source trees.</p>
    <p>The Domino data structure is essentially a combination of the DOM and parts of the TinyTree. It takes the
    unchanged DOM tree, and indexes it with vectors containing information (for each DOM node) about the node kind,
    node name, and level in the document. These vectors are exactly the same as those used in the TinyTree; the difference
    is that there is no text content, or attributes; these are replaced by references to the DOM nodes. 
    All navigation around the tree is done purely using the index vectors,
    while retrieval of the string value of text and attribute nodes is done by reference to the DOM structure. The effect
    is that navigation is almost as fast as using the TinyTree, but queries are still able to return the original DOM Nodes.</p>
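    <p>The idea of indexing an unchanged DOM with parallel vectors can be sketched as follows. This
      is an illustration only and makes no claim to reflect Saxon's actual Domino code: the class
      <code>DomIndexSketch</code> and all its methods are hypothetical, and attributes are left out
      for brevity. Navigation uses the depth vector alone, while string values are read from the
      underlying DOM nodes.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomIndexSketch {
    // Parallel vectors with one entry per DOM node, in document order.
    private final List<Node> nodes = new ArrayList<>();
    private final List<Short> kinds = new ArrayList<>();
    private final List<String> names = new ArrayList<>();
    private final List<Integer> depths = new ArrayList<>();

    DomIndexSketch(Node root) {
        index(root, 0);
    }

    private void index(Node node, int depth) {
        nodes.add(node);
        kinds.add(node.getNodeType());
        names.add(node.getNodeName());
        depths.add(depth);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            index(c, depth + 1);
        }
    }

    int size() {
        return nodes.size();
    }

    short kind(int i) {
        return kinds.get(i);
    }

    String name(int i) {
        return names.get(i);
    }

    /** Finds the following sibling using the depth vector only; returns -1 if there is none. */
    int nextSibling(int i) {
        int depth = depths.get(i);
        for (int j = i + 1; j < nodes.size(); j++) {
            int d = depths.get(j);
            if (d == depth) {
                return j;
            }
            if (d < depth) {
                return -1;
            }
        }
        return -1;
    }

    /** String values are not copied into the index: they are read from the DOM node. */
    String stringValue(int i) {
        return nodes.get(i).getTextContent();
    }

    static DomIndexSketch of(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return new DomIndexSketch(factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```
]]></samp>
    <p>Because the vectors are built once from a snapshot of the tree, any later structural change
      to the DOM would leave such an index out of step with the nodes it refers to.</p>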
    <p>Overall, queries and transformations using the Domino model take about double the time of the same query using the
    TinyTree, compared with 5 to 10 times longer using the DOM Wrapper model. There is an initial overhead in building
    the indexes, but this is incurred once only.</p>
    <p>The Domino model must not be used with a DOM tree that is subject to update, other than changes to the values of
    attribute or text nodes, which might work (but are still best avoided). Saxon has no way of preventing or detecting
    updates, so these will generally cause catastrophic failure.</p>
  </section>
  <section id="ptree" title="The PTree File Format">
    <h1>The PTree File Format</h1>
    <p>The PTree (persistent tree) was a binary XML serialization supported by earlier Saxon
    releases. It has been dropped from the product with effect from Saxon 10.0. Third-party
    offerings such as EXI do the same job better.</p>
  </section>
  <section id="validation" title="Validation of Source Documents">
    <h1>Validation of Source Documents</h1>
    <p>With Saxon-EE, source documents may be validated against a schema. Not only does this perform
      a check that the document is valid, it also adds type information to each element and
      attribute node in the document to identify the schema type against which it was validated. It
      may also expand the source document by adding default values of elements and attributes.</p>
    <p>If the option <code>-val:strict</code> is specified on the command line for
        <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code>, then the principal
      source document to the query or transformation is schema-validated, as is every document
      loaded using the <code>doc()</code> or <code>document()</code> function. Saxon will look among
      all the loaded schemas for an element declaration that matches the outermost element of the
      document, and will then check that the document is valid against that element declaration,
      reporting a fatal error if it is not. The loaded schemas include schemas imported statically
      into the query or stylesheet using <code>import schema</code> or
        <code>xsl:import-schema</code>, schemas referenced in the <code>xsi:schemaLocation</code> or
        <code>xsi:noNamespaceSchemaLocation</code> attributes of the source document itself, and
      schemas loaded by the application using the <code>addSchema</code> method of the <a
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.</p>
    <p>As an alternative to <code>-val:strict</code>, the option <code>-val:lax</code> may be
      specified. This validates the document if and only if an element declaration can be found. If
      there is no declaration of the outermost element in any loaded schema, then it is left as an
      untyped document.</p>
    <p>When invoking transformations or queries from the Java API, the equivalent of the
        <code>-val:strict</code> option is to call the method
        <code>setSchemaValidation(Validation.STRICT)</code> on the <code>Configuration</code>
      object. The equivalent of <code>-val:lax</code> is
        <code>setSchemaValidation(Validation.LAX)</code>.</p>
    <p>When documents are built using the <a class="javalink"
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> in the s9api interface, or the
        <a class="javalink" href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> in the Saxon.Api
      interface on .NET, validation may be controlled by setting the appropriate options on the
        <code>DocumentBuilder</code>.</p>
    <p>On Java interfaces that expect a JAXP <code>Source</code> object it is possible to request
      validation by supplying an <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
        >AugmentedSource</a>. This consists of a <code>Source</code> and a set of options, including
      validation options; since <code>AugmentedSource</code> implements the JAXP <code>Source</code>
      interface it is possible to use it anywhere that a <code>Source</code> is expected, including
      as the object returned by a user-written <code>URIResolver</code>.</p>
    <p>Saxon's standard <code>URIResolver</code> uses this technique if it has been enabled (for
      example by using <code>-p</code> on the command line). With this option, any URI containing
      the query parameter <code>?val=strict</code> (for example,
        <code>doc('source.xml?val=strict')</code>) causes strict validation to be requested for that
      document, while <code>?val=lax</code> requests lax validation, and <code>?val=strip</code>
      requests no validation.</p>
    <p>XSLT 3.0 provides a standard way of requesting validation for individual source documents,
      using the <code>validation</code> and <code>type</code> attributes of the <a class="bodylink
        code" href="/xsl-elements/source-document">xsl:source-document</a> instruction.</p>
  </section>
  <section id="whitespace" title="Whitespace Stripping in Source Documents">
    <h1>Whitespace Stripping in Source Documents</h1>
    <p>A number of factors combine to determine whether whitespace-only text nodes in the source
      document are visible to the user-written XSLT or XQuery code.</p>
    <p>By default, if there is a DTD or schema, then <i>ignorable whitespace</i> is stripped from
      any source document loaded from a <code>StreamSource</code> or <code>SAXSource</code>.
      Ignorable whitespace is defined as the whitespace that separates the child elements
      in elements declared to have element-only content. This whitespace is removed regardless of
      any <code>xml:space</code> attributes in the source document.</p>
    <p>It is possible to change this default behavior in several ways.</p>
    <ul>
      <li>
        <p>From the <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code> command
          line, options are available: <code>-strip:all</code> strips all whitespace text nodes,
            <code>-strip:none</code> strips no whitespace text nodes, and
            <code>-strip:ignorable</code> strips ignorable whitespace text nodes only (this is the
          default).</p>
      </li>
      <li>
        <p>If the <code>-p</code> option is used on the command line, then query parameters are
          recognized in the URI passed to the <code>document()</code> or <code>doc()</code>
          function. The parameter <code>strip-space=yes</code> strips all whitespace text nodes,
            <code>strip-space=no</code> strips no whitespace text nodes, and
            <code>strip-space=ignorable</code> strips ignorable whitespace text nodes only. This
          overrides anything specified on the command line.</p>
      </li>
      <li>
        <p>Options corresponding to the above can also be set on the <code>TransformerFactory</code>
          object or on the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>.
          These settings are global.</p>
      </li>
    </ul>
    <p>Whitespace stripping that is specified in any of the above ways occurs not only if the
      source document is parsed under Saxon's control: that is, if it is supplied as a JAXP
        <code>StreamSource</code> or <code>SAXSource</code>. It also applies where the input is
      supplied in the form of a tree (for example, a DOM). In this case Saxon wraps the supplied
      tree in a virtual tree that provides a view of the original tree with whitespace text nodes
      omitted.</p>
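    <p>The notion of a view that omits whitespace text nodes without touching the underlying tree
      can be sketched with plain DOM code. This is an illustration of the idea only, not Saxon's
      virtual tree implementation; the class <code>StrippedViewSketch</code> and its methods are
      hypothetical.</p>
    <samp><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class StrippedViewSketch {
    static final String XML = "<doc>\n  <a/>\n  <b/>\n</doc>";

    /** True for a text node consisting solely of XML whitespace characters. */
    static boolean isWhitespaceOnlyText(Node node) {
        if (node.getNodeType() != Node.TEXT_NODE) {
            return false;
        }
        String value = node.getNodeValue();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    /** The children of a node as seen through the stripped view; the DOM is not modified. */
    static List<Node> visibleChildren(Node parent) {
        List<Node> result = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isWhitespaceOnlyText(c)) {
                result.add(c);
            }
        }
        return result;
    }

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int rawChildCount() {
        return parseRoot(XML).getChildNodes().getLength();
    }

    static int visibleChildCount() {
        return visibleChildren(parseRoot(XML)).size();
    }

    public static void main(String[] args) {
        System.out.println(rawChildCount() + " raw, " + visibleChildCount() + " visible");
    }
}
```
]]></samp>
    <p>The filtering is repeated each time the children are visited, which is why stripping on the
      fly costs more than stripping once while the tree is being built.</p>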
    <p>This whitespace stripping is additional (and prior) to any stripping carried out as a result
      of the <code>xsl:strip-space</code> declaration in the stylesheet.</p>
    <p>Saxon never modifies a supplied tree <i>in situ</i>: if a tree is supplied as input, and the stylesheet
      requests space stripping, then a virtual tree is created and whitespace is stripped on the fly as
      it is navigated. This is expensive (it can add 25% to processing time); it is therefore best to
      supply a <code>SAXSource</code> or <code>StreamSource</code> as input to a transformation, so
      that Saxon can strip unwanted whitespace while the tree is being parsed and built.
    </p>
  </section>
  <section id="streaming" title="Streaming of Large Documents">
    <h1>Streaming of Large Documents</h1>
    <aside>Streaming is available only in Saxon-EE.</aside>
    <p>Sometimes source documents are too large to hold in memory. Saxon-EE provides a range of
      facilities for processing such documents in <i>streaming mode</i>: that is, processing data as
      it is read by the XML parser, without building a complete tree representation of the document
      in memory.</p>
    <p>These facilities are closely aligned with the XSLT 3.0 Recommendation. Some facilities
      are specific to Saxon, and a few facilities are also available in XQuery.</p>
    <p>Inevitably there are things that cannot be done in streaming mode - sorting is an obvious
      example. Sometimes, achieving a streaming transformation means rethinking the design of how it
      works - for example, splitting it into multiple phases. So streaming is rarely a case of
      simply taking your existing code and setting a simple switch to request streamed
      implementation.</p>
    <p>For more information, see the following sections:</p>
    <nav>
      <ul/>
    </nav>
    <section id="xslt-streaming" title="Streaming using XSLT 3.0">
      <h1>Streaming using XSLT 3.0</h1>
      <aside>Requires Saxon-EE.</aside>
      <p>Saxon-EE (from Saxon 9.8) is fully conformant to the final XSLT 3.0 Recommendation in terms of the
        streaming facilities it supports. A few gaps in coverage that were found after release were fixed for Saxon 9.9. 
        There are also some extensions.</p>
      <p>There are two main ways to initiate a streaming transformation:</p>
      <ol>
        <li><p>Using the <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
          instruction, with the attribute <code>streamable="yes"</code>. 
          Here the source document is identified within the stylesheet itself.
          Typically such a stylesheet will have a named template as its entry point, and will not
          have any principal source document supplied externally.</p></li>
        <li><p>By supplying a source document as input to a stylesheet whose initial mode is declared
          with <code>streamable="yes"</code> in an <a href="/xsl-elements/mode"
            class="bodylink code">xsl:mode</a> declaration. In this case the source document must be
          supplied as a <code>StreamSource</code> or <code>SAXSource</code>, and not as an in-memory
          tree. The details depend on which API is being used:</p>
          <ul>
            <li><p>With the Java s9api API, compile the stylesheet to create an <code>XsltExecutable</code>,
            and then use the <code>load30</code> method to create an <code>Xslt30Transformer</code>.
            Invoke the streamed transformation using the <code>applyTemplates</code> method of
            the <code>Xslt30Transformer</code>, supplying the input as a <code>StreamSource</code>
            or <code>SAXSource</code>.</p></li> 
            <li><p>Similarly with the Saxon.Api interface on .NET, use the method
            <code>Xslt30Transformer.ApplyTemplates()</code>, supplying a <code>Stream</code> 
1508
            as input.</p></li>
1509
            <li><p>With the JAXP API, start by instantiating a <code>com.saxonica.config.StreamingTransformerFactory</code>.
1510
            Invoke the transformation in the usual way by creating a <code>Transformer</code> (optionally via a
1511
            <code>Templates</code> object). When the <code>transform()</code> method is called with a
1512
            <code>StreamSource</code> or <code>SAXSource</code> as input, and when the initial mode
1513
            is a streamable mode, the input will be streamed. In consequence, this approach breaks the
1514
            normal JAXP convention whereby the document supplied as the <code>Source</code> argument to
1515
            the <code>transform()</code> method also becomes the global context item (the value of "." when
1516
            accessed within the initializer of a global variable). Instead such a reference fails with 
1517
            an XPDY0002 dynamic error.</p>
1518
            <p>The <code>StreamingTransformerFactory</code> can also be used to create an <code>XMLFilter</code>
1519
            which takes streamed input and produces streamed output, and a pipeline can be built from a
1520
            sequence of such filters connected end-to-end in the usual JAXP way.</p></li>
1521
          </ul>
1522
        </li>
1523
      </ol>
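
      <p>As an illustration of the first approach, the following is a minimal sketch of a complete
        stylesheet whose entry point is a named template, so that no principal source document
        needs to be supplied. The file name <code>transactions.xml</code>, the element names
        <code>account</code> and <code>transaction</code>, and the result element name are
        assumptions chosen for illustration.</p>

      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: the standard initial template name defined by XSLT 3.0 -->
  <xsl:template name="xsl:initial-template">
    <transaction-count>
      <!-- The streamed input is identified here, inside the stylesheet -->
      <xsl:source-document streamable="yes" href="transactions.xml">
        <!-- A single downward selection, so the body is streamable -->
        <xsl:value-of select="count(account/transaction)"/>
      </xsl:source-document>
    </transaction-count>
  </xsl:template>

</xsl:stylesheet>
]]></samp>

      <p>From the command line, a stylesheet like this would typically be run with the
        <code>-it</code> option, which invokes the initial template, rather than with a
        source document.</p>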
1524

    
1525
      <p>The <a class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
1526
        function used in previous releases is still supported for the time being. In Saxon 9.8 and later a
1527
        call on <code>saxon:stream</code> is translated at compile time into a call on the XSLT 3.0
1528
          <code>&lt;xsl:source-document&gt;</code> instruction. The original Saxon mechanism for streaming,
1529
        namely the <code>saxon:read-once</code> attribute on <code>xsl:copy-of</code>, was dropped
1530
        in Saxon 9.6.</p>
1531

    
1532
      <p>The rules for whether a construct is streamable or not are largely the same in Saxon as in
1533
        the XSLT 3.0 specification. Saxon applies these rules after doing any optimization
1534
        re-writes, so some constructs end up being streamable in Saxon even though they are not
1535
        guaranteed streamable in the W3C spec, because the Saxon optimizer rewrites the expression
1536
        into a streamable form. An example of this effect is where variables or functions are
1537
        inlined before doing the streamability analysis. In contrast, when streaming is requested,
1538
        the optimizer takes care to avoid rewriting streamable constructs into a non-streamable
1539
        form.</p>
1540

    
1541
      <p>This documentation does not attempt to provide a tutorial introduction to the streaming
1542
        capabilities of XSLT 3.0. The specification itself is not easy to read, especially the
1543
        detailed rules on which constructs are deemed streamable. However, for the most part it is
1544
        not necessary to be familiar with the detailed rules. The main things to remember are:</p>
1545

    
1546
      <ul>
1547
        <li>A construct is "consuming" if it reads a subtree of the source document, that is, if it
1548
          makes a downwards selection from the context item. In general, constructs are not allowed
1549
          to have two operands that are both consuming. Some exceptions to this are: the <a
1550
            class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction; conditional
1551
          expressions such as <a class="bodylink code" href="/xsl-elements/choose">xsl:choose</a> if
1552
          each branch only contains one consuming expression; the map expression
1553
            <code>map{...}</code> in XPath and the <a class="bodylink code" href="/xsl-elements/map"
1554
            >xsl:map</a> instruction in XSLT.</li>
1555
        <li>During a streaming pass, the XSLT processor remembers the ancestors of the context item
1556
          and all the attributes of ancestors. Path expressions that access the ancestors and their
1557
          attributes are therefore allowed. However, such expressions should generally return atomic
1558
          values (for example the values of attributes) rather than returning nodes in the streamed
1559
          document, because if nodes are returned, the system often can't be sure that there is no
1560
          disallowed navigation from those nodes (for example, you can't get all the descendants of
1561
          an ancestor node).</li>
1562
        <li>It's not permitted to bind a streamed node to a variable or parameter, or to pass it to
1563
          a function.</li>
1564
        <li>An expression such as <code>//section</code> is referred to as a crawling expression.
1565
          Crawling expressions potentially contain nodes which overlap each other, which creates
1566
          problems if you want to make further downward selections from such nodes. The XSLT 3.0
1567
          specification allows this in some circumstances, for example you can pass such an
1568
          expression to a function that atomizes the result, but other cases (for example, using
1569
          such an expression in <a class="bodylink code" href="/xsl-elements/for-each"
1570
            >xsl:for-each</a> or <a class="bodylink code" href="/xsl-elements/apply-templates"
1571
            >xsl:apply-templates</a>) are forbidden. If you know that the expression will never
1572
          select overlapping nodes (for example, if you know that <code>//title</code> will never
1573
          select one title appearing within another title), then you can rewrite the expression as
1574
            <code>outermost(//title)</code> to avoid the restrictions. Saxon also allows overlapping
1575
          nodes in some contexts where the W3C specification does not, provided streamability
1576
          extensions are enabled.</li>
1577
        <li>When you hit these restrictions, you can often work around them by making a copy of a
1578
          subtree of the streamed document, for example by using the new <a class="bodylink code"
1579
            href="/functions/fn/copy-of">copy-of()</a> or <a class="bodylink code"
1580
            href="/functions/fn/snapshot">snapshot()</a> functions. These are consuming expressions,
1581
          but the result is "grounded" (that is, an ordinary in-memory tree) so it can be used
1582
          without any restrictions. Clearly this only works if the subtrees that you copy are small
1583
          enough to fit in memory.</li>
1584
      </ul>
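
      <p>As a sketch of how these rules play out in practice, consider computing a line total from
        two child elements. The file name and the element names <code>order</code>,
        <code>item</code>, <code>price</code>, and <code>quantity</code> are hypothetical. The
        commented-out form makes two downward selections from the same streamed node and so is
        not streamable; the second form first takes a grounded copy of each item, which can then
        be navigated freely.</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="orders.xml">
  <xsl:for-each select="order/item">
    <!-- Not streamable: "price" and "quantity" are both consuming operands of "*"
    <line-total><xsl:value-of select="price * quantity"/></line-total>
    -->
    <!-- Streamable: copy-of(.) is the only consuming operation; its result is grounded -->
    <line-total><xsl:value-of select="copy-of(.)/(price * quantity)"/></line-total>
  </xsl:for-each>
</xsl:source-document>
]]></samp>

      <p>This relies on each <code>item</code> element being small enough to hold in memory.</p>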
1585

    
1586
      <p>The XSLT 3.0 constructs most relevant to streaming are:</p>
1587

    
1588
      <ul>
1589
        <li><strong>Streamable template rules</strong>. XSLT 3.0 has a new <a class="bodylink code"
1590
            href="/xsl-elements/mode">xsl:mode</a> declaration, and this allows all the template
1591
          rules in a particular mode to be declared streamable (<code>&lt;xsl:mode
1592
            streamable="yes"/&gt;</code>). If a mode is declared streamable, then Saxon checks
1593
          whether all the template rules in that mode are actually streamable, and reports a
1594
          compile-time error if not.</li>
1595
        <li>The <a class="bodylink code"
1596
          href="/xsl-elements/source-document">xsl:source-document</a> instruction.
1597
          This has an <code>href</code> attribute which defines the URI of a streamed input
1598
          document, and the instructions within <code>xsl:source-document</code> are evaluated with this
1599
          document as the context node. When streamed processing is requested using the attribute
1600
          <code>streamable="yes"</code>, the body of the <code>xsl:source-document</code> instruction must
1601
          satisfy the streamability rules; again, any violation is detected at compile time.</li>
1602
        <li>The <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a> instruction.
1603
          This is like an <a class="bodylink code" href="/xsl-elements/for-each">xsl:for-each</a>
1604
          instruction except that it guarantees to process the selected nodes in order, and the
1605
          results of processing one node can be passed as a parameter to the next iteration, so the
1606
          action applied to one node can influence the way in which subsequent nodes are processed.
1607
          This often provides a solution to the problem that when streaming, you can never "look
1608
          backwards" at preceding nodes. Instead of looking backwards, the information that will be
1609
          needed when processing subsequent nodes can be retained in parameters and "passed
1610
          forwards". Note that streamed nodes themselves cannot be contained in parameters, but data
1611
          derived from those nodes (or copies made using the <code>copy-of()</code> function) can.</li>
1612
        <li>The <a class="bodylink code" href="/xsl-elements/merge">xsl:merge</a> instruction allows
1613
          several input sequences to be merged, based on the value of a sort key. Any or all of the
1614
          input sequences can be streamed documents, provided that they are already correctly sorted
1615
          on the sort key value.</li>
1616
        <li><strong>Accumulators</strong> allow values to be computed "in the background" while a
1617
          streamed document is being read; the final value of the <a class="bodylink code"
1618
            href="/xsl-elements/accumulator">accumulator</a> is available by calling the <a
1619
            class="bodylink code" href="/functions/fn/accumulator-after">accumulator-after()</a>
1620
          function at the end of processing, and intermediate values are also available.
1621
          Accumulators are useful if you want to compute several values during a single processing
1622
          pass of a streamed document (for example, a minimum and maximum of some value). When the
1623
          information to be maintained in the accumulator is complex, it can be useful to hold it in
1624
          a map, which is a new data structure introduced in XSLT 3.0.</li>
1625
        <li>Saxon (from 9.9) supports an additional capability: <em>capturing accumulators</em>.
          Adding the attribute <code>saxon:capture="yes"</code> to an accumulator rule with
          <code>phase="end"</code> tells Saxon to make a snapshot copy of the matched
          element (as if by calling the <code>fn:snapshot</code> function). The code that computes
          the next value of the accumulator then has full access to this snapshot, so it is
          no longer constrained to be motionless. You can even keep the snapshot copy directly
          as the value of the accumulator (just write <code>select="."</code>), or you can retain
          all the matched elements (write <code>select="($value, .)"</code>). One way of writing a
          streamed transformation is therefore to capture all the data you need in accumulators,
          and to process it only when you reach the end of the document.</li>
1636
        <li>The <a class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction
1637
          effectively computes several instructions in parallel. In the Saxon implementation, they
1638
          are not actually evaluated in different threads, but they are all executed during a single
1639
          scan of the streamed input document. The outputs produced by each "prong" of the
1640
            <code>xsl:fork</code> instruction are buffered in memory until all prongs have
1641
          completed, and are then assembled in the correct order to form the final result.</li>
1642
        <li><strong>Streamed grouping</strong> is possible using the <a class="bodylink code"
1643
            href="/xsl-elements/for-each-group">xsl:for-each-group</a> instruction, provided that
1644
          one of the options <code>group-adjacent</code>, <code>group-starting-with</code>, or
1645
            <code>group-ending-with</code> is used. There are restrictions on the use of the <a
1646
            class="bodylink code" href="/functions/fn/current-group">current-group()</a> function
1647
          within such an instruction: essentially, it can only be used once, because it is a
1648
          consuming construct.</li>
1649
      </ul>
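
      <p>For instance, the minimum and maximum mentioned under accumulators might be computed in a
        single pass along the following lines. This is a sketch rather than a definitive recipe:
        the accumulator names, the file name <code>transactions.xml</code>, the result element
        names, and the <code>value</code> attribute on <code>transaction</code> elements are all
        assumptions. The calls on <code>accumulator-after()</code> are deliberately placed after
        an instruction that consumes the input.</p>

      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Each rule is motionless: it reads only an attribute of the matched element -->
  <xsl:accumulator name="min-value" as="xs:decimal?" initial-value="()" streamable="yes">
    <xsl:accumulator-rule match="transaction"
                          select="min(($value, xs:decimal(@value)))"/>
  </xsl:accumulator>

  <xsl:accumulator name="max-value" as="xs:decimal?" initial-value="()" streamable="yes">
    <xsl:accumulator-rule match="transaction"
                          select="max(($value, xs:decimal(@value)))"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <xsl:source-document streamable="yes" href="transactions.xml"
                         use-accumulators="min-value max-value">
      <summary>
        <!-- The only consuming instruction in this sequence constructor -->
        <count><xsl:value-of select="count(account/transaction)"/></count>
        <!-- Evaluated after the consuming instruction, so no second pass is needed -->
        <min><xsl:value-of select="accumulator-after('min-value')"/></min>
        <max><xsl:value-of select="accumulator-after('max-value')"/></max>
      </summary>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
]]></samp>

      <p>The two accumulators could equally be combined into one whose value is a map with
        entries for the minimum and the maximum.</p>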
1650

    
1651

    
1652
      <p>All these facilities are available in Saxon-EE only.</p>
1653

    
1654
    </section>
1655

    
1656
    <section id="streamed-query" title="Streaming in XQuery">
1657
      <h1>Streaming in XQuery</h1>
1658

    
1659
      <aside>Requires Saxon-EE.</aside>
1660

    
1661
      <p>The XQuery specification says nothing on the subject of streamed evaluation; it is left
1662
        entirely to implementations. Saxon-EE supports streaming of XQuery for simple queries, using
1663
        rules similar to those that apply to XSLT.</p>
1664

    
1665
      <p>Simple queries can be streamed by specifying <code>-stream:on</code> on the Saxon-EE
1666
        command line. There is no need to specify anything in the query itself; however, the <a
1667
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> and <a
1668
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions (defined in
1669
        the XSLT 3.0 specification) may be used if streaming is not otherwise possible.</p>
1670

    
1671
      <p>When running a query using the s9api interface, streaming must be requested both when
1672
        compiling the query (<a class="javalink" href="net.sf.saxon.s9api.XQueryCompiler"
1673
          >XQueryCompiler.setStreaming(true)</a>), and when executing it (<a class="javalink"
1674
          href="net.sf.saxon.s9api.XQueryEvaluator">XQueryEvaluator.runStreamed(Source,
1675
          Destination)</a>).</p>
1676

    
1677
      <p>The query should access the streamed input document via the context item, not via the <a
1678
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
1679
          href="/functions/fn/collection">collection()</a> function, nor using external variables.
1680
        The source document should be supplied in the form of a <code>SAXSource</code> or
1681
          <code>StreamSource</code> object.</p>
1682

    
1683
      <p>If the query is not streamable, this will be reported as a compile-time error.</p>
1684

    
1685
      <p>The conditions for streamability are essentially the same as the rules for the body of the
1686
        <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
1687
        instruction when streamed processing is requested using the attribute
1688
        <code>streamable="yes"</code>, as in the XSLT 3.0 specification. For example:</p>
1689

    
1690
      <ol>
1691
        <li>
1692
          <p>Path expressions must use downward selection only.</p>
1693
        </li>
1694
        <li>
1695
          <p>Predicates must be motionless, which means they can reference attributes but not child
1696
            elements of the node being filtered.</p>
1697
        </li>
1698
        <li>
1699
          <p>No construct may make two downward selections. For example, the expression <code>price
1700
              - discount</code> fails because both operands use the child axis to select downwards.
1701
            If necessary, use <a class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> to
1702
            copy a subtree, after which arbitrary selections within the copied subtree become
1703
            possible.</p>
1704
        </li>
1705
        <li>
1706
          <p>A streamed node may not be bound to a variable. This rules out many uses of FLWOR
1707
            expressions.</p>
1708
        </li>
1709
        <li>
1710
          <p>A streamed node must not be passed as an argument to a function call, other than
1711
            built-in function calls.</p>
1712
        </li>
1713
        <li>
1714
          <p>Global variables in the query must not reference the context item.</p>
1715
        </li>
1716
      </ol>
1717

    
1718
      <p>As with XSLT, these restrictions can often be overcome by using the <a
1719
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> or <a
1720
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions, which Saxon
1721
        makes available in XQuery as well as XSLT.</p>
1722

    
1723
    </section>
1724

    
1725
    <section id="configuration-streaming" title="Configuration options for streaming">
1726
      <h1>Configuration options for streaming</h1>
1727

    
1728
      <aside>Requires Saxon-EE.</aside>
1729

    
1730
      <p>Saxon attempts streamed evaluation only if it is explicitly requested. Streaming may be
1731
        requested in a number of ways:</p>
1732

    
1733
      <ul>
1734
        <li>
1735
          <p>By use of XSLT 3.0 language constructs that request streaming, for example the <a
1736
            class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
1737
            instruction with attribute <code>streamable="yes"</code>, or by
1738
            specifying <code>streamable="yes"</code> on <a class="bodylink code"
1739
              href="/xsl-elements/mode">xsl:mode</a> or <a class="bodylink code"
1740
              href="/xsl-elements/accumulator">xsl:accumulator</a>.</p>
1741
        </li>
1742
        <li>
1743
          <p>By use of a Saxon extension that requests streaming, for example <a
1744
              class="bodylink code" href="/functions/saxon/stream">saxon:stream</a>.</p>
1745
        </li>
1746
        <li>
1747
          <p>By setting the option <code>-stream:on</code> in the XQuery command line, or the
1748
            equivalent API option (for example, in s9api, <a class="javalink"
1749
              href="net.sf.saxon.s9api.XQueryCompiler">XQueryCompiler.setStreaming(true)</a>).</p>
1750
        </li>
1751
      </ul>
1752

    
1753
      <p>There are three configuration options that control how these requests for streaming
1754
        are interpreted:</p>
1755
      
1756
      <ul>
1757
        <li>The configuration option <a class="bodylink code" href="/configuration/config-features"
1758
          >Feature.STREAMABILITY</a> may be set to one of the values "off" or "standard".
1759
          (Releases prior to 9.8 supported a third option, "extended".) With a licensed Saxon-EE
1760
        configuration, the default is "standard", which means that streaming will happen if it
1761
        is requested and if it is feasible. Setting the value to "off" causes Saxon to behave
1762
        as if there is no Saxon-EE license: that is, requests for streaming are effectively
1763
        ignored, and the stylesheet is executed in a non-streaming manner (which means that processing
1764
        of a large document may fail if there is insufficient memory).</li>
1765
        
1766
        <li>The configuration option <a class="bodylink code"
1767
          href="/configuration/config-features">Feature.STREAMING_FALLBACK</a> determines what
1768
          Saxon does when streaming is requested, and a construct is found that is deemed
1769
          non-streamable. This is a boolean option. If it is set to <code>true</code>, Saxon attempts a
1770
          non-streaming implementation of the relevant construct. If sufficient memory is available
1771
          for a non-streaming evaluation, this should always give the same result as a streamed
1772
          evaluation. When the option is set to <code>false</code> (the default), the presence of a 
1773
          construct that is deemed non-streamable causes a static (compile-time) error.</li>
1774
        
1775
        <li>The configuration option <a class="bodylink code"
1776
          href="/configuration/config-features">Feature.STRICT_STREAMABILITY</a>
1777
         determines how closely Saxon's streamability analysis follows the rules in the
1778
        W3C specification. This is a boolean value (with the default <code>false</code>): the value <code>true</code> requests
1779
        strict adherence to the W3C rules. In reality this option does not affect the rules
1780
        that Saxon applies, rather it affects when they are applied. By default Saxon first performs
1781
        all its usual compile-time optimizations to the expression tree, and then checks the final result
1782
        for streamability. During the optimization process Saxon takes care to avoid replacing streamable
1783
        constructs with non-streamable equivalents, but it may do the reverse. As a result, constructs
1784
        that are not streamable according to the W3C rules may become streamable after optimization.
1785
        (An example is the non-streamable expression <code>AUTHOR or EDITOR</code>, which Saxon rewrites
1786
          in the streamable form <code>exists(AUTHOR | EDITOR)</code>.)
1787
        For interoperability, the W3C specification requires processors to provide a mode of operation in
1788
        which the W3C streamability rules are enforced rigidly, and this is achieved by setting
1789
        <code>STRICT_STREAMABILITY</code> to <code>true</code>. With this setting, Saxon checks the
1790
        expression tree for streamability <em>before</em> doing any optimizations that change
1791
        the tree.</li>
1792
      </ul>
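
      <p>To make the effect of <code>STRICT_STREAMABILITY</code> more concrete, the rewrite
        mentioned above can be pictured in a template rule such as the following sketch. The
        surrounding template rule and the result element name are hypothetical; only the two
        test expressions come from the description above.</p>

      <samp><![CDATA[<xsl:mode streamable="yes"/>

<xsl:template match="BOOK">
  <!-- Accepted by default, because Saxon rewrites the test before the streamability
       analysis; rejected when STRICT_STREAMABILITY is true, because "or" then has
       two consuming operands:
         <xsl:if test="AUTHOR or EDITOR">
       The test below is the rewritten equivalent, written out explicitly. -->
  <xsl:if test="exists(AUTHOR | EDITOR)">
    <has-contributor/>
  </xsl:if>
</xsl:template>
]]></samp>

      <p>Writing the rewritten form by hand is one way to keep a stylesheet independent of this
        setting.</p>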
1793
        
1794
 
1795

    
1796
      <p>When running from the command line, these options can be set using, for example,
          <code>--streamability:off</code> or <code>--streamingFallback:on</code>.</p>
1798
    </section>
1799

    
1800
 
1801

    
1802
    <section id="burst-mode-streaming" title="Burst-mode streaming">
1803
      <h1>Burst-mode streaming</h1>
1804

    
1805
      <aside>Requires Saxon-EE.</aside>
1806

    
1807

    
1808
      <p>Burst-mode streaming takes a streamed document as input, and generates a sequence of small
1809
        subtrees containing the parts of the document that need to be processed. This can be
1810
        achieved using XSLT 3.0 syntax like this:</p>
1811

    
1812
      <samp><![CDATA[<xsl:source-document streamable="yes" href="employees.xml">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:source-document>
]]></samp>
1816

    
1817
      <p>The code that processes an individual <code>employee</code> element does not need to be
1818
        streamable; it can use any XSLT constructs. The only constraint is that it cannot navigate
1819
        outside the <code>employee</code> element: because the <code>employee</code> element is a
1820
        copy of a subtree from the orginal document, it has no parent or siblings.</p>
1821

    
1822
      <p>Burst-mode streaming can also be applied to the principal input of the transformation. This
1823
        works if the transformation is run from the command line, and also if it is executed from a
1824
        Java or .NET API provided that the document is supplied as a streamed source object, not as
1825
        a pre-built tree (under Java, this means a <code>StreamSource</code> or
1826
          <code>SAXSource</code>). For example:</p>
1827

    
1828
      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:template>
]]></samp>
1833

    
1834
      <p>The same effect can be achieved in XQuery if the document is supplied as the initial
1835
        context item, again in the form of a streamed input source. Although the functions
1836
          <code>copy-of()</code> and <code>snapshot()</code> are defined in the XSLT 3.0
1837
        specification, Saxon also makes them available in XQuery, allowing for example:</p>
1838

    
1839
      <samp><![CDATA[*/employee ! copy-of(.)/(name, address)
]]></samp>
1841

    
1842
      <p>In XQuery there is no need for the query itself to indicate that streamed execution is
1843
        required; rather this can be requested from the command line using the option
1844
          <code>-stream:on</code>. </p>
1845

    
1846
      <p>The same effect can be achieved on external streamed documents using the <a
1847
          class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
1848
        function.</p>
1849

    
1850

    
1851

    
1852
      <h2 class="subtitle">Example: selective copying</h2>
1853

    
1854
      <p>A very simple way of using burst-mode streaming is when making a selective copy of parts of
1855
        a document. For example, the following code creates an output document containing all the
1856
          <code>footnote</code> elements from the source document that have the attribute
1857
          <code>@type='endnote'</code>:</p>
1858

    
1859
      <p>
1860
        <strong>XSLT example (named document)</strong>
1861
      </p>
1862
      <samp><![CDATA[<xsl:template name="main">
  <footnotes>
    <xsl:source-document streamable="yes" href="thesis.xml">
      <xsl:copy-of select=".//footnote[@type='endnote']"/>
    </xsl:source-document>
  </footnotes>
</xsl:template>
]]></samp>
1870

    
1871
      <p>
1872
        <strong>XQuery example (named document)</strong>
1873
      </p>
1874
      <samp><![CDATA[  <footnotes>{
     saxon:stream(doc('thesis.xml')//footnote[@type='endnote'])
  }</footnotes>
]]></samp>
1878

    
1879
      <p>
1880
        <strong>XSLT example (principal input document)</strong>
1881
      </p>
1882
      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <footnotes>
    <xsl:copy-of select=".//footnote[@type='endnote']"/>
  </footnotes>
</xsl:template>
]]></samp>
1889

    
1890
      <p>
1891
        <strong>XQuery example (principal input document)</strong>
1892
      </p>
1893
      <samp><![CDATA[  <footnotes>{.//footnote[@type='endnote']}</footnotes>
]]></samp>
1895

    
1896

    
1897
      <p>These examples work because the predicate (the expression in square brackets) is
1898
          <i>motionless</i>: evaluating the predicate does not require the source document to be
1899
        repositioned. If the predicate needs access to child elements rather than attributes, it's
1900
        necessary to make a copy of each footnote and then test the copy. The last example then
1901
        becomes:</p>
1902

    
1903
      <samp><![CDATA[  <footnotes>{.//footnote/copy-of(.)[type='endnote']}</footnotes>
]]></samp>
1905
    </section>
1906

    
1907

    
1908

    
1909
    <section id="partial-reading" title="Reading source documents partially">
1910
      <h1>Reading source documents partially</h1>
1911

    
1912
      <aside>Requires Saxon-EE.</aside>
1913

    
1914

    
1915
      <p>As well as allowing a source document to be processed in a single sequential pass, the
1916
        streaming facility in many cases allows the source document to be read only partially. For
1917
        example, the following query will return true as soon as it finds a transaction with a
1918
        negative value, and will then immediately stop processing the input file:</p>
1919
      <samp><![CDATA[some $t in saxon:stream(doc('big-transaction-file.xml')//transaction)
satisfies number($t/@value) lt 0
]]></samp>
1922

    
1923
      <p>This facility is particularly useful for extracting data that appears near the start of a
1924
        large file. It does mean, however, that well-formedness or validity errors appearing later
1925
        in the file will not necessarily be detected.</p>
1926

    
1927
      <p>To exit early from reading a streamed document using pure XSLT 3.0 constructs, use <a
1928
          href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a> like this:</p>
1929

    
1930
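      <samp><![CDATA[<xsl:variable name="contains-debit" as="xs:boolean">
  <xsl:source-document streamable="yes" href="big-transaction-file.xml">
    <xsl:iterate select=".//transaction">
      <!-- xsl:on-completion must precede the loop body; it supplies the result
           when the input is exhausted without reaching xsl:break -->
      <xsl:on-completion select="false()"/>
      <!-- number() is needed because an untyped attribute value cannot be
           compared with an integer using "lt" -->
      <xsl:if test="number(@value) lt 0">
        <xsl:break select="true()"/>
      </xsl:if>
    </xsl:iterate>
  </xsl:source-document>
</xsl:variable>
]]></samp>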
1941

    
1942
    </section>
1943

    
1944

    
1945

    
1946
    <section id="stream-with-iterate" title="Streaming with xsl:iterate">
1947
      <h1>Streaming with xsl:iterate</h1>
1948

    
1949
      <aside>Requires Saxon-EE.</aside>
1950

    
1951
      <p>In the examples given above, streaming is used to select a sequence of element nodes from
1952
        the source document, and each of these nodes is then processed independently. In cases where
1953
        the processing of one node depends in some way on previous nodes, it is possible to use <a
1954
          class="bodylink" href="../burst-mode-streaming">burst-mode streaming</a> in conjunction
1955
        with the new <a href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a>
1956
        instruction in XSLT 3.0.</p>
1957

    
1958
      <p>The following example takes a sequence of <code>&lt;transaction&gt;</code> elements in an
1959
        input document, each one containing the value of a debit or credit from an account. As
1960
        output it copies the transaction elements, adding a current balance.</p>
1961
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <xsl:param name="balance" as="xs:decimal" select="0.00"/>
        <xsl:variable name="new-balance" as="xs:decimal" select="$balance + xs:decimal(@value)"/>
        <transaction balance="{$new-balance}">
           <xsl:copy-of select="@*"/>
        </transaction>
        <xsl:next-iteration>
          <xsl:with-param name="balance" select="$new-balance"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
1974

    
1975
      <p>The following example is similar: this time it copies the account number (contained in a
1976
        separate element at the start of the file) into each transaction element:</p>
1977
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/(account-number|transaction)">
        <xsl:param name="accountNr"/>
        <xsl:choose>
           <xsl:when test="self::account-number">
             <xsl:next-iteration>
                <xsl:with-param name="accountNr" select="string(.)"/>
             </xsl:next-iteration>
           </xsl:when>
           <xsl:otherwise>
             <transaction account-number="{$accountNr}">
               <xsl:copy-of select="@*"/>
             </transaction>
           </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
1995

    
1996
      <p>Here is a more complex example, one that groups adjacent transaction elements having the
        same date attribute. The two loop parameters are the contents of the current group and the
        current date, which serves as the grouping key. The members of a group are accumulated in
        the <code>group</code> parameter until the date changes.</p>
1999
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <xsl:param name="group" as="element(transaction)*" select="()"/>
        <xsl:param name="currentDate" as="xs:date?" select="()"/>
        <!-- xsl:on-completion must precede the loop body; it emits the last group -->
        <xsl:on-completion>
          <final-daily-transactions date="{$currentDate}">
            <xsl:copy-of select="$group"/>
          </final-daily-transactions>
        </xsl:on-completion>
        <xsl:choose>
          <xsl:when test="xs:date(@date) eq $currentDate or empty($group)">
            <!-- Same date, or first transaction: add a copy to the current group
                 (a streamed node itself cannot be held in a parameter) -->
            <xsl:next-iteration>
              <xsl:with-param name="currentDate" select="@date"/>
              <xsl:with-param name="group" select="($group, copy-of(.))"/>
            </xsl:next-iteration>
          </xsl:when>
          <xsl:otherwise>
            <!-- Date has changed: output the completed group and start a new one -->
            <daily-transactions date="{$currentDate}">
              <xsl:copy-of select="$group"/>
            </daily-transactions>
            <xsl:next-iteration>
              <xsl:with-param name="group" select="copy-of(.)"/>
              <xsl:with-param name="currentDate" select="@date"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
2028

    
      <p>Note that when an <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a>
        loop is terminated using <a class="bodylink code" href="/xsl-elements/break">xsl:break</a>,
        parsing of the source document will be abandoned. This provides a convenient way to read
        data near the start of a large file without incurring the cost of reading the entire
        file.</p>
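      <p>As a rough sketch of early termination, the following fragment copies the first ten
        transactions and then leaves the loop. The file name and element names are carried over
        from the earlier examples, and the limit of ten and the <code>count</code> parameter are
        illustrative assumptions rather than part of any prescribed pattern:</p>
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <!-- Number of transactions copied so far (hypothetical parameter name) -->
        <xsl:param name="count" as="xs:integer" select="0"/>
        <xsl:choose>
          <xsl:when test="$count ge 10">
            <!-- Leave the loop; the rest of the file need not be read -->
            <xsl:break/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="."/>
            <xsl:next-iteration>
              <xsl:with-param name="count" select="$count + 1"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>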
    </section>

    <section id="stream-with-merge" title="Streaming with xsl:merge">
      <h1>Streaming with xsl:merge</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Saxon (since 9.6) allows several streamed inputs to be merged using the new XSLT 3.0 <a
          href="/xsl-elements/merge" class="bodylink code">xsl:merge</a> instruction. For this to
        work, there are a number of rules to follow:</p>

    
      <ol>
        <li>
          <p>Streaming must be requested by specifying <code>streamable="yes"</code> on the <a
              class="bodylink code" href="/xsl-elements/merge-source">xsl:merge-source</a>
            element.</p>
        </li>
        <li>
          <p>When streaming is requested, the <code>for-each-source</code> attribute of
              <code>xsl:merge-source</code> must be present, and must evaluate to a single
            string.</p>
        </li>
        <li>
          <p>The <code>select</code> attribute on the <code>xsl:merge-source</code> element must
            take the form of a motionless pattern.</p>
        </li>
      </ol>

    
      <p>For each node selected by the <code>select</code> expression, Saxon takes an implicit
        snapshot (in the sense of the XSLT 3.0 <a class="bodylink code"
          href="/functions/fn/snapshot">fn:snapshot()</a> function). The merge keys are evaluated in
        relation to this snapshot, and it is this snapshot that is presented within the
          <code>xsl:merge-action</code> construct as the result of the <a class="bodylink code"
          href="/functions/fn/current-merge-group">fn:current-merge-group()</a> function.</p>

      <p>Here is an example of streamed merging of two log files:</p>

    
      <samp><![CDATA[<xsl:merge>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-1.xml'" select="events/event">
    <xsl:merge-key select="xs:dateTime(@timestamp)"/>
  </xsl:merge-source>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-2.xml'" select="log/day/record">
    <xsl:merge-key select="dateTime(../@date, time)"/>
  </xsl:merge-source>
  <xsl:merge-action>
    <group>
      <xsl:copy-of select="current-merge-group()" />
    </group>
  </xsl:merge-action>
</xsl:merge>]]></samp>
    </section>

    <section id="streaming-templates" title="Streaming Templates">
      <h1>Streaming Templates</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Streaming templates allow a document to be processed hierarchically in the classical XSLT
        style, applying template rules to each element (or other nodes) in a top-down manner, while
        scanning the source document in a pure streaming fashion, without building the source tree
        in memory. Saxon-EE allows streamed processing of a document using template rules, provided
        the templates conform to a set of strict guidelines.</p>

      <p>Streaming in this way is a property of a <strong>mode</strong>; a mode can be declared to
        be streamable, and if it is so declared, then all template rules using that mode must obey
        the rules for streamability. A mode is declared to be streamable using the top-level
        stylesheet declaration:</p>

    
      <samp><![CDATA[<xsl:mode name="s" streamable="yes"/>]]></samp>

      <p>The <code>name</code> attribute is optional; if omitted, the declaration applies to the
        default (unnamed) mode.</p>

      <p>Streamed processing of a source document can be applied either to the principal source
        document of the transformation, or to a secondary source document read using the <a
          class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
        instruction.</p>

      <p>To use streaming on the principal source document, the input to the transformation must be
        supplied in the form of a <code>StreamSource</code> or <code>SAXSource</code>, and the
        initial mode selected on entry to the transformation must be a streamable mode. In this case
        there must be no references to the context item in the initializer of any global
        variable.</p>

      <p>Streamed processing of a secondary document is initiated using the instruction:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="abc.xml">
  <xsl:apply-templates mode="s"/>
</xsl:source-document>]]></samp>

      <p>Saxon will also recognize an instruction of the form:</p>

      <samp><![CDATA[<xsl:apply-templates select="doc('abc.xml')" mode="s"/>]]></samp>

    
      <p>Here the <code>select</code> attribute must contain a simple call on the <a
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
          href="/functions/fn/document">document()</a> function, and the mode (explicit or implicit)
        must be declared as streamable. The call on <code>doc()</code> or <code>document()</code>
        can be extended with a streamable selection path, for example
          <code>select="doc('employee.xml')/*/employee"</code>.</p>

      <p>If a mode is declared as streamable, then it can be used only for streamed input; it is
        not possible to apply templates in a streamable mode if the selected nodes are ordinary
        non-streamed nodes.</p>

    
      <p>Every template rule within a streamable mode must follow strict rules to ensure it can be
        processed in a streaming manner. The essence of these rules is:</p>
      <ol>
        <li>
          <p>The match pattern for the template rule must be a simple pattern that can be evaluated
            when positioned at the start tag of an element, without repositioning the stream (but
            information about the ancestors of the element and their attributes is available,
            together with some limited information about their position relative to their siblings).
            Examples of acceptable patterns are <code>*</code>, <code>para</code>,
              <code>para[1]</code>, or <code>para/*</code>.</p>
          <p>If the match pattern includes a boolean predicate, then the predicate must be
            "motionless", which means that it can be evaluated while the input stream is positioned
            at the start tag. This means it can reference properties such as <code>name()</code> and
              <code>base-uri()</code>, and can reference attributes of the element, but cannot
            reference its children or content.</p>
          <p>If the match pattern includes a numeric predicate, then it must be possible to evaluate
            this by counting either the total number of preceding-sibling elements, or the number of
            preceding siblings with a given name. Examples of permitted patterns include
              <code>*[1]</code>, <code>p[3]</code>, and <code>*:p[2][@class='bold']</code>;
            disallowed patterns include <code>(descendant::fig)[1]</code>,
              <code>p[@class='bold'][2]</code>, and <code>p[last()]</code>.</p>
        </li>
        <li>
          <p> The body of the template rule must contain at most one expression or instruction that
            reads the contents below the matched element (that is, children or descendants), and it
            must process the contents in document order. This expression or instruction will often
            be one of the following:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>string(.)</code>
              </p>
            </li>
            <li>
              <p><code>data(.)</code> (explicitly or implicitly)</p>
            </li>
          </ul>
          <p>but this list is not exhaustive. It is possible to process the contents selectively by
            using a streamable path expression, for example:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates select="foo"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="a/b/c"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="x/y"/&gt;</code>
              </p>
            </li>
          </ul>
          <p>but this effectively means that the content not selected by this path is skipped
            entirely; the transformation ignores it.</p>
          <p>The template can access attributes of the context item without restriction, as well as
            properties such as its <code>name()</code>, <code>local-name()</code>, and
              <code>base-uri()</code>. It can also access the ancestors of the context item, the
            attributes of the ancestors, and properties such as the name of an ancestor; but having
            navigated to an ancestor, it cannot then navigate downwards or sideways, since the
            siblings and the other descendants of the ancestor are not available while
            streaming.</p>
          <p>The restriction that only one downwards access is allowed makes it an error to use an
            expression such as <code>price - discount</code> in a streamable template. This problem
            can often be circumvented by making a copy of the context item. This can be done using
            the <code>copy-of()</code> function: for example <code>&lt;xsl:value-of
              select="copy-of(.)/(price - discount)"/&gt;</code>. Taking a copy of the context node
            requires memory, of course, and should be avoided unless the contents of the node are
            small.</p>

          <p>Certain constructs using positional filters can be evaluated in streaming mode. For
            example, it is possible to use <code>&lt;xsl:apply-templates select="*[1]"/&gt;</code>.
            The filter must be on a node test that uses the child axis and selects element nodes.
            The forms accepted are expressions that can be expressed as <code>x[position() op
              N]</code> where <code>N</code> is an expression that is independent of the focus and
            is statically known to evaluate to a number, <code>x</code> is a node test using the
            child axis, and <code>op</code> is one of the operators <code>eq</code>,
            <code>le</code>, <code>lt</code>, <code>gt</code>, or <code>ge</code>. Alternative forms
            of this construct such as <code>x[N]</code>, <code>remove(x, 1)</code>,
              <code>head(x)</code>, <code>tail(x)</code>, and <code>subsequence(x, 1, N)</code> are
            also accepted.</p>
        </li>
      </ol>
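      <p>The rules above can be illustrated with a minimal sketch of a streamable stylesheet. The
        element and attribute names used here (<code>employee</code>, <code>item</code>,
        <code>status</code>, and so on) are hypothetical, chosen only for illustration, and the
        <code>on-no-match="shallow-skip"</code> setting is an assumption made so that unmatched
        elements are passed over while their children are still visited:</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declare the unnamed mode streamable (hypothetical input vocabulary throughout) -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <!-- Wrap the output; the single downward access is xsl:apply-templates -->
  <xsl:template match="/">
    <report>
      <xsl:apply-templates/>
    </report>
  </xsl:template>

  <!-- Motionless predicate: only an attribute of the matched element is tested -->
  <xsl:template match="employee[@status = 'active']">
    <!-- Own attributes and ancestor attributes may be read freely -->
    <employee id="{@id}" department="{../@name}">
      <!-- The one downward selection made by this template -->
      <xsl:value-of select="name"/>
    </employee>
  </xsl:template>

  <!-- Two downward selections (price and discount) are needed, so the element
       is copied first; sensible only if item elements are small -->
  <xsl:template match="item">
    <net-price>
      <xsl:value-of select="copy-of(.)/(price - discount)"/>
    </net-price>
  </xsl:template>

</xsl:stylesheet>]]></samp>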

    
    </section>
  </section>
  <section id="projection" title="Document Projection">
    <h1>Document Projection</h1>

    <aside>Document projection is available only in Saxon-EE.</aside>

    <p>Document Projection is a mechanism that analyzes a query to determine what parts of a
      document it can potentially access, and then while building a tree to represent the document,
      leaves out those parts of the tree that cannot make any difference to the result of the
      query.</p>

    <p>Document projection can be enabled as an option on the XQuery command line interface: set
        <code>-projection:on</code>. It is only used if requested. The command line option affects
      both the primary source document supplied on the command line, and any calls on the
        <code>doc()</code> function within the body of the query that use a literal string argument
      for the document URI.</p>

    <p>For feedback on the impact of document projection in terms of reducing the size of the source
      document in memory, use the <code>-t</code> option on the command line, which shows for each
      document loaded how many nodes from the input document were retained and how many
      discarded.</p>

    <p>From the s9api API, document projection can be invoked as an option on the <a
        class="javalink" href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a>. The call
        <code>setDocumentProjectionQuery()</code> supplies as its argument a compiled query (an
        <code>XQueryExecutable</code>), and the document built by the document builder is then
      projected to retain only the parts of the document that are accessed by this query, when it
      operates on this document as the initial context item. For example, if the supplied query is
        <code>count(//ITEM)</code>, then only the <code>ITEM</code> elements will be retained.</p>

    
    <p>It is also possible to request that a query should perform document projection on documents
      that it reads using the <code>doc()</code> function, provided this has a string-literal
      argument. This can be requested using the option <code>setAllowDocumentProjection(true)</code>
      on the <code>XQueryExpression</code> object. This is not available directly in the s9api
      interface, but the <code>XQueryExpression</code> is reachable from the
        <code>XQueryExecutable</code> using the accessor method
        <code>getUnderlyingCompiledQuery()</code>.</p>
    <aside>It is best to avoid supplying a query that actually returns nodes from the document
      supplied as the context item, since the analysis cannot know what the invoker of the query
      will want to do with these nodes. For example, the query
        <code>&lt;out&gt;{//ITEM}&lt;/out&gt;</code> works better than <code>//ITEM</code>, since it
      is clear that all descendants of the <code>ITEM</code> elements must be retained, but not
      their ancestors. If the supplied query selects nodes from the input document, then Saxon
      assumes that the application will need access to the entire subtree rooted at these nodes, but
      that it will not attempt to navigate upwards or outwards from these nodes. On the other hand,
      nodes that are atomized (for example in a filter) will be retained without their descendants,
      except as needed to compute the filter.</aside>

    
    <p>The more complex the query, the less likely it is that Saxon will be able to analyze it to
      determine the subset of the document required. If precise analysis is not possible, document
      projection has no effect. Currently Saxon makes no attempt to analyze accesses made within
      user-defined functions. Also, of course, Saxon cannot analyze the expectations of external
      (Java) functions called from the query.</p>

    <p>Document projection is supported only for XQuery, and it works only when a document
      is parsed and loaded for the purpose of executing a single query. It is possible, however, to
      use the mechanism to create a manual filter for source documents if the required subset of the
      document is known. To achieve this, create a query that selects the required parts of the
      document supplied as the context item, and compile it to an s9api
      <code>XQueryExecutable</code>. The query does not have to do anything useful: the only
      requirement is that the result of the query on the subset document must be the same as the
      result on the original document. Then supply this <code>XQueryExecutable</code> to the s9api
        <code>DocumentBuilder</code> used to build the document.</p>

    <p>Of course, when document projection is used manually like this, it is entirely the user's
      responsibility to ensure that the selected part of the document contains all the nodes
      required.</p>
  </section>
  <section id="w3c-dtds" title="References to W3C DTDs">
    <h1>References to W3C DTDs</h1>

    <p>During 2010-11, W3C took steps to reduce the burden of meeting requests for
      commonly-referenced documents such as the DTD for XHTML. The W3C web server routinely
      adds an artificial 30-second time delay for such requests. In response to this, Saxon now includes
      copies of these documents within the issued JAR file, and recognizes requests for these
      documents, satisfying the request using the local copy.</p>

    <p>This is done only in cases where Saxon itself instantiates the XML parser. In cases where the
      user application instantiates an XML parser, the same effect can be achieved by setting the <a
        class="javalink" href="net.sf.saxon.lib.StandardEntityResolver">StandardEntityResolver</a>
      as a property of the <code>XMLReader</code> (parser).</p>

    <p>The documents recognized by the <code>StandardEntityResolver</code> are:</p>

    
    <table>
      <thead>
        <tr>
          <td>
            <p>Public ID</p>
          </td>
          <td>
            <p>System ID</p>
          </td>
          <td>
            <p>Saxon resource name</p>
          </td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Latin 1 for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-lat1.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Symbols for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-symbol.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Special for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-special.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Transitional//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-transitional.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Strict//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-strict.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Frameset//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-frameset.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Basic 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml-basic10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11.dtd</p>
          </td>
        </tr>
2434
        <tr>
2435
          <td>
2436
            <p>-//W3C//DTD XHTML Basic 1.1//EN</p>
2437
          </td>
2438
          <td>
2439
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11.dtd</p>
2440
          </td>
2441
          <td>
2442
            <p>w3c/xhtml11/xhtml-basic11.dtd</p>
2443
          </td>
2444
        </tr>
2445
        <tr>
2446
          <td>
2447
            <p>-//W3C//ELEMENTS XHTML Access Element 1.0//EN</p>
2448
          </td>
2449
          <td>
2450
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-1.mod</p>
2451
          </td>
2452
          <td>
2453
            <p>w3c/xhtml11/xhtml-access-1.mod</p>
2454
          </td>
2455
        </tr>
2456
        <tr>
2457
          <td>
2458
            <p>-//W3C//ENTITIES XHTML Access Attribute Qnames 1.0//EN</p>
2459
          </td>
2460
          <td>
2461
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-qname-1.mod</p>
2462
          </td>
2463
          <td>
2464
            <p>w3c/xhtml11/xhtml-access-qname-1.mod</p>
2465
          </td>
2466
        </tr>
2467
        <tr>
2468
          <td>
2469
            <p>-//W3C//ELEMENTS XHTML Java Applets 1.0//EN</p>
2470
          </td>
2471
          <td>
2472
            <p>http://www.w3.org/MarkUp/DTD/xhtml-applet-1.mod</p>
2473
          </td>
2474
          <td>
2475
            <p>w3c/xhtml11/xhtml-applet-1.mod</p>
2476
          </td>
2477
        </tr>
2478
        <tr>
2479
          <td>
2480
            <p>-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN</p>
2481
          </td>
2482
          <td>
2483
            <p>http://www.w3.org/MarkUp/DTD/xhtml-arch-1.mod</p>
2484
          </td>
2485
          <td>
2486
            <p>w3c/xhtml11/xhtml-arch-1.mod</p>
2487
          </td>
2488
        </tr>
2489
        <tr>
2490
          <td>
2491
            <p>-//W3C//ENTITIES XHTML Common Attributes 1.0//EN</p>
2492
          </td>
2493
          <td>
2494
            <p>http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod</p>
2495
          </td>
2496
          <td>
2497
            <p>w3c/xhtml11/xhtml-attribs-1.mod</p>
2498
          </td>
2499
        </tr>
2500
        <tr>
2501
          <td>
2502
            <p>-//W3C//ELEMENTS XHTML Base Element 1.0//EN</p>
2503
          </td>
2504
          <td>
2505
            <p>http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod</p>
2506
          </td>
2507
          <td>
2508
            <p>w3c/xhtml11/xhtml-base-1.mod</p>
2509
          </td>
2510
        </tr>
2511
        <tr>
2512
          <td>
2513
            <p>-//W3C//ELEMENTS XHTML Basic Forms 1.0//EN</p>
2514
          </td>
2515
          <td>
2516
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-form-1.mod</p>
2517
          </td>
2518
          <td>
2519
            <p>w3c/xhtml11/xhtml-basic-form-1.mod</p>
2520
          </td>
2521
        </tr>
2522
        <tr>
2523
          <td>
2524
            <p>-//W3C//ELEMENTS XHTML Basic Tables 1.0//EN</p>
2525
          </td>
2526
          <td>
2527
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-table-1.mod</p>
2528
          </td>
2529
          <td>
2530
            <p>w3c/xhtml11/xhtml-basic-table-1.mod</p>
2531
          </td>
2532
        </tr>
2533
        <tr>
2534
          <td>
2535
            <p>-//W3C//ENTITIES XHTML Basic 1.0 Document Model 1.0//EN</p>
2536
          </td>
2537
          <td>
2538
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic10-model-1.mod</p>
2539
          </td>
2540
          <td>
2541
            <p>w3c/xhtml11/xhtml-basic10-model-1.mod</p>
2542
          </td>
2543
        </tr>
2544
        <tr>
2545
          <td>
2546
            <p>-//W3C//ENTITIES XHTML Basic 1.1 Document Model 1.0//EN</p>
2547
          </td>
2548
          <td>
2549
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11-model-1.mod</p>
2550
          </td>
2551
          <td>
2552
            <p>w3c/xhtml11/xhtml-basic11-model-1.mod</p>
2553
          </td>
2554
        </tr>
2555
        <tr>
2556
          <td>
2557
            <p>-//W3C//ELEMENTS XHTML BDO Element 1.0//EN</p>
2558
          </td>
2559
          <td>
2560
            <p>http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod</p>
2561
          </td>
2562
          <td>
2563
            <p>w3c/xhtml11/xhtml-bdo-1.mod</p>
2564
          </td>
2565
        </tr>
2566
        <tr>
2567
          <td>
2568
            <p>-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN</p>
2569
          </td>
2570
          <td>
2571
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod</p>
2572
          </td>
2573
          <td>
2574
            <p>w3c/xhtml11/xhtml-blkphras-1.mod</p>
2575
          </td>
2576
        </tr>
2577
        <tr>
2578
          <td>
2579
            <p>-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN</p>
2580
          </td>
2581
          <td>
2582
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod</p>
2583
          </td>
2584
          <td>
2585
            <p>w3c/xhtml11/xhtml-blkpres-1.mod</p>
2586
          </td>
2587
        </tr>
2588
        <tr>
2589
          <td>
2590
            <p>-//W3C//ELEMENTS XHTML Block Structural 1.0//EN</p>
2591
          </td>
2592
          <td>
2593
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod</p>
2594
          </td>
2595
          <td>
2596
            <p>w3c/xhtml11/xhtml-blkstruct-1.mod</p>
2597
          </td>
2598
        </tr>
2599
        <tr>
2600
          <td>
2601
            <p>-//W3C//ENTITIES XHTML Character Entities 1.0//EN</p>
2602
          </td>
2603
          <td>
2604
            <p>http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod</p>
2605
          </td>
2606
          <td>
2607
            <p>w3c/xhtml11/xhtml-charent-1.mod</p>
2608
          </td>
2609
        </tr>
2610
        <tr>
2611
          <td>
2612
            <p>-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN</p>
2613
          </td>
2614
          <td>
2615
            <p>http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod</p>
2616
          </td>
2617
          <td>
2618
            <p>w3c/xhtml11/xhtml-csismap-1.mod</p>
2619
          </td>
2620
        </tr>
2621
        <tr>
2622
          <td>
2623
            <p>-//W3C//ENTITIES XHTML Datatypes 1.0//EN</p>
2624
          </td>
2625
          <td>
2626
            <p>http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod</p>
2627
          </td>
2628
          <td>
2629
            <p>w3c/xhtml11/xhtml-datatypes-1.mod</p>
2630
          </td>
2631
        </tr>
2632
        <tr>
2633
          <td>
2634
            <p>-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN</p>
2635
          </td>
2636
          <td>
2637
            <p>http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod</p>
2638
          </td>
2639
          <td>
2640
            <p>w3c/xhtml11/xhtml-edit-1.mod</p>
2641
          </td>
2642
        </tr>
2643
        <tr>
2644
          <td>
2645
            <p>-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN</p>
2646
          </td>
2647
          <td>
2648
            <p>http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod</p>
2649
          </td>
2650
          <td>
2651
            <p>w3c/xhtml11/xhtml-events-1.mod</p>
2652
          </td>
2653
        </tr>
2654
        <tr>
2655
          <td>
2656
            <p>-//W3C//ELEMENTS XHTML Forms 1.0//EN</p>
2657
          </td>
2658
          <td>
2659
            <p>http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod</p>
2660
          </td>
2661
          <td>
2662
            <p>w3c/xhtml11/xhtml-form-1.mod</p>
2663
          </td>
2664
        </tr>
2665
        <tr>
2666
          <td>
2667
            <p>-//W3C//ELEMENTS XHTML Frames 1.0//EN</p>
2668
          </td>
2669
          <td>
2670
            <p>http://www.w3.org/MarkUp/DTD/xhtml-frames-1.mod</p>
2671
          </td>
2672
          <td>
2673
            <p>w3c/xhtml11/xhtml-frames-1.mod</p>
2674
          </td>
2675
        </tr>
2676
        <tr>
2677
          <td>
2678
            <p>-//W3C//ENTITIES XHTML Modular Framework 1.0//EN</p>
2679
          </td>
2680
          <td>
2681
            <p>http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod</p>
2682
          </td>
2683
          <td>
2684
            <p>w3c/xhtml11/xhtml-framework-1.mod</p>
2685
          </td>
2686
        </tr>
2687
        <tr>
2688
          <td>
2689
            <p>-//W3C//ENTITIES XHTML HyperAttributes 1.0//EN</p>
2690
          </td>
2691
          <td>
2692
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hyperAttributes-1.mod</p>
2693
          </td>
2694
          <td>
2695
            <p>w3c/xhtml11/xhtml-hyperAttributes-1.mod</p>
2696
          </td>
2697
        </tr>
2698
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Hypertext 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-hypertext-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Frame Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-iframe-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-iframe-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Images 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-image-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlphras-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlpres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstruct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Inline Style 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstyle-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inputmode 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inputmode-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inputmode-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Redeclarations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-redecl-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-redecl-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Link Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-link-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Lists 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-list-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-2.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-2.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML MetaAttributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-metaAttributes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-metaAttributes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Name Identifier 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-nameident-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-nameident-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//NOTATIONS XHTML Notations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-notations-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-notations-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-object-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Param Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-param-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-pres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML-Print 1.0 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-print10-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-print10-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Qualified Names 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML+RDFa Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML RDFa Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Ruby 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/ruby/xhtml-ruby-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ruby-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Scripting 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-script-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ssismap-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Document Structure 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-struct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Style Sheets 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-style-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Tables 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-table-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Target 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-target-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-target-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Text 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-text-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//MathML 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml1/mathml.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml1/mathml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml2/mathml2.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml2/mathml2.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 3.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml3/mathml3.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml3/mathml3.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd</p>
          </td>
          <td>
            <p>w3c/svg10/svg10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Tiny//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-tiny.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Basic//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-basic.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/xhtml-rddl-model-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/xhtml-rddl-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//DTD XHTML RDDL 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-xhtml.dtd</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-xhtml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL QName Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Resource Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-resource-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-resource-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD Specification V2.10//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd</p>
          </td>
          <td>
            <p>w3c/xmlspec/xmlspec.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XMLSCHEMA 200102//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2001/XMLSchema.dtd</p>
          </td>
          <td>
            <p>w3c/xmlschema/XMLSchema.dtd</p>
          </td>
        </tr>
      </tbody>
    </table>

    <p>This Saxon feature can be disabled by setting the configuration property <a
        class="bodylink code" href="/configuration/config-features"
        >Feature.ENTITY_RESOLVER_CLASS</a> to null; it is also possible to set it to a different
        <code>EntityResolver</code> class (perhaps a subclass of Saxon's
        <code>StandardEntityResolver</code>) that varies the behavior. If an
        <code>EntityResolver</code> is set in the relevant <code>ParseOptions</code> or in an
        <code>AugmentedSource</code> then this will override any <code>EntityResolver</code> set at
      the configuration level.</p>
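    <p>As an illustration of the kind of <code>EntityResolver</code> that could be substituted, the
      following is a minimal sketch using only the standard SAX interfaces. It is not Saxon's
      implementation: the class name <code>LocalDtdResolver</code>, the helper <code>lookup</code>,
      and the choice of two sample rows are assumptions made for the example, and it further assumes
      that local copies of the DTDs are available on the classpath under the names shown.</p>

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that maps a few well-known identifiers to local resource names. */
class LocalDtdResolver implements EntityResolver {

    // Each row holds {public ID, system ID, local resource name}; two rows copied from the table.
    private static final String[][] KNOWN = {
        {"-//W3C//DTD SVG 1.1//EN",
         "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd",
         "w3c/svg11/svg11.dtd"},
        {"-//W3C//DTD MathML 2.0//EN",
         "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd",
         "w3c/mathml/mathml2/mathml2.dtd"}
    };

    /** Returns the local resource name for a known public or system ID, or null if unknown. */
    static String lookup(String publicId, String systemId) {
        for (String[] row : KNOWN) {
            if (row[0].equals(publicId) || row[1].equals(systemId)) {
                return row[2];
            }
        }
        return null;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = lookup(publicId, systemId);
        if (local == null) {
            return null; // not recognized: the parser falls back to its default behavior
        }
        // Assumption: the local copy is packaged on the classpath under this name.
        InputStream in = LocalDtdResolver.class.getClassLoader().getResourceAsStream(local);
        if (in == null) {
            return null; // no local copy available: again fall back to the default
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

    <p>An instance of such a class could be supplied wherever a SAX <code>EntityResolver</code> is
      accepted, for example in the <code>ParseOptions</code> or <code>AugmentedSource</code>
      mentioned above. Returning null for unrecognized identifiers leaves the parser free to
      retrieve the entity in the usual way.</p>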
  </section>
</article>