he / src / userdoc / sourcedocs.xml @ 8ddaa514

1
<?xml version="1.0" encoding="utf-8"?>
2
<article id="sourcedocs" title="Handling XML Documents">
3
  <h1>Handling XML Documents</h1>
4

    
5

    
6
  <p>This section discusses the various options in Saxon for handling XML documents.
7
    These might form the input or output of a query or stylesheet, or they might be
8
  used directly by application code written (say) in Java.</p>
9

    
10
  <p>See the topics below for further information:</p>
11

    
12
  <nav>
13
    <ul/>
14
  </nav>
15

    
16
  <section id="command-line" title="Source Documents on the Command Line">
17
    <h1>Source Documents on the Command Line</h1>
18

    
19

    
20
    <p>When Saxon (either XSLT or XQuery) is invoked from the command line, the source document will
21
      normally be an XML 1.0 document. Supplying an XML 1.1 document will also work, provided that
22
      (a) the selected parser is an XML 1.1 parser, and (b) the command line option
23
        <code>-xmlversion:1.1</code> is set.</p>
24

    
25
    <p>If a custom parser is specified using the <code>-x</code> option on the command line, then
26
      the source document can be in any format accepted by this custom parser. The only constraint
27
      is that the parser must behave as a SAX2 parser, delivering a stream of events that define a
28
      virtual XML document. For example, the TagSoup parser from John Cowan can be used to feed an
29
      HTML document as input to Saxon.</p>
30

    
31
    <p>Non-standard input formats can also be handled by specifying a user-written
32
        <code>URIResolver</code>. If the <code>-u</code> option is used on the command line, or if
33
      the source file name begins with <code>http:</code> or <code>https:</code> or
34
        <code>file:</code> or <code>classpath:</code>, then the source file name is resolved to a
35
      JAXP Source object using the <code>URIResolver</code>; if a user-written
36
        <code>URIResolver</code> is nominated (using the <code>-r</code> option) then this may
37
      translate the file name into a <code>Source</code> object any way that it wishes.</p>
38

    
39
    <aside>Saxon (from 9.7) supports the <code>classpath</code> URI scheme to locate resources
40
      using the Java classpath. This URI scheme is defined by the Spring framework, but Saxon's
41
      implementation is free-standing. For example, <code>classpath:utility.xsl</code> will locate
42
      a file called <code>utility.xsl</code> as a resource on the classpath.</aside>
43
    <aside>Saxon (from 9.9) also supports the <code>data</code> URI scheme, which allows
44
      a small resource to be contained within the URI itself, suitably encoded.</aside>
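<p>As an illustration of what "contained within the URI itself, suitably encoded" can look like, the following sketch builds a base64 <code>data</code> URI (in the form defined by RFC 2397) from a small XML string using only the JDK. The class name <code>DataUriSketch</code> and the choice of the <code>application/xml</code> media type are assumptions made for this example; the text above does not say which media types Saxon expects.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Saxon: wraps a small XML document in a data: URI.
final class DataUriSketch {

    /** Returns a base64-encoded data: URI holding the given XML text (UTF-8 bytes). */
    static String toDataUri(String xml) {
        String encoded = Base64.getEncoder()
                .encodeToString(xml.getBytes(StandardCharsets.UTF_8));
        // The media type here is an assumption chosen for illustration.
        return "data:application/xml;base64," + encoded;
    }

    public static void main(String[] args) {
        System.out.println(toDataUri("<doc/>"));
    }
}
```

<p>The resulting string could then be supplied wherever a source URI is accepted, subject to the length limits of the calling environment.</p>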
45
  </section>
46
  <section id="collections" title="Collections">
47
    <h1>Collections</h1>
48

    
49
    <p>Saxon implements the <a class="bodylink code" href="/functions/fn/collection"
50
        >collection()</a> and <a class="bodylink code" href="/functions/fn/uri-collection"
51
        >uri-collection()</a> functions by passing the given collection URI (or null, if the default
52
      collection is requested) to a user-provided <a class="javalink"
53
        href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>. This section describes how
54
      the standard (default) collection finder behaves, if no user-written collection finder is
55
      supplied. (For information on supplying a user-written <code>CollectionFinder</code>, see <a
56
        class="bodylink" href="user-collections">Writing your own Collection Finder</a>.)</p>
57
    
58
    <p>In XSLT 3.0 and XQuery 3.1, collections can contain resources other than XML documents: for
59
    example, JSON documents, plain text documents, and binary files.</p>
60

    
61
    <p>The default collection can be registered with the <code>Configuration</code> in the form of a
62
      collection URI. When the <code>collection()</code> function is called with no arguments, this
63
      is exactly the same as supplying this default collection URI. If no default collection URI has
64
      been registered, an empty collection is returned.</p>
65

    
66
    <p>The standard collection finder supports four different kinds of collection: registered collections,
67
      catalog-based collections, directory-based collections, and zip-based collections:</p>
68
    
69
    <ul>
70
      <li><p>A registered collection is one that has been explicitly registered with the Configuration, by calling
71
      <code>Configuration.registerCollection()</code>.</p></li>
72
      <li><p>If the collection URI
73
        corresponds to a directory name, then a directory-based collection is used: the collection contains
74
      selected files from the named directory.</p></li>
75
      <li><p>If the collection URI identifies a
76
        ZIP or JAR file (more specifically, if it uses the <code>jar</code> URI scheme, or has a file extension of
77
        ".zip" or ".jar") then a zip-based collection is used.</p></li>
78
      <li><p>Otherwise, the collection URI must be
79
        the URI of an XML file which acts as a catalog, that is, it contains a list of the resources
80
        in the collection.</p></li>
81
    </ul>
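<p>The decision rules in the list above can be sketched in plain Java as follows. This is not Saxon's implementation: the class <code>CollectionKindSketch</code> is hypothetical, registered collections are left out because they depend on the <code>Configuration</code>, and the order in which the checks are applied is an assumption.</p>

```java
import java.io.File;
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch of the rules described in the text; not Saxon code.
final class CollectionKindSketch {

    enum Kind { DIRECTORY, ZIP, CATALOG }

    static Kind classify(String collectionUri) {
        // Drop any query part such as "?recurse=yes" before inspecting the URI.
        int q = collectionUri.indexOf('?');
        String base = q < 0 ? collectionUri : collectionUri.substring(0, q);
        String lower = base.toLowerCase(Locale.ROOT);

        // A file: URI that names an existing directory is directory-based.
        if (lower.startsWith("file:") && new File(URI.create(base)).isDirectory()) {
            return Kind.DIRECTORY;
        }
        // The jar scheme, or a .zip/.jar extension, selects a zip-based collection.
        if (lower.startsWith("jar:") || lower.endsWith(".zip") || lower.endsWith(".jar")) {
            return Kind.ZIP;
        }
        // Anything else is treated as the URI of an XML catalog file.
        return Kind.CATALOG;
    }

    public static void main(String[] args) {
        System.out.println(classify("jar:file:/data/archive.jar!/"));
    }
}
```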
82

    
83

    
84
    <aside>
85
      <p>To recognize additional kinds of ZIP file, for example Open Office documents, write a
86
        subclass of <code>StandardCollectionFinder</code> that overrides the method
87
          <code>isJarFileURI()</code>.</p>
88
    </aside>
89

    
90

    
91
    <p>Saxon by default recognizes four kinds of resource: XML documents,
      JSON documents, unparsed text documents, and binary files. The standard collection finder
      attempts to identify which kind of resource to use based on the content type (media type),
      which in turn may be inferred from HTTP headers, from sniffing the initial bytes of the
      content, or from file extensions.</p>
96

    
97
    <p>In the case of directory-based and ZIP-based collections, query parameters may be added to
98
      the collection URI to further control how it is to be processed.</p>
99
    
100
    <aside><p>Saxon cannot assume that the nodes returned by the <code>collection()</code> function
101
    are in document order. It is therefore best to avoid expressions like <code>collection()/doc/section</code>
102
    which force the collection to be sorted (and therefore force all the nodes in the collection to
103
    be in memory at the same time). To iterate over a collection, it's better to use constructs that
104
    don't sort into document order: for example <code>collection() ! doc/section</code>,
105
    or <code>xsl:for-each</code>, or <code>for $x in collection() return ...</code>.</p>
106
    
107
      <p>See also <a class="bodylink code"
108
        href="/functions/saxon/discard-document">saxon:discard-document()</a>.</p></aside>
109

    
110
    <h2 class="subtitle">Defining a collection using a catalog file</h2>
111

    
112
    <p>If the collection URI identifies a file, Saxon treats this as a catalog file. This is a file
113
      in XML format that lists the documents comprising the collection. Here is an example of such a
114
      catalog file:</p>
115
    <samp><![CDATA[<collection stable="true">
  <doc href="dir/chap1.xml"/>
  <doc href="dir/chap2.xml"/>
  <doc href="dir/chap3.xml"/>
  <doc href="dir/chap4.xml"/>
</collection>]]></samp>
121

    
122
    <p>The <code>stable</code> attribute indicates whether the collection is stable or not. The
123
      default value is <code>true</code>. If a collection is stable, then the URIs listed in the
124
        <code>doc</code> elements are treated like URIs passed to the <code>doc()</code> function.
125
      Each URI is first looked up in the document pool to see if it is already loaded; if it is,
126
      then the document node is returned. Otherwise the URI is passed to the registered
127
        <code>URIResolver</code>, and the resulting document is added to the document pool. The
128
      effect of this process is firstly, that two calls on the <code>collection()</code> function
129
      passing the same collection URI will return the same nodes each time, and secondly, that these
130
      results are consistent with the results of the <code>doc()</code> function: if the
131
        <code>document-uri()</code> of a node returned by the <code>collection()</code> function is
132
      passed to the <code>doc()</code> function, the original node will be returned. If
133
        <code>stable="false"</code> is specified, however, the URI is dereferenced directly, and the
134
      document is not added to the document pool, which means that a subsequent retrieval of the
135
      same document will not return the same node.</p>
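<p>To make the catalog format concrete, the sketch below reads a catalog of the kind described above with the JDK's DOM parser and lists the document URIs it names. It is an illustration only, not Saxon's code: the class <code>CatalogSketch</code> is hypothetical, and resolving each relative <code>href</code> against the URI of the catalog file is an assumption of this example.</p>

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for the <collection>/<doc href="..."/> catalog format.
final class CatalogSketch {

    /** Lists the documents named in a catalog, resolved against the catalog's own URI. */
    static List<URI> listDocuments(String catalogXml, URI catalogUri) {
        try {
            Document catalog = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(catalogXml)));
            NodeList docs = catalog.getDocumentElement().getElementsByTagName("doc");
            List<URI> result = new ArrayList<>();
            for (int i = 0; i < docs.getLength(); i++) {
                String href = ((Element) docs.item(i)).getAttribute("href");
                result.add(catalogUri.resolve(href)); // assumption: hrefs are relative to the catalog
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Catalog could not be read", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<collection stable=\"true\"><doc href=\"dir/chap1.xml\"/></collection>";
        System.out.println(listDocuments(xml, URI.create("file:///books/catalog.xml")));
    }
}
```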
136

    
137
    <h2 class="subtitle">Processing directories</h2>
138

    
139
    <p>If the URI passed to the <code>collection()</code> function (still assuming a default
140
        <code>CollectionFinder</code>) identifies a directory, then the contents of the
141
      directory are returned. Such a URI may have a number of query parameters, written in the form
142
        <code>file:///a/b/c/d?keyword=value;keyword=value;...</code>. The recognized keywords and
143
      their values are as follows:</p>
144
    <table>
145
      <thead class="params">
146
        <tr>
147
          <td>
148
            <p> keyword </p>
149
          </td>
150
          <td>
151
            <p> values </p>
152
          </td>
153
          <td>
154
            <p> effect </p>
155
          </td>
156
        </tr>
157
      </thead>
158
      <tbody>
159
        <tr>
160
          <td class="keyword">
161
            <p> recurse </p>
162
          </td>
163
          <td>
164
            <p>
165
              <span class="value">yes | no</span> (default <span class="value">no</span>) </p>
166
          </td>
167
          <td>
168
            <p> Determines whether subdirectories are searched recursively. </p>
169
          </td>
170
        </tr>
171
        <tr>
172
          <td class="keyword">
173
            <p> strip-space </p>
174
          </td>
175
          <td>
176
            <p class="value"> yes | ignorable | no </p>
177
          </td>
178
          <td>
179
            <p> Determines whether whitespace text nodes are to be stripped. The default depends on
180
              the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> settings.
181
            </p>
182
          </td>
183
        </tr>
184
        <tr>
185
          <td class="keyword">
186
            <p> validation </p>
187
          </td>
188
          <td>
189
            <p class="value"> strip | preserve | lax | strict </p>
190
          </td>
191
          <td>
192
            <p> Determines whether and how schema validation is applied to each document. The
193
              default depends on the <a class="javalink" href="net.sf.saxon.Configuration"
194
                >Configuration</a> settings. </p>
195
          </td>
196
        </tr>
197
        <tr>
198
          <td class="keyword">
199
            <p> select </p>
200
          </td>
201
          <td>
202
            <p> file name pattern ("glob")</p>
203
          </td>
204
          <td>
205
            <p> Determines which files are selected (see below). </p>
206
          </td>
207
        </tr>
208
        <tr>
209
          <td class="keyword">
210
            <p> match </p>
211
          </td>
212
          <td>
213
            <p> regular expression</p>
214
          </td>
215
          <td>
216
            <p> Determines which files are selected (see below). </p>
217
          </td>
218
        </tr>
219
        <!--<tr>
220
          <td class="keyword">
221
            <p> content-type </p>
222
          </td>
223
          <td>
224
            <p> media type (for example <code>application/xml</code> or <code>text/plain</code>)</p>
225
          </td>
226
          <td>
227
            <p> Determines how the resource is processed. For example if the media type is 
228
            <code>application/xml</code> then it will be parsed as XML and returned as a document node;
229
            if it is <code>text/plain</code> then it is returned as an atomic value of type
230
            <code>xs:string</code>; if it is <code>application/binary</code> then it is returned
231
            as an atomic value of type <code>xs:base64Binary</code>.</p>
232
            <p>If this parameter is absent, then the <code
233
              java="net.sf.saxon.lib.CollectionFinder">CollectionFinder</code> attempts to discern the
234
            content type first by looking at the file extension, and then, if necessary, by
235
            examining the initial bytes of the content itself.</p>
236
            <p>The set of content types that are recognized, and their mapping to implementations of the
237
            class <code java="net.sf.saxon.lib.ResourceFactory">ResourceFactory</code>, is defined in the 
238
            <code java="net.sf.saxon.Configuration">Configuration</code>, and can be changed using the
239
            method <code>Configuration.registerMediaType()</code>. The set of file extensions that are
240
              recognized, and their mapping to media types, is also held in the <code>Configuration</code>, and can be changed using the
241
              method <code>Configuration.registerFileExtension()</code>.</p>
242
          </td>
243
        </tr>-->
244
        <tr>
245
          <td class="keyword">
246
            <p> metadata </p>
247
          </td>
248
          <td>
249
            <p class="value"> yes | no</p>
250
          </td>
251
          <td>
252
            <p> If set to yes, the item returned by the <code>collection()</code> function will be a
253
              map containing properties of the selected resource as well as its content. The keys of
254
              the map will be strings. Two entries with names "name" and "fetch" will always be
255
              available.</p>
256
            <p>The value of the "fetch" entry is a function that can be called to retrieve the
257
              content (it returns the same item that would have been returned with the default
258
              setting of <code>metadata=no</code>: for example a node representing an XML document,
259
              or a map representing the content of a JSON file). This allows you to decide which
260
              items in the collection to fetch based on their properties, for example:</p>
261

    
262
            <p>
263
              <code>for $m in collection('/data/folder?metadata=yes') return if
                ($m?content-type='application/xml') then $m?fetch() else ()</code>
265
            </p>
266

    
267
            <p>Failures in parsing a resource can be trapped by using try/catch around the call on
268
              the <code>fetch</code> function.</p>
269
            <p>Other entries in the returned map represent properties of the file obtained from the
270
              operating system: for example <code>last-modified</code>, <code>can-execute</code>,
271
                <code>length</code>, or <code>is-hidden</code>.</p>
272
          </td>
273
        </tr>
274
        <tr>
275
          <td class="keyword">
276
            <p> on-error </p>
277
          </td>
278
          <td>
279
            <p class="value"> fail | warning | ignore </p>
280
          </td>
281
          <td>
282
            <p> Determines the action to be taken if one of the files cannot be successfully parsed.
283
            </p>
284
          </td>
285
        </tr>
286
        <tr>
287
          <td class="keyword">
288
            <p> parser </p>
289
          </td>
290
          <td>
291
            <p> Java class name </p>
292
          </td>
293
          <td>
294
            <p> Class name of the Java <code>XMLReader</code> to be used. For example, John Cowan's
295
                <code>TagSoup</code> parser may be selected by specifying
296
                <code>parser=org.ccil.cowan.tagsoup.Parser</code> (this parses arbitrary ill-formed
297
              HTML and presents it to Saxon as well-formed XML). </p>
298
          </td>
299
        </tr>
300
        <tr>
301
          <td class="keyword">
302
            <p> xinclude </p>
303
          </td>
304
          <td>
305
            <p class="value"> yes | no </p>
306
          </td>
307
          <td>
308
            <p> Determines whether XInclude processing should be applied to the selected documents.
309
              This overrides any setting in the <a class="javalink"
310
                href="net.sf.saxon.Configuration">Configuration</a> (or any command line option).
311
            </p>
312
          </td>
313
        </tr>
314
        <tr>
315
          <td class="keyword">
316
            <p> stable </p>
317
          </td>
318
          <td>
319
            <p class="value"> yes | no </p>
320
          </td>
321
          <td>
322
            <p> Determines whether the collection is to be stable. </p>
323
          </td>
324
        </tr>
325

    
326
      </tbody>
327
    </table>
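<p>The <code>keyword=value;keyword=value</code> syntax shown above can be split apart as in the following sketch. The class <code>CollectionParamsSketch</code> is hypothetical and handles only the semicolon separator that appears in the text; it does not validate keywords or decode escaped characters, and it is not how Saxon itself parses the URI.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the query part of a directory collection URI.
final class CollectionParamsSketch {

    /** Returns the keyword/value pairs after '?', in the order written. */
    static Map<String, String> parse(String collectionUri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = collectionUri.indexOf('?');
        if (q < 0) {
            return params; // no query part: all defaults apply
        }
        for (String pair : collectionUri.substring(q + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("file:///a/b/c/d?recurse=yes;select=*.xml;on-error=warning"));
    }
}
```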
328

    
329
    <p>The pattern used in the <code>select</code> parameter can use glob-like syntax, for example
330
        <code>*.xml</code> selects all files with extension "xml". More generally, the pattern is
331
      converted to a regular expression by prepending "<code>^</code>", appending "<code>$</code>",
332
      replacing "<code>.</code>" by "<code>\.</code>", "<code>*</code>" by
333
      "<code>.*</code>", and "<code>?</code>" by
334
      "<code>.?</code>", and it is then used to match the file names appearing in the directory
335
      using the Java regular expression rules. So, for example, you can write
336
        <code>?select=*.(xml|xhtml)</code> to match files with either of these two file extensions.
337
      Note however, that special characters used in the URL (that is, characters such as backslash 
338
      and curly braces that are not allowed in the query part of a URI) must be escaped using 
339
      the %HH convention. For example,
340
      vertical bar needs to be written as <code>%7C</code>. This escaping can be achieved using the
341
        <code>encode-for-uri()</code> function.</p>
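<p>The conversion rules just described (anchor with <code>^</code> and <code>$</code>, escape the dot, expand <code>*</code> and <code>?</code>) are simple enough to sketch directly. The class <code>GlobSketch</code> below is a hypothetical illustration of those rules using <code>java.util.regex</code>; it is not taken from Saxon.</p>

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the glob-to-regex rules described in the text.
final class GlobSketch {

    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder("^");
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '.': regex.append("\\."); break; // literal dot
                case '*': regex.append(".*");  break; // any run of characters
                case '?': regex.append(".?");  break; // at most one character
                default:  regex.append(c);            // everything else passes through
            }
        }
        return Pattern.compile(regex.append('$').toString());
    }

    public static void main(String[] args) {
        Pattern p = globToRegex("*.(xml|xhtml)");
        System.out.println(p.pattern() + " matches index.xhtml: "
                + p.matcher("index.xhtml").matches());
    }
}
```

<p>Because other characters pass through unchanged, regular expression syntax such as the parentheses and vertical bar in <code>*.(xml|xhtml)</code> keeps its regex meaning.</p>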
342
    
343
    <p>As an alternative to the <code>select</code> parameter, the <code>match</code> parameter
344
    can be used. This accepts a standard XPath 3.1 regular expression as its value. For example,
345
    <code>.+\.xml</code> selects all files with extension "xml". Again, characters that are not allowed
346
    in the query part of a URI, such as backslash, curly braces, and vertical bar, must be escaped
347
    using the %HH convention, which can be achieved using the <code>encode-for-uri()</code> function.</p>
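<p>When the collection URI is assembled in Java rather than in XPath, the same %HH escaping can be approximated with the JDK's <code>URLEncoder</code>, as in this sketch. The class <code>QueryEscapeSketch</code> and its method are hypothetical. Note one known difference: <code>URLEncoder</code> implements HTML form encoding, so it writes a space as <code>+</code> rather than <code>%20</code>; the values used here contain no spaces.</p>

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that appends one escaped query parameter to a directory URI.
final class QueryEscapeSketch {

    static String withParam(String directoryUri, String keyword, String value) {
        // Characters such as '|', '\', '(' and '+' are written as %HH escapes.
        return directoryUri + "?" + keyword + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(withParam("file:///data/folder", "select", "*.(xml|xhtml)"));
        System.out.println(withParam("file:///data/folder", "match", ".+\\.xml"));
    }
}
```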
348

    
349
    <p> A collection read in this way is not stable by default. (Stability can be expensive, and is
350
      rarely required, so the default setting is recommended.) Making a collection stable has the
351
      effect that the entire result of the <code>collection()</code> function is retained in a cache
352
      for the duration of the query or transformation, and any further calls on
353
        <code>collection()</code> with the same absolute URI return this saved collection retrieved
354
      from this cache. </p>
355

    
356
    <h2 class="subtitle">Processing ZIP and JAR files</h2>
357

    
358
    <p>If the collection URI identifies a ZIP or JAR file then it is processed in exactly the same
359
      way as a directory. URI query parameters can be used in the same way, and have much the same
360
      effect.</p>
361

    
362
    <p>A URI is recognized as a ZIP or JAR file URI if the scheme name is "jar", or if the file
363
      extension is "zip" or "jar".</p>
364

    
365
    <p>The value of the <code>recurse</code> option is ignored in this case, and
366
        <code>recurse=yes</code> is assumed.</p>
367

    
368
    <p>The option <code>metadata=yes</code> is available for ZIP-based collections as well as for
369
      directory-based collections. The set of properties returned in the resulting map is slightly
370
      different, for example it includes any <code>comment</code> field associated with the ZIP file
371
      entry. Note that no items are returned in respect of directory nodes within the ZIP file; only
372
      leaf nodes are represented.</p>
373
    
374
    <h2 class="subtitle">Registered Collections</h2>
375
    
376
    <p>On the .NET product there is another way to use a collection URI (provided that you use the
377
      API rather than the command line): you can register a collection using the
378
      <code>Processor.RegisterCollection</code> method on the <a class="javalink"
379
        href="Saxon.Api.Processor">Saxon.Api.Processor</a> class.</p>
380
    
381
    <section id="user-collections" title="Writing your own Collection Finder">
382
      <h1>Writing your own Collection Finder</h1>
383
      
384
      <p>Since Saxon 9.7, the <a class="javalink" href="net.sf.saxon.lib.CollectionFinder">CollectionFinder</a>
385
        interface replaces the <code>CollectionURIResolver</code> interface in previous
386
        releases. It has much more flexibility, in particular the ability to deliver non-XML
387
        resources. The old <code>CollectionURIResolver</code> interface has been dropped in Saxon 10.</p>
388
      
389
      <p>Details of the interface can be found in the Javadoc. The basic steps are:</p>
390
      
391
      <ol>
392
        <li>
393
          <p>Write a class that implements <code>CollectionFinder</code>. It has a single method,
394
            which accepts an absolute collection URI, and returns an object that implements
395
            <code>ResourceCollection</code>. Register an instance of your
396
            <code>CollectionFinder</code> with the Saxon <code>Configuration</code>.</p>
397
          <p>For example, a <code>CollectionFinder</code> written to handle collection URIs using the
398
            scheme name "sql" might be supplied as:</p>
399
          <samp><![CDATA[// Route "sql:" URIs to a user-defined collection; delegate all others to the standard finder
config.setCollectionFinder((context, uri) ->
   uri.startsWith("sql:")
      ? sqlCollection(uri)
      : config.getStandardCollectionFinder().findCollection(context, uri)
);]]></samp>
404
          <p>where <code>sqlCollection(uri)</code> returns some user-defined implementation
405
            of <code>ResourceCollection</code>, perhaps one that retrieves XML documents from
406
            a relational database.</p>
407
        </li>
408
        <li>
409
          <p>You can either reuse the existing implementations of <a class="javalink"
410
            href="net.sf.saxon.lib.ResourceCollection">ResourceCollection</a>, namely
411
            <code>CatalogCollection</code>, <code>DirectoryCollection</code>, and
412
            <code>JarCollection</code>, or you can write your own. You can also of course subclass
413
            the existing collection classes. The <code>ResourceCollection</code> object provides two
414
            key methods that you need to implement: <code>getResources()</code>, which returns a
415
            sequence of <code>Resource</code> objects, and <code>getResourceURIs()</code>, which
416
            returns a sequence of URIs. These are invoked by the <a class="bodylink code"
417
              href="/functions/fn/collection" >fn:collection()</a> and <a class="bodylink code"
418
                href="/functions/fn/uri-collection" >fn:uri-collection()</a> functions respectively.</p>
419
        </li>
420
        <li>
421
          <p>Again, you can either reuse existing implementations of <a class="javalink"
422
            href="net.sf.saxon.lib.Resource">Resource</a> (such as <code>XmlResource</code>,
423
            <code>JSONResource</code>, <code>UnparsedTextResource</code>,
424
            <code>BinaryResource</code>, and <code>MetadataResource</code>), or you can create your
425
            own, perhaps by subclassing. The key method that the <code>Resource</code> object must
426
            provide is <code>getItem()</code> which returns the resource in the form of an XDM item.
427
            It is good practice to delay any extensive work such as parsing until the
428
            <code>getItem()</code> method is called: this reduces the memory footprint, and enables
429
            parallel evaluation on multiple threads (Saxon-EE only).</p>
430
        </li>
431
      </ol>
432
    </section>
433

    
434
  </section>
435
  <section id="builder-api" title="Building a Source Document from lexical XML">
436
    <h1>Building a Source Document from lexical XML</h1>
437

    
438
    <p>The conversion of lexical XML to a tree in memory is called <i>parsing</i>, and is performed
439
    by a software component called an <i>XML Parser</i>. Saxon does not include its own XML parser,
440
    rather it provides interfaces that invoke XML parsers supplied by third parties. Platforms
441
    such as Java and .NET typically include a built-in XML parser that Saxon uses by default.</p>
442

    
443
    <p>With the Java s9api interface, a source document can be built using the <a class="javalink"
444
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> class, which is created using
445
      the factory method <code>newDocumentBuilder</code> on the <a class="javalink"
446
        href="net.sf.saxon.s9api.Processor">Processor</a> object. Various options for document
447
      building are available as methods on the <code>DocumentBuilder</code>, for example options to
448
      perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and also
449
      to choose the tree implementation model to be used.</p>
450
    
451
    <p>These methods create a document from a <code>Source</code> object. This is a JAXP interface designed
452
    as an abstraction of various kinds of XML source, including <code>StreamSource</code>, which represents lexical XML
453
    held in a file or input stream; <code>SAXSource</code>, which represents a source of SAX events; <code>DOMSource</code>,
454
    representing an already-parsed XML document held in a DOM tree; and <code>StAXSource</code>, which represents a
455
      class that responds to requests for StAX (pull-parser) events. In addition, Saxon's <code
        java="net.sf.saxon.om.NodeInfo">NodeInfo</code> and <code
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code> classes
      implement the JAXP <code>Source</code> interface, and the s9api <a class="javalink"
459
        href="net.sf.saxon.s9api.XdmNode">XdmNode</a> class has an <code>asSource()</code> method,
460
      so it is always possible to supply an existing Saxon tree as
461
    the source for any of these interfaces.</p>
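<p>The point of the <code>Source</code> abstraction is that a consumer does not need to know where the XML came from. The sketch below shows this with JDK classes only: the same document is supplied once as a <code>StreamSource</code> and once as a <code>DOMSource</code>, and a single method consumes either. The class <code>JaxpSourceSketch</code> is hypothetical, and the JDK's identity transformer stands in for the consumer; with Saxon, the same <code>Source</code> objects would instead be passed to <code>DocumentBuilder.build()</code>.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demonstration that different JAXP Source kinds are interchangeable.
final class JaxpSourceSketch {

    /** Lexical XML held in a string, exposed as a StreamSource. */
    static Source streamSource(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    /** The same XML, parsed into a DOM tree first and exposed as a DOMSource. */
    static Source domSource(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new DOMSource(dom);
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    /** Consumes any kind of Source by copying it to a string with an identity transform. */
    static String serialize(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Source could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(streamSource("<a><b/></a>")));
        System.out.println(serialize(domSource("<a><b/></a>")));
    }
}
```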
462

    
463
    <p>Similarly in the .NET API, there is a <a class="javalink" href="Saxon.Api.DocumentBuilder"
464
        >DocumentBuilder</a> object that can be created from the <a class="javalink"
465
        href="Saxon.Api.Processor">Processor</a>. This allows options to be set controlling the way
466
      documents are built, and provides an overloaded <code>Build</code> method allowing a tree to
467
      be built from various kinds of source.</p>
468

    
469
    <p>It is also possible to build a Saxon tree in memory by using the <code>buildDocumentTree()</code>
470
      method of the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.
471
      (When using the JAXP Transformation API, the <code>Configuration</code> can be obtained from
472
      the <code>TransformerFactory</code> as the value of the attribute named <a class="javalink"
473
        href="net.sf.saxon.lib.Feature#CONFIGURATION">Feature.CONFIGURATION.name</a>.)</p>
474

    
475
    <p>The <a class="javalink" href="net.sf.saxon.Configuration#buildDocumentTree">buildDocumentTree()</a>
476
      method takes a single argument, a JAXP <code>Source</code>. This can be any of the standard
477
      kinds of JAXP <code>Source</code>. See <a class="bodylink" href="../jaxpsources">JAXP
478
        Sources</a> for more information. The method returns a <code
479
          java="net.sf.saxon.om.TreeInfo">TreeInfo</code> containing information about the constructed tree,
480
      notably the method <code>getRootNode()</code> to get the root node of the tree,
481
      which in most cases will be a document node.
482
    </p>
483

    
484
    <p>All the documents processed in a single transformation or query must be loaded using the same
485
        <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>. However, it is
486
      possible to copy a document from one <code>Configuration</code> into another by supplying the
487
        <a class="javalink" href="net.sf.saxon.om.TreeInfo">TreeInfo</a> at the root of the
488
      existing document as the <code>Source</code> supplied to the <code>buildDocumentTree()</code>
489
      method of the new <code>Configuration</code>. </p>
490
  </section>
491
  <section id="building-programmatically" title="Building XML Trees Programmatically">
492
    <h1>Building XML Trees Programmatically</h1>
493
    <p>There are various ways in Saxon to build an XDM tree programmatically 
494
      (that is, incrementally one node at a time).</p>
495
    
496
    <h2 class="subtitle">The Sapling Tree API</h2>
497
    <p>A new API offered from Saxon 10 is the Sapling Tree API. This provides a collection of methods to create
498
    nodes; for example, to create a document containing a <code>body</code> element with two paragraphs, the expression</p>
499
    <samp><![CDATA[doc(
  elem("body")
    .child(elem("p").text("Hello"),
           elem("p").text("World"))
)]]></samp>
504
    <p>might be used. These methods are found in package <code>net.sf.saxon.sapling</code>, specifically in the
505
      class <code java="net.sf.saxon.sapling.Saplings">net.sf.saxon.sapling.Saplings</code>.</p>
506
    <p>The "Sapling" nodes created by these methods are transient nodes used only during tree construction; when the Sapling
507
    tree has been completely built, it can be converted to a regular XDM tree offering full query access using the methods
508
      <code java="net.sf.saxon.sapling.SaplingDocument#toXdmNode">SaplingDocument.toXdmNode()</code>
509
      or <code  java="net.sf.saxon.sapling.SaplingDocument#toNodeInfo">SaplingDocument.toNodeInfo()</code>. It is also possible to send the tree
510
      directly to a <code java="net.sf.saxon.s9api.Destination">Destination</code> such as a 
511
      <code java="net.sf.saxon.s9api.Serializer">Serializer</code>, a 
512
      <code java="net.sf.saxon.s9api.SchemaValidator">SchemaValidator</code>, or an 
513
      <code java="net.sf.saxon.s9api.Xslt30Transformer">Xslt30Transformer</code>.</p>
514
    
515
    <p>Sapling nodes are immutable objects, so operations like adding children or adding attributes always create a new object,
516
    without modifying the input objects. This means that adding a child element to a new parent can be done without an expensive
517
    copy operation. Nodes do not have references to their parents in the tree, so a subtree can be shared by multiple trees
518
    without copying.</p>
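<p>The immutability and sharing properties described above can be illustrated with a deliberately tiny tree class. This is a conceptual sketch only: <code>ImmutableNodeSketch</code> is hypothetical and is not related to Saxon's <code>SaplingNode</code> classes. It shows that adding a child returns a new parent, leaves the original untouched, and lets one child object appear in two trees because children hold no parent reference.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable tree node; not Saxon's Sapling implementation.
final class ImmutableNodeSketch {

    final String name;
    final List<ImmutableNodeSketch> children; // unmodifiable; no parent pointer is kept

    ImmutableNodeSketch(String name) {
        this(name, Collections.emptyList());
    }

    private ImmutableNodeSketch(String name, List<ImmutableNodeSketch> children) {
        this.name = name;
        this.children = children;
    }

    /** Returns a new node with the extra child; this node is not modified. */
    ImmutableNodeSketch withChild(ImmutableNodeSketch child) {
        List<ImmutableNodeSketch> extended = new ArrayList<>(children);
        extended.add(child); // the child itself is shared, not copied
        return new ImmutableNodeSketch(name, Collections.unmodifiableList(extended));
    }

    public static void main(String[] args) {
        ImmutableNodeSketch para = new ImmutableNodeSketch("p");
        ImmutableNodeSketch body = new ImmutableNodeSketch("body").withChild(para);
        ImmutableNodeSketch aside = new ImmutableNodeSketch("aside").withChild(para);
        System.out.println(body.children.get(0) == aside.children.get(0));
    }
}
```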
519
    
520
    <p>The Sapling Tree API is described in the Javadoc for class <code java="net.sf.saxon.sapling.SaplingNode">SaplingNode</code>.</p>
521
    
522
    <h2 class="subtitle">Event APIs</h2>
523
    <p>Saxon 10 introduces a new event-based API (called simply "Push") designed explicitly for convenient use by 
524
      user-written applications.</p>
525
    
526
    <p>A <code>Push</code> instance is always created using the factory method <code>Processor.newPush(destination)</code>;
527
      the <code>destination</code> argument indicates what happens to the constructed document. 
528
      This will commonly be an <code>XdmDestination</code> to build an in-memory <code>XdmNode</code>,
529
      or a <code>Serializer</code> to create lexical XML,
530
      but it could also be, for example, an <code>XsltTransformer</code> or a <code>SchemaValidator</code>.</p>
531
    
532
    
533
    <p>Conventional event-based APIs such as the SAX <code>ContentHandler</code> and StAX <code>XMLStreamWriter</code>
    and <code>XMLEventWriter</code> rely on the application to issue a properly-nested
    sequence of calls to methods such as <code>startElement()</code> and <code>endElement()</code>. This can make
      it very difficult to diagnose errors if the calls are not properly matched. The Saxon <code
        java="net.sf.saxon.s9api.Push">Push</code> API differs in that
    a call to start a new element node returns an <code>Element</code> object representing that element, and methods to create attributes
      and children for the element, and to end the element, are defined as methods on that <code>Element</code> object.
      Furthermore, these methods return the element to which they are applied, allowing method chaining.
    So a typical sequence of calls might be:</p>
542
    
543
    <samp><![CDATA[   out.element("employee")
      .attribute("ssn", "123456")
      .attribute("location", "Berlin")
      .text("Helmut Schmidt")
      .close();
]]></samp>
549
    
550
    <p>This example constructs a slightly more complex tree:</p>
551
    
552
    <samp><![CDATA[   // Employee and getData() stand for application-defined code
   Processor processor = new Processor(false);
   Serializer destination = processor.newSerializer(new File("out.xml"));
   destination.setOutputProperty(Serializer.Property.INDENT, "no");
   Push.Document doc = processor.newPush(destination).document(true);
   doc.setDefaultNamespace("http://www.example.org/ns");
   Push.Element top = doc.element("root");
   top.attribute("version", "1.5");
   for (Employee emp : getData()) {
      // each new "emp" child implicitly ends the previous one
      top.element("emp")
         .attribute("ssn", emp.ssn)
         .text(emp.name);
   }
   // closing the document also closes any elements still open
   doc.close();
]]></samp>
566
    
567
    <p>Note that there are no explicit <code>endElement</code> events here; an end tag is written automatically when
568
    the next sibling is written to the parent element, or when the parent element is closed. The <code>close()</code>
569
    method is available, however, to close an element explicitly, which can be useful to avoid errors when the writing
570
    of elements is distributed across many classes and methods.</p>
571
    
572
    <p>Saxon also allows trees to be communicated using other event-based APIs. In Java there are three such APIs worth considering:</p>
573
    <ul>
574
      <li>Saxon's <code>Receiver</code> API</li>
575
      <li>The SAX <code>ContentHandler</code> API</li>
576
      <li>The StAX <code>XMLStreamWriter</code> API</li>
577
    </ul>
578
    <p>The <code java="net.sf.saxon.event.Receiver">Receiver</code> is efficient, but it is proprietary to Saxon, is prone to minor changes from one release to another,
579
    and is designed primarily for internal use rather than for direct use from applications.</p>
580
    <p>The SAX <code>ContentHandler</code> API was designed primarily for communication from an XML parser to an application; it can be
581
    clumsy to use when the originator of events is something other than an XML parser.</p>
582
    <p>The StAX <code>XMLStreamWriter</code> is probably the best of the three interfaces for most
      applications. Saxon's <code java="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</code> class
      offers a method <code java="net.sf.saxon.s9api.DocumentBuilder#newBuildingStreamWriter">newBuildingStreamWriter()</code> which returns an <code>XMLStreamWriter</code>; the calling application can
    then use methods such as <code>XMLStreamWriter.writeStartElement()</code> and <code>XMLStreamWriter.writeEndElement()</code>
    to build the tree.</p>
587
    <p>The trickiest part of this interface is probably the handling of namespaces. Saxon's implementation of the StAX interfaces takes
    into account not only the official Javadoc specifications (which in some respects are woefully inadequate), but also the unofficial
    interpretation of the specifications found at <a
      href="http://veithen.github.io/2009/11/01/understanding-stax.html" class="bodylink">Understanding StAX:
    How to Correctly Use XMLStreamWriter</a>.</p>
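    <p>The namespace handling discussed above can be illustrated with a minimal sketch. So that it runs
      without Saxon, it writes to a string using the JDK's own <code>XMLOutputFactory</code>; the assumption
      (not verified here) is that the same sequence of calls could be issued against the writer returned by
      <code>newBuildingStreamWriter()</code>. The class name <code>StaxNamespaces</code>, the element names, and
      the namespace URIs are illustrative only. The sketch always supplies the prefix explicitly and always
      writes the namespace declaration itself, which is the usage that does not depend on namespace repairing.</p>

    <samp><![CDATA[
```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxNamespaces {
    // Illustrative namespace URIs, not taken from any real vocabulary
    static final String NS = "http://www.example.org/ns";
    static final String EXTRA_NS = "http://www.example.org/extra";

    /** Writes a small namespaced document to any XMLStreamWriter. */
    static void writeEmployees(XMLStreamWriter w) throws XMLStreamException {
        w.writeStartDocument();
        // Three-argument form: prefix "", local name, namespace URI
        w.writeStartElement("", "root", NS);
        // The declaration is written explicitly rather than left to the writer
        w.writeDefaultNamespace(NS);
        w.writeAttribute("version", "1.5");

        // The default namespace is inherited, so no new declaration is needed
        w.writeStartElement("", "emp", NS);
        w.writeAttribute("ssn", "123456");
        w.writeCharacters("Helmut Schmidt");
        w.writeEndElement(); // emp

        // A prefixed element declares its own prefix
        w.writeStartElement("x", "note", EXTRA_NS);
        w.writeNamespace("x", EXTRA_NS);
        w.writeCharacters("on leave");
        w.writeEndElement(); // x:note

        w.writeEndElement(); // root
        w.writeEndDocument();
    }

    /** Runs the sketch against the JDK's default (non-repairing) writer. */
    static String toXml() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writeEmployees(w);
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml());
    }
}
```
]]></samp>

    <p>Because <code>writeEmployees()</code> depends only on the <code>XMLStreamWriter</code> interface, the
      same method can be pointed at a different writer without change.</p>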
592
  </section>
593
  <section id="preloading" title="Preloading shared reference documents">
594
    <h1>Preloading shared reference documents</h1>
595
    <p>An option is available (<a class="bodylink code" href="/configuration/config-features"
        >Feature.PRE_EVALUATE_DOC_FUNCTION</a>) to indicate that calls to the <code>doc()</code>
      or <code>document()</code> functions with constant string arguments should be evaluated when a
      query or stylesheet is compiled, rather than at run-time. This option is intended for use when
      a reference or lookup document is used by all queries and transformations. Using this option
      has a number of effects:</p>
601
    <ol>
602
      <li>
603
        <p>The URI is resolved using the compile-time <code>URIResolver</code> rather than the
604
          run-time <code>URIResolver</code>.</p>
605
      </li>
606
      <li>
607
        <p>The document is loaded into a document pool held by the <a class="javalink"
            href="net.sf.saxon.Configuration">Configuration</a>, whose memory is released only when
          the <code>Configuration</code> itself ceases to exist.</p>
610
      </li>
611
      <li>
612
        <p>All queries and transformations using this document share the same copy.</p>
613
      </li>
614
      <li>
615
        <p>Any updates to the document that occur between compile-time and run-time have no
616
          effect.</p>
617
      </li>
618
    </ol>
619
    <p>The option is selected by using <code>Configuration.setConfigurationProperty()</code> or
620
        <code>TransformerFactory.setAttribute()</code> with the property name
621
        <code>Feature.PRE_EVALUATE_DOC_FUNCTION.name</code>. This option is not available from the
622
      command line because it has no useful effect with a single-shot compile-and-run interface.</p>
623
    <p>This option has no effect if the URI supplied to the <code>doc()</code> or
624
        <code>document()</code> function includes a fragment identifier.</p>
625
    <p>It is also possible to preload a specific document into the shared document pool from the
      Java application by using the call <code>config.getGlobalDocumentPool().add(doc, uri)</code>.
      When the <code>doc()</code> or <code>document()</code> function is called, the shared document
      pool is first checked to see if the requested document is already present. The <a
        class="javalink" href="net.sf.saxon.om.DocumentPool">DocumentPool</a> object also has a
        <code>discard()</code> method which causes the document to be released from the pool.</p>
631
    
632
    <aside>It is not advisable to use this option when a compiled stylesheet is exported to a SEF
633
    file. Data files are best deployed separately, rather than by embedding them in the SEF.</aside>
634
  </section>
635
  <section id="xml-catalogs" title="Using XML Catalogs">
636
    <h1>Using XML Catalogs</h1>
637

    
638

    
639
    <p>XML Catalogs (<a
        href="http://xml.apache.org/commons/components/resolver/resolver-article.html"
        class="bodylink">defined by OASIS</a>) provide a way to avoid hard-coding the locations of
      XML documents and other resources in your application. Instead, the application refers to the
      resource using a conventional system identifier (URI) or public identifier, and a local
      catalog is used to map the system and public identifiers to an actual location.</p>
645

    
646
    <p>When using Saxon from the command line, it is possible to specify a catalog to be used using
647
      the option <code>-catalog:<i>files</i></code>. Here <code><i>files</i></code> is the catalog
648
      file to be searched, or a list of filenames separated by semicolons. This catalog will be used
649
      to locate DTDs and external entities required by the XML parser, XSLT stylesheet modules
650
      requested using <code>xsl:import</code> and <code>xsl:include</code>, documents requested
651
      using the <code>document()</code> and <code>doc()</code> functions, and also schema documents,
652
      however they are referenced.</p>
653

    
654
    <p>
655
      <i>The catalog is NOT currently used for non-XML resources, including JSON documents, 
656
        query modules, unparsed text files, collations, and collections.</i>
657
    </p>
658

    
659
    <p>With Saxon on the Java platform, if the <code>-catalog</code> option is used on the command
660
      line, then the open-source Apache library <code>resolver.jar</code> must be present on the
661
      classpath. With Saxon on .NET, this module (cross-compiled to IL) is included within the Saxon
662
      DLL.</p>
663

    
664
    <p>Setting the <code>-catalog</code> option is equivalent to setting the following options:</p>
665

    
666
    <table>
      <tr>
        <td>
          <p>
            <code>-r</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.CatalogResolver</code>
          </p>
        </td>
      </tr>
      <tr>
        <td>
          <p>
            <code>-x</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
          </p>
        </td>
      </tr>
      <tr>
        <td>
          <p>
            <code>-y</code>
          </p>
        </td>
        <td>
          <p>
            <code>org.apache.xml.resolver.tools.ResolvingXMLReader</code>
          </p>
        </td>
      </tr>
    </table>
704

    
705
    <p>In addition, the system property <code>xml.catalog.files</code> is set to the value of the
706
      supplied <code><i>files</i></code> value. And if the <code>-t</code> option is also set, Saxon
707
      sets the verbosity level of the catalog manager to 2, causing it to report messages for each
708
      resolved URI. Saxon customizes the Apache resolver library to integrate these messages with
709
      the other output from the <code>-t</code> option: that is, by default it is sent to the
710
      standard error output.</p>
711

    
712
    <p>
713
      <i>This mechanism means that it is not possible to use any of the options <code>-r</code>,
714
          <code>-x</code>, or <code>-y</code> when the <code>-catalog</code> option is used.</i>
715
    </p>
716

    
717
    <p>When the <code>-catalog</code> option is used on the command line, this overrides the
718
      internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the
719
      XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on
720
      setting the XML parser's <code>EntityResolver</code>, it is not possible to use them in
721
      conjunction.</p>
722

    
723
    <p>This support for OASIS catalogs is implemented only in the Saxon command line. To use
724
      catalogs from a Saxon application, it is necessary to configure the various options
725
      individually. For example:</p>
726

    
727
    <ul>
728
      <li>
729
        <p>To use catalogs to resolve references to DTDs and external entities, choose
730
            <code>ResolvingXMLReader</code> as your XML parser, or set
731
            <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
732
            <code>EntityResolver</code> used by your chosen XML parser.</p>
733
      </li>
734

    
735
      <li>
736
        <p>To use catalogs to resolve <code>xsl:include</code> and <code>xsl:import</code>
737
          references, choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
738
            <code>URIResolver</code> used by Saxon when compiling the stylesheet.</p>
739
      </li>
740

    
741
      <li>
742
        <p>To use catalogs to resolve references in calls to <code>doc()</code> or <code>document()</code>,
          choose <code>org.apache.xml.resolver.tools.CatalogResolver</code> as the
            <code>URIResolver</code> used by Saxon when running the stylesheet (for example, using
            <code>Transformer.setURIResolver()</code>).</p>
746
      </li>
747
    </ul>
748

    
749
    <p>Here is an example of a very simple catalog file. The <code>publicId</code> and
750
        <code>systemId</code> attributes give the public or system identifier as used in the source
751
      document; the <code>uri</code> attribute gives the location (in this case a relative location)
752
      where the actual resource will be found.</p>
753

    
754

    
755

    
756
    <samp><![CDATA[<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <group prefer="public" xml:base="file:///usr/share/xml/">

      <public
         publicId="-//OASIS//DTD DocBook XML V4.5//EN"
         uri="docbook45/docbookx.dtd"/>

      <system
         systemId="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
         uri="docbook45/docbookx.dtd"/>

   </group>
</catalog>]]></samp>
770

    
771
    <p>There are many tutorials for XML catalogs available on the web, including some that have
772
      information specific to Saxon, though this may well relate to earlier releases.</p>
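    <p>To make the mapping concrete, the following sketch resolves the identifiers from the catalog above
      programmatically. Note that it does <i>not</i> use the Apache resolver library described in this section:
      it substitutes the <code>javax.xml.catalog</code> API that ships with the JDK (Java 9 and later), so that it
      runs with no extra libraries. The class name <code>CatalogLookup</code> and the temporary catalog file are
      illustrative. The resolver returned by the JDK implements both <code>EntityResolver</code> and
      <code>URIResolver</code>, so in principle it could be supplied wherever this section calls for one, though
      that combination with Saxon is an assumption and has not been tested here.</p>

    <samp><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class CatalogLookup {
    // The same catalog as in the example above, held as a string for the demo
    static final String CATALOG =
          "<?xml version=\"1.0\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <group prefer=\"public\" xml:base=\"file:///usr/share/xml/\">\n"
        + "    <public publicId=\"-//OASIS//DTD DocBook XML V4.5//EN\"\n"
        + "            uri=\"docbook45/docbookx.dtd\"/>\n"
        + "    <system systemId=\"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd\"\n"
        + "            uri=\"docbook45/docbookx.dtd\"/>\n"
        + "  </group>\n"
        + "</catalog>\n";

    /** Returns the location that the catalog maps the given identifiers to. */
    static String resolve(String publicId, String systemId) throws Exception {
        Path file = Files.createTempFile("catalog", ".xml");
        try {
            Files.write(file, CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), file.toUri());
            // Only the mapping is looked up; the target file is not opened here
            InputSource mapped = resolver.resolveEntity(publicId, systemId);
            return mapped.getSystemId();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve(
            "-//OASIS//DTD DocBook XML V4.5//EN",
            "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"));
    }
}
```
]]></samp>

    <p>The returned location is the <code>uri</code> attribute resolved against the <code>xml:base</code> of the
      enclosing <code>group</code>, which is how a relative location in the catalog becomes an absolute one.</p>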
773
  </section>
774
  <section id="input-filters" title="Writing input filters">
775
    <h1>Writing input filters</h1>
776

    
777

    
778
    <p>Saxon can take its input from a JAXP <code>SAXSource</code> object, which essentially
779
      represents a sequence of SAX events representing the output of an XML parser. A very useful
780
      technique is to interpose a <i>filter</i> between the parser and Saxon. The filter will
781
      typically be an instance of the SAX2 <strong>XMLFilter</strong> class. </p>
782

    
783
    <p>There are a number of ways of using a Saxon XSLT transformation as part of a pipeline of
784
      filters. Some of these techniques also work with XQuery. The techniques include:</p>
785
    <ul>
786
      <li>
787
        <p>Generate the transformation as an <code>XMLFilter</code> using the
            <code>newXMLFilter()</code> method of the JAXP <code>SAXTransformerFactory</code>. This works
          with XSLT only. A drawback of this approach is that it is not possible to supply
          parameters to the transformation using standard JAXP facilities. It is possible, however,
          by casting the <code>XMLFilter</code> to a <a class="javalink" href="net.sf.saxon.jaxp.FilterImpl"
            >net.sf.saxon.jaxp.FilterImpl</a>, and calling its <code>getTransformer()</code> method, which
          returns a <code>Transformer</code> object offering the usual <code>setParameter()</code>
          method.</p>
795
      </li>
796
      <li>
797
        <p>Generate the transformation as a SAX <code>ContentHandler</code> using the
798
            <code>newTransformerHandler()</code> method. The pipeline stages after the
799
          transformation can be added by giving the transformation a <code>SAXResult</code> as its
800
          destination. This again is XSLT only.</p>
801
      </li>
802
      <li>
803
        <p>Implement the pipeline step before the transformation or query as an
804
            <code>XMLFilter</code>, and use this as the <code>XMLReader</code> part of a
805
            <code>SAXSource</code>, pretending to be an XML parser. This technique works with both
806
          XSLT and XQuery, and it can even be used from the command line, by nominating the
807
            <code>XMLFilter</code> as the source parser using the <code>-x</code> option on the
808
          command line.</p>
809
      </li>
810
    </ul>
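    <p>The third technique can be sketched as follows. The filter class <code>DropSecretFilter</code> and the
      element name <code>secret</code> are hypothetical, chosen only for illustration. The sketch uses the
      identity transformer obtained from whichever JAXP <code>TransformerFactory</code> is configured, so it runs
      on a plain JDK; with Saxon as the configured factory the <code>SAXSource</code> would presumably be
      supplied to a Saxon transformation or query in the same way.</p>

    <samp><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterPipeline {

    /** Hypothetical filter: drops every "secret" element together with its content. */
    static class DropSecretFilter extends XMLFilterImpl {
        private int depth = 0; // greater than zero while inside a dropped subtree

        DropSecretFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || "secret".equals(local)) {
                depth++;            // swallow the event
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--;            // swallow the matching end event
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parser, then filter, then transformer: the filter poses as the XML parser. */
    static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        XMLReader filter = new DropSecretFilter(parser);

        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><public>ok</public><secret>hidden</secret></doc>"));
    }
}
```
]]></samp>

    <p>Extending <code>XMLFilterImpl</code> keeps the sketch short, since every event that is not overridden
      is passed through unchanged.</p>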
811

    
812
    <p>The <code>-x</code> option on the Saxon command line specifies the parser that Saxon will use
      to process the source files. This class must implement the SAX2 <code>XMLReader</code>
      interface, but it is not required to be a real XML parser; it can take the input from any kind
      of source file, so long as it presents it in the form of a stream of SAX events. When using
      the JAXP API, the equivalent to the <code>-x</code> option is to call
        <code>transformerFactory.setAttribute(net.sf.saxon.lib.Feature.SOURCE_PARSER_CLASS.name,
        "com.example.package.Parser")</code>.</p>
819
  </section>
820
  <section id="XInclude" title="XInclude processing">
821
    <h1>XInclude processing</h1>
822

    
823

    
824
    <p>If you are using Xerces as your XML parser, you can have Xerces expand any XInclude
825
      directives.</p>
826

    
827
    <p>The <code>-xi</code> option on the command line causes XInclude processing to be applied to
      all input XML documents. This includes source documents, stylesheets, and schema documents
      listed on the command line, and also those loaded indirectly, for example by calls on the
        <code>doc()</code> function or by mechanisms such as <code>xsl:include</code> and
        <code>xs:include</code>.</p>
832

    
833
    <p>From the Java API, the equivalent is to call <code>setXInclude()</code> on the
        <code>Configuration</code> object, or to set the attribute denoted by <a
        class="bodylink code" href="/configuration/config-features">Feature.XINCLUDE.name</a> to
        <code>Boolean.TRUE</code> on the <code>TransformerFactory</code>.</p>
837

    
838
    <p>XInclude processing can be requested at a per-document level by creating an <a
        class="javalink" href="net.sf.saxon.lib.AugmentedSource">AugmentedSource</a> and calling its
        <code>setXIncludeAware()</code> method. The corresponding method is also recognized on
      Saxon's implementation of the JAXP <code>DocumentBuilderFactory</code>. When the
        <code>doc()</code> or <code>document()</code> or <code>collection()</code> function is
      called from an XPath expression, XInclude processing can be enabled by including
        <code>xinclude=yes</code> among the query parameters in the URI.</p>
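    <p>As a rough illustration of the JAXP-level switch, the sketch below enables XInclude on a
      <code>DocumentBuilderFactory</code> and parses a document that includes a second file. It uses the
      factory bundled with the JDK so that it runs without Saxon; the class name <code>XIncludeDemo</code>
      and the two temporary files are illustrative. The text above states that Saxon's own
      <code>DocumentBuilderFactory</code> recognizes the same method, but that is not exercised here.</p>

    <samp><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeDemo {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    /** Writes two small files and parses the first with XInclude expansion enabled. */
    static Document parseWithXInclude() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-demo");
        Path part = dir.resolve("part.xml");
        Path main = dir.resolve("main.xml");
        Files.write(part, "<chapter>Included text</chapter>".getBytes(StandardCharsets.UTF_8));
        Files.write(main, ("<book xmlns:xi=\"" + XI_NS + "\">"
                + "<xi:include href=\"part.xml\"/></book>").getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // XInclude is defined in terms of namespaces
        dbf.setXIncludeAware(true);   // ask the parser to expand xi:include elements
        // Parsing from a File gives the document a base URI, so the relative href resolves
        return dbf.newDocumentBuilder().parse(main.toFile());
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithXInclude();
        System.out.println("chapter elements: "
                + doc.getElementsByTagName("chapter").getLength());
        System.out.println("remaining xi:include elements: "
                + doc.getElementsByTagNameNS(XI_NS, "include").getLength());
    }
}
```
]]></samp>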
845
    
846
    <p>It is possible to request XInclude processing for the documents in a collection by including
847
    the query parameter <code>xinclude=yes</code> in the collection URI. Similarly, for a document
848
    read using the <code>doc()</code> or <code>document()</code> functions, XInclude processing can
849
      be requested using <code>xinclude=yes</code> in the document URI -- but only if the
850
    <code>StandardURIResolver</code> is used, and the feature is enabled by calling
851
      <code>Configuration.setParameterizedURIResolver()</code> or by setting <code>-p:on</code>
852
    on the <code>Query</code> or <code>Transform</code> command lines.</p>
853
    
854
    <p>The <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
855
      instruction can enable XInclude processing using
856
    the extension attribute <code>saxon:xinclude="yes"</code>.</p>
857

    
858
    <p>It is also possible to switch on XInclude processing (for all documents) by setting the
859
      system property:</p>
860
    <samp><![CDATA[-Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration
]]></samp>
863

    
864
    <p>An alternative approach is to incorporate an XInclude processor as a SAX filter in the input
      pipeline. You can find a suitable SAX filter at <a href="http://xincluder.sourceforge.net/"
        class="bodylink">http://xincluder.sourceforge.net/</a>, and you can incorporate it into your
      application as described in <a class="bodylink" href="../input-filters">Writing Input
        Filters</a>.</p>
869

    
870
    <p>On the .NET platform, there is a customized <code>XmlReader</code> that performs XInclude
      processing available at <a href="http://mvpxml.codeplex.com" class="bodylink"
        >http://mvpxml.codeplex.com</a>. You can supply this as an argument to the method
        <code>Build(XmlReader parser)</code> in the <a class="javalink"
        href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> class of the .NET Saxon API.</p>
875

    
876
    <p>For further information on using XInclude, see <a
        href="http://www.sagehill.net/docbookxsl/Xinclude.html" class="bodylink"
        >http://www.sagehill.net/docbookxsl/Xinclude.html</a>.</p>
879
  </section>
880
  <section id="controlling-parsing" title="Controlling Parsing of Source Documents">
881
    <h1>Controlling Parsing of Source Documents</h1>
882

    
883

    
884
    <p>Saxon does not include its own XML parser. By default:</p>
885

    
886
    <ul>
887
      <li>
888
        <p>On the Java platform, the default SAX parser provided as part of the JDK is used. With
889
          the Sun/Oracle JDK, this is a variant of the Apache Xerces parser customized by Sun.</p>
890
      </li>
891
      <li>
892
        <p>On the .NET platform, Saxon includes a copy of the Apache Xerces parser cross-compiled to
893
          run on .NET.</p>
894
      </li>
895
    </ul>
896

    
897
    <p>An error reported by the XML parser is generally fatal. It is not possible to process
898
      ill-formed XML.</p>
899

    
900
    <p>There are several ways you can cause a different XML parser to be used:</p>
901

    
902
    <ul>
903
      <li>
904
        <p>The <code>-x</code> and <code>-y</code> options on the command line can be used to
          specify the class name of a SAX parser, which Saxon will load in preference to the default
          SAX parser. The <code>-x</code> option is used for source XML documents, the
            <code>-y</code> option for schemas and stylesheets. The equivalent options can be set
          programmatically or by using the <a class="bodylink"
            href="/configuration/configuration-file">configuration file</a>.</p>
910
      </li>
911
      <li>
912
        <p>By default Saxon uses the <code>SAXParserFactory</code> mechanism to load a parser. This
913
          can be configured by setting the system property
914
            <code>javax.xml.parsers.SAXParserFactory</code>, by means of the file
915
            <code>lib/jaxp.properties</code> in the JRE directory, or by adding another parser to
916
          the <code>lib/endorsed</code> directory.</p>
917
      </li>
918
      <li>
919
        <p>The source for parsing can be supplied in the form of a <code>SAXSource</code> object,
920
          which has an <code>XMLReader</code> property containing the parser instance to be
921
          used.</p>
922
      </li>
923
      <li>
924
        <p>On .NET, the configuration option <code>PREFER_JAXP_PARSER</code> can be set to false, in
925
          which case Saxon will use the Microsoft XML parser instead of the Apache parser. (This
926
          parser is not used by default because it does not notify <code>ID</code> attributes to the
927
          application, which means the XPath <code>id()</code> and <code>idref()</code> functions do
928
          not work.)</p>
929
      </li>
930
    </ul>
931

    
932
    <p>Saxonica traditionally recommended use of the Xerces parser from Apache in preference to the version bundled
933
      in the JDK, which was known to have some serious bugs. However, there is some evidence that the version bundled
934
    in Java 8 is more reliable.</p>
935

    
936
    <p>By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD
      validation). Note, however, that the parser still needs to read the DTD if one is present,
      because it may contain entity definitions that need to be expanded. DTD validation can be
      requested using <code>-dtd:on</code> on the command line, or equivalent API or configuration
      options.</p>
941

    
942
    <p>Saxon is issued with local copies of commonly-used W3C DTDs such as the XHTML, SVG, and
943
      MathML DTDs. When Saxon itself instantiates the XML parser, it will use an
944
        <code>EntityResolver</code> that causes these local copies of DTDs to be used rather than
945
      fetching public copies from the web (the W3C servers are increasingly failing to serve these
946
      requests as the volume of traffic is too high). It is possible to override this using the
947
      configuration setting <code>ENTITY_RESOLVER_CLASS</code>, which can be set to the name of a
948
      user-supplied <code>EntityResolver</code>, or to the empty string to indicate that no
949
        <code>EntityResolver</code> should be used. Saxon will not add this
950
        <code>EntityResolver</code> in cases where the XML parser instance is supplied by the caller
951
      as part of a <code>SAXSource</code> object. It will add it to a parser obtained as an instance
952
      of the class specified using the <code>-x</code> and <code>-y</code> command line options,
953
      unless either the use of the <code>EntityResolver</code> is suppressed using the
954
        <code>ENTITY_RESOLVER_CLASS</code> configuration option, or the instantiated parser already
955
      has an <code>EntityResolver</code> registered.</p>
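    <p>The general mechanism of redirecting a DTD reference to a local copy can be sketched with plain SAX.
      This is not Saxon's built-in resolver: the class name <code>LocalDtdResolver</code>, the DTD URI, and
      the in-memory DTD are all hypothetical, and the JDK's default parser is used. The sketch also shows why
      a non-validating parser still reads the DTD, since the entity reference in the document can only be
      expanded once the DTD has been found.</p>

    <samp><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolver {
    // Hypothetical system identifier and local replacement, for illustration only
    static final String DTD_URI = "http://example.org/dtd/refs.dtd";
    static final String LOCAL_DTD = "<!ENTITY company \"Example Corp\">";

    /** Parses the document, serving the DTD locally, and returns its text content. */
    static String parseWithLocalDtd(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Returning null tells the parser to fall back to its default behaviour
        reader.setEntityResolver((publicId, systemId) ->
            DTD_URI.equals(systemId) ? new InputSource(new StringReader(LOCAL_DTD)) : null);

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"" + DTD_URI + "\"><doc>&company;</doc>";
        System.out.println(parseWithLocalDtd(xml));
    }
}
```
]]></samp>

    <p>No network access takes place for the DTD in this sketch, because the resolver answers the request
      before the parser attempts to fetch the URI.</p>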
956

    
957
    <p>Saxon never asks the XML parser to perform schema validation. If schema validation is
958
      required it should be requested using the command line options <code>-val:strict</code> or
959
        <code>-val:lax</code>, or their API equivalents. Saxon will then use its own schema
960
      processor to validate the document as it emerges from the XML parser. Schema processing is
961
      done in parallel with parsing, by use of a SAX-like pipeline.</p>
  </section>
968
  <section id="xml11" title="Saxon and XML 1.1">
969
    <h1>Saxon and XML 1.1</h1>
970

    
971

    
972
    <p>XML 1.1 (with XML Namespaces 1.1) originally extended XML 1.0 in three ways:</p>
973
    <ul>
974
      <li>
975
        <p>the set of valid characters is increased</p>
976
      </li>
977
      <li>
978
        <p>the set of characters allowed in XML Names is increased</p>
979
      </li>
980
      <li>
981
        <p>namespace undeclarations are permitted</p>
982
      </li>
983
    </ul>
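    <p>The first and third of these differences can be observed directly with an XML parser. The sketch below
      is independent of Saxon: it uses the JDK's default SAX parser, on the assumption that this parser
      supports XML 1.1, and the class name <code>Xml11Check</code> and the sample documents are illustrative.
      It checks whether a document is accepted when it contains a character reference to a control character,
      or a namespace undeclaration, under each XML version.</p>

    <samp><![CDATA[
```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class Xml11Check {
    // A character reference to U+0001, and an undeclaration of the prefix "p"
    static final String CONTROL_CHAR = "<a>&#x1;</a>";
    static final String UNDECLARATION = "<a xmlns:p=\"urn:example\"><b xmlns:p=\"\"/></a>";

    /** Prepends an XML declaration with the given version to the document body. */
    static String doc(String version, String body) {
        return "<?xml version=\"" + version + "\"?>" + body;
    }

    /** Returns true if a namespace-aware parser accepts the document without error. */
    static boolean isAccepted(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String version : new String[] {"1.0", "1.1"}) {
            System.out.println("XML " + version
                + ": control character accepted = " + isAccepted(doc(version, CONTROL_CHAR))
                + ", undeclaration accepted = " + isAccepted(doc(version, UNDECLARATION)));
        }
    }
}
```
]]></samp>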
984

    
985
    <p>The second change has subsequently been retrofitted to XML 1.0 Fifth Edition (XML 1.0e5).
986
      Saxon now uses the XML 1.1 and XML 1.0e5 rules unconditionally for all validation of XML
987
      names.</p>
988

    
989
    <p>Saxon is capable of working with XML 1.1 input documents. If you want to use Saxon with XML
      1.1, you should set the option <code>-xmlversion:1.1</code> on the Saxon command line, or call
      the method <a class="javalink" href="net.sf.saxon.Configuration#setXMLVersion"
        >configuration.setXMLVersion(Configuration.XML11)</a> or, in the case of XSLT,
        <code>transformerFactory.setAttribute(FeatureKeys.XML_VERSION, "1.1")</code>.</p>
994

    
995
    <p>This configuration setting affects:</p>
996
    <ul>
997
      <li>
998
        <p>the characters considered valid in the source of an XQuery query</p>
999
      </li>
1000
      <li>
1001
        <p>the characters considered valid in the result of the functions
1002
            <code>codepoints-to-string()</code> and <code>unparsed-text()</code></p>
1003
      </li>
1004
      <li>
1005
        <p>the characters considered valid in the result of certain Saxon extension functions</p>
1006
      </li>
1007
      <li>
1008
        <p>the way in which line endings in XQuery queries are normalized</p>
1009
      </li>
1010
      <li>
1011
        <p>the default version used by the serializer (with output method XML)</p>
1012
      </li>
1013
    </ul>
1014

    
1015
    <p>Since Saxon 9.4, the configuration setting no longer affects:</p>
1016
    <ul>
1017
      <li>
1018
        <p>validation of names used in XQuery and XPath expressions, including names of elements,
1019
          attributes, functions, variables, and types</p>
1020
      </li>
1021
      <li>
1022
        <p>validation of names of constructed elements, attributes, and processing instructions in
1023
          XQuery and XSLT</p>
1024
      </li>
1025
      <li>
1026
        <p>schema validation of values of type <code>xs:NCName</code>, <code>xs:QName</code>,
1027
            <code>xs:NOTATION</code>, and <code>xs:ID</code></p>
1028
      </li>
1029
      <li>
1030
        <p>the permitted names of stylesheet objects such as keys, templates, decimal-formats,
1031
          output declarations, and output methods</p>
1032
      </li>
1033
    </ul>
1034

    
1035

    
1036
    <p>Note that if you use the default setting of "1.0", then supplying an XML 1.1 source document
1037
      as input may cause undefined errors.</p>
1038

    
1039
    <p>It is advisable to use an XML parser that supports XML 1.1 when the configuration is set to
1040
      "1.1", and an XML parser that does not support XML 1.1 when the configuration is set to "1.0".
1041
      However, Saxon does not enforce this.</p>
1042

    
1043
    <p>You can set the configuration to allow XML 1.1, but still serialize result documents as XML
1044
      1.0 by specifying the output property <code>version="1.0"</code>. In this case Saxon will
1045
      check while serializing the document that it conforms to the XML 1.0 constraints (note that
1046
      this check can be expensive). These checks are not performed if the configuration default is
1047
      set to XML 1.0.</p>
1048

    
1049
    <p>If you want the serializer to output namespace undeclarations, use the output property
1050
        <code>undeclare-namespaces="yes"</code> as well as <code>version="1.1"</code>.</p>
1051
  </section>
1052
  <section id="jaxpsources" title="JAXP Source Types">
1053
    <h1>JAXP Source Types</h1>
1054

    
1055

    
1056
    <p>
1057
      <i>This section is relevant to the Java platform only.</i>
1058
    </p>
1059

    
1060
    <p>When a user application invokes Saxon via the Java API, then a source document is supplied as
1061
      an instance of the JAXP <code>Source</code> class. This is true whether invoking an XSLT
1062
      transformation, an XQuery query, or a free-standing XPath expression. The <code>Source</code>
1063
      class is essentially a marker interface. The <code>Source</code> that is supplied must be a
1064
      kind of <code>Source</code> that Saxon recognizes.</p>
1065

    
1066
    <p>Saxon recognizes all three kinds of <code>Source</code> defined in JAXP: a
1067
        <code>StreamSource</code>, a <code>SAXSource</code>, and a <code>DOMSource</code>. </p>
1068
    
1069
    <ul>
1070
      <li>
1071
        <p>When using a <code>StreamSource</code>, note:</p>
1072
        <ul>
1073
          <li>A <code>StreamSource</code> that wraps an <code>InputStream</code> or <code>Reader</code>
1074
            can only be used once: it is consumed by use. However, a <code>StreamSource</code> that wraps
1075
          a <code>File</code> or URI can be used multiple times.</li>
1076
          <li>Whoever creates an <code>InputStream</code> or <code>Reader</code> is responsible for closing
1077
          it after use. This means that if Saxon creates an <code>InputStream</code> from a supplied <code>File</code>
1078
            or URI, it will close that <code>InputStream</code> after use; but if the <code>InputStream</code> is created
1079
          by the calling application, then the calling application is responsible for closing it. (On some operating systems
1080
          it is important not to leave unclosed streams lying around.)</li>
1081
          <li>If the <code>StreamSource</code> wraps an <code>InputStream</code> or <code>Reader</code>, then the base URI
1082
          of the document is taken from the <code>SystemID</code> property of the <code>StreamSource</code>. If this is not set,
1083
          then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
1084
        </ul>
1085
        <aside>There are cases where it is difficult for the application to take responsibility for closing a stream after it has been read to completion.
1086
        For example, if a <code>URIResolver</code> returns a <code>StreamSource</code>, there is no callback from Saxon
1087
        to the application at the time the stream has been exhausted. Saxon therefore allows the <code>StreamSource</code>
1088
        to be wrapped in an <code>AugmentedSource</code>, whose <code>setPleaseCloseAfterUse()</code> method can be used
1089
        to request that Saxon closes the stream.</aside>
1090
      
1091
      </li>
1092
      <li>
1093
        <p>When using a <code>SAXSource</code>, note:</p>
1094
        <ul>
1095
          <li>If no <code>XMLReader</code> is supplied, Saxon will allocate one, based on settings in the <code>Configuration</code>.</li>
1096
          <li>Processing of the contained <code>InputSource</code> is entirely the responsibility of the XML parser; Saxon is not involved
1097
          in this.</li>
1098
          <li>Saxon will modify properties of the supplied <code>XMLReader</code>: it will set the <code>ContentHandler</code>
1099
          and <code>LexicalHandler</code> so that it can receive the output of parsing, and it will set the <code>ErrorHandler</code>
1100
          so it can handle parsing errors.</li>
1101
          <li>Saxon makes no attempt to ensure that processing of a <code>SAXSource</code> or its underlying <code>XMLReader</code>
1102
          is thread-safe. The same <code>XMLReader</code> should not be used concurrently in multiple threads.</li>
1103
        </ul>
1104
        
1105
      </li>
1106
      <li>
1107
        <p>When using a <code>DOMSource</code>, note:</p>
1108
        <ul>
1109
          <li>The DOM is not thread-safe, even when used in read-only mode. Saxon therefore synchronizes all its access to DOM methods.
1110
          However, that's no protection if there are application threads accessing the DOM that aren't using Saxon.</li>
1111
          <li>The base URI
1112
            of the document is taken from the <code>SystemID</code> property of the <code>DOMSource</code>. If this is not set,
1113
            then the base URI is unknown, which may cause constructs that require a known base URI to fail.</li>
1114
          <li>From Saxon 9.8, Saxon-EE uses a new mechanism for processing DOM trees, called the Domino model. This involves creating
1115
          an index of all the nodes in the DOM, providing for faster navigation. Saxon-PE and Saxon-HE continue to use the DOM <code>NodeWrapper</code>
1116
          model, where DOM methods are used to navigate the tree. A transformation using the Domino model typically takes twice as long as one using Saxon's native <code>TinyTree</code>,
1117
          while the <code>NodeWrapper</code> model can take 5 to 10 times as long. An alternative approach is to convert the DOM tree to a <code>TinyTree</code> before the
1118
          transformation starts. Even better: don't use DOM in the first place.</li>
1119
        </ul>
1120
      </li>
1121
    </ul>
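    <p>As a minimal sketch of the <code>StreamSource</code> points above, the following example supplies a document through an <code>InputStream</code> that the application itself opens and closes, and sets the system ID so that the base URI is known. It uses only standard JAXP classes, so it runs with whatever <code>TransformerFactory</code> is on the classpath. The class name, the method name, and the <code>file:///tmp/example.xml</code> system ID are illustrative assumptions, not part of any Saxon API.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamSourceExample {

    /** Copies the XML text unchanged, reading it through an application-owned InputStream. */
    static String copy(String xml, String systemId) {
        StringWriter out = new StringWriter();
        // The application opens the stream, so the application closes it (try-with-resources).
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            StreamSource source = new StreamSource(in);
            // Without a system ID, the base URI of the document would be unknown.
            source.setSystemId(systemId);
            // newTransformer() with no stylesheet gives an identity transformation.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.transform(source, new StreamResult(out));
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("Identity transformation failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The system ID is a made-up placeholder; nothing is read from that location.
        System.out.println(copy("<doc><item>1</item></doc>", "file:///tmp/example.xml"));
    }
}
```

    <p>Because the stream is consumed by the transformation, a second call needs a fresh <code>InputStream</code> and a fresh <code>StreamSource</code>; a <code>StreamSource</code> built from a <code>File</code> or URI would not have this restriction.</p>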
1122
        
1123
        <p>Other kinds of <code>Source</code> that are recognized by most Saxon interfaces are:</p>
1124
        
1125
        <ul>
1126
          <li><code>TreeInfo</code>: Saxon's <code>TreeInfo</code> holds information about a document (or more generally any tree of nodes), 
1127
            and can be used directly as a <code>Source</code> of a transformation.</li>
1128
          <li><code>NodeInfo</code>: Saxon's <code>NodeInfo</code> represents a node in a tree, 
1129
            and can be used directly as a <code>Source</code> of a transformation.</li>
1130
          <li><code>StaxSource</code>: allows a pull parser to be used.</li>
1131
          <li><code>PullSource</code>: Saxon's internal pull interface.</li>
1132
          <li><code>EventSource</code>: similar to an <code>XMLReader</code>, but with a much simpler interface, an <code>EventSource</code>
1133
          has a <code>send()</code> method that sends a stream of events to a Saxon <code>Receiver</code>.</li>
1134
          <li><code>SaplingDocument</code>: a sapling tree constructed using the sapling construction interface can be used anywhere
1135
          (within Saxon) that a <code>Source</code> is expected.</li>
1136
        </ul>
1137
      
1138
    
1139

    
1140
    <p>Saxon also accepts input from an <code>XMLStreamReader</code>
1141
        (<code>javax.xml.stream.XMLStreamReader</code>), that is a StAX pull parser as defined in
1142
      JSR 173. This is achieved by creating an instance of <a class="javalink"
1143
        href="net.sf.saxon.pull.StaxBridge">net.sf.saxon.pull.StaxBridge</a>, supplying the
1144
        <code>XMLStreamReader</code> using the <code>setXMLStreamReader()</code> method, and
1145
      wrapping the <code>StaxBridge</code> object in an instance of <a class="javalink"
1146
        href="net.sf.saxon.pull.PullSource">net.sf.saxon.pull.PullSource</a>, which implements the
1147
      JAXP <code>Source</code> interface and can be used in any Saxon method that expects a
1148
        <code>Source</code>. Saxon has been validated with two StAX parsers: the Zephyr parser from
1149
      Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from
1150
      Tatu Saloranta. In Saxonica's experience, Woodstox is the more reliable of the two. However, there is
1151
      no immediate benefit in using a pull parser to supply Saxon input rather than a push parser;
1152
      the main use case for using an <code>XMLStreamReader</code> is when the data is supplied from
1153
      some source other than parsing of lexical XML.</p>
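    <p>The route described above goes through Saxon's <code>StaxBridge</code> and <code>PullSource</code>. As a simpler, hedged illustration of driving a transformation from an <code>XMLStreamReader</code>, the sketch below swaps in the standard JAXP <code>javax.xml.transform.stax.StAXSource</code> instead, so that it depends only on JDK classes. It does not use the Saxon classes named above, and the class and method names are illustrative assumptions.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

final class StaxInputExample {

    /** Copies a document whose events are delivered by a StAX pull parser. */
    static String copyFromPullParser(String xml) {
        try {
            // A freshly created reader is positioned at START_DOCUMENT, as StAXSource requires.
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StAXSource(reader), new StreamResult(out));
            // The application created the reader, so the application closes it.
            reader.close();
            return out.toString();
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException("Transformation from StAX input failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyFromPullParser("<doc><item>1</item></doc>"));
    }
}
```

    <p>In practice the <code>XMLStreamReader</code> would more usefully come from an application component that generates events without parsing lexical XML, which is the use case singled out above.</p>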
1154

    
1155
    <p>Nodes in Saxon's implementation of the XPath data model are represented by the interface <a
1156
        class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a>. A <code>NodeInfo</code> is
1157
      itself a <code>Source</code>, which means that any method in the API that requires a source
1158
      object will accept any implementation of <code>NodeInfo</code>. As discussed in the next
1159
      section, implementations of <code>NodeInfo</code> are available to wrap Axiom, DOM, DOM4J,
1160
      JDOM2, or XOM nodes, and in all cases these wrapper objects can be used wherever a
1161
        <code>Source</code> is required.</p>
1162

    
1163
    <p>Saxon also provides a class <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
1164
        >net.sf.saxon.lib.AugmentedSource</a> which implements the <code>Source</code> interface.
1165
      This class encapsulates one of the standard <code>Source</code> objects, and allows additional
1166
      processing options to be specified. These options include whitespace handling, schema and DTD
1167
      validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon
1168
      tree model.</p>
1169

    
1170
    <p>Saxon allows additional <code>Source</code> types to be supported by registering a <a
1171
        class="javalink" href="net.sf.saxon.lib.SourceResolver">SourceResolver</a> with the <a
1172
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object. The task of a
1173
        <code>SourceResolver</code> is to convert a <code>Source</code> that Saxon does not
1174
      recognize into a <code>Source</code> that it does recognize. For example, this may be done by
1175
      building the document tree in memory and returning the <a class="javalink"
1176
        href="net.sf.saxon.om.NodeInfo">NodeInfo</a> object representing the root of the tree.</p>
1177
  </section>
1178
  <section id="thirdparty"
1179
    title="Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J">
1180
    <h1>Third-party Object Models: Axiom, DOM, JDOM2, XOM, and DOM4J</h1>
1181

    
1182

    
1183
    <p>
1184
      <i>This section is relevant to the Java platform only.</i>
1185
    </p>
1186

    
1187
    <p>In the case of DOM, all Saxon editions support DOM access "out of the box", and no special
1188
      configuration action is necessary. See also <a class="bodylink" href="/sourcedocs/domino">The Domino Tree Model</a>.</p>
1189

    
1190
    <p>Support for Axiom, JDOM2, XOM, and DOM4J is not available "out of the box" with
1191
      Saxon-HE, but the source code is open source (in sub-packages of
1192
        <code>net.sf.saxon.option</code>) and can be compiled for use with Saxon-HE if required.</p>
1193

    
1194
    <aside>In general, use of a third-party tree implementation is much less efficient than using
1195
      Saxon's native <code>TinyTree</code>. These models should only be used if your application
1196
      needs to construct them for other reasons. Transforming a DOM can take up to 10 times longer
1197
      than transforming the equivalent <code>TinyTree</code>.</aside>
1198

    
1199

    
1200
    <p>The support code for Axiom, DOM4J, JDOM2, and XOM is integrated into the main JAR files
1201
      for Saxon-PE and Saxon-EE, but (unlike the case of DOM) it is not activated unless the object
1202
      model is registered with the <a class="javalink" href="net.sf.saxon.Configuration"
1203
        >Configuration</a>. To activate support for one of these models, the implementation must either be included 
1204
      in the relevant section of the
1205
      configuration file, or it must be nominated to the configuration using the method <a class="javalink"
1206
        href="net.sf.saxon.Configuration#registerExternalObjectModel"
1207
        >registerExternalObjectModel()</a>. </p>
1208
    
1209
    <aside>Support for JDOM version 1 is dropped with effect from Saxon 10.0. Applications should migrate
1210
    to JDOM2.</aside>
1211

    
1212
    <p>Each supported object model is represented in Saxon by a <a class="javalink"
1213
        href="net.sf.saxon.om.TreeModel">TreeModel</a> object, which in the case of external object
1214
      models will also be an instance of <a class="javalink"
1215
        href="net.sf.saxon.lib.ExternalObjectModel">ExternalObjectModel</a>. The
1216
        <code>TreeModel</code> can be used to get a <code>Builder</code>, which can then be used to
1217
      construct an instance of the model from SAX input. The <code>Builder</code> can also be
1218
      inserted into a pipeline to capture the output of a transformation or query.</p>
1219

    
1220
    <p>For DOM input, the source can be supplied by wrapping a <code>DOMSource</code> around the DOM
1221
      Document node. For Axiom, JDOM2, XOM, and DOM4J the approach is similar, except that the
1222
      wrapper classes are supplied by Saxon itself: they are <a class="javalink"
1223
        href="net.sf.saxon.option.axiom.AxiomDocument"
1224
        >net.sf.saxon.option.axiom.AxiomDocument</a>,  <a class="javalink"
1225
        href="net.sf.saxon.option.jdom2.JDOM2DocumentWrapper"
1226
        >net.sf.saxon.option.jdom2.JDOM2DocumentWrapper</a>, <a class="javalink"
1227
        href="net.sf.saxon.option.xom.XOMDocumentWrapper"
1228
        >net.sf.saxon.option.xom.XOMDocumentWrapper</a>, and <a class="javalink"
1229
        href="net.sf.saxon.option.dom4j.DOM4JDocumentWrapper"
1230
        >net.sf.saxon.option.dom4j.DOM4JDocumentWrapper</a> respectively. These wrapper classes
1231
      implement the Saxon <a class="javalink" href="net.sf.saxon.om.NodeInfo">NodeInfo</a> interface
1232
      (which means that they also implement <code>Source</code>).</p>
1233

    
1234

    
1235
    <aside>Note that the Xerces DOM implementation is not thread-safe, even for read-only access.
1236
      Saxon's wrapper classes for the DOM therefore synchronize all access to the DOM. This provides
1237
      thread-safety, but only if the application takes care to avoid creating more than one wrapper
1238
      for the same DOM Document.</aside>
1239

    
1240
    <p>Saxon supports these models by wrapping each external node in a wrapper that implements the
1241
      Saxon <code>NodeInfo</code> interface. When nodes are returned by the XQuery or XPath API,
1242
      these wrappers are removed and the original node is returned. Similarly, the wrappers are
1243
      generally removed when extension functions expecting a node are called.</p>
1244

    
1245
    <p>Saxon does not support wrapping of an external tree that contains entity reference nodes.
1246
      Most parsers provide an option to avoid constructing a tree that contains such nodes. For
1247
      example, with the JDK Xerces DOM parser, use <code>DOMParser dp = new DOMParser();
1248
        dp.setFeature("http://apache.org/xml/features/dom/create-entity-ref-nodes",
1249
        false);</code>. If there is a need to process a tree that does contain entity
1250
      references, it should be copied to a Saxon tree. (Note, this only affects entities explicitly
1251
      declared in a DTD. It does not affect character references or built-in entity references such
1252
      as <code>&amp;lt;</code>, which never appear as entity reference nodes in the tree.)</p>
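    <p>The same effect can be sketched with the parser-neutral JAXP API rather than the Xerces <code>DOMParser</code> class shown above. The example below asks the <code>DocumentBuilderFactory</code> to expand entity references, then walks the resulting tree to confirm that no entity reference nodes remain. The class and method names are illustrative assumptions, and the check is a plain DOM traversal, not anything Saxon itself performs.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EntityFreeDom {

    /** Parses XML text into a DOM in which entity references have been expanded. */
    static Document parseExpandingEntities(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Expand references to DTD-declared entities instead of creating EntityReference nodes.
            factory.setExpandEntityReferences(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not build the DOM", e);
        }
    }

    /** Counts EntityReference nodes in the subtree rooted at the given node. */
    static int countEntityReferences(Node node) {
        int count = node.getNodeType() == Node.ENTITY_REFERENCE_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countEntityReferences(child);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<!ENTITY greeting \"hello\">]><doc>&greeting;</doc>";
        Document doc = parseExpandingEntities(xml);
        System.out.println(countEntityReferences(doc.getDocumentElement()));
    }
}
```

    <p>A tree for which the count is zero can be wrapped in a <code>DOMSource</code> without running into the restriction described above.</p>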
1253

    
1254
    <p>In the case of DOM only, Saxon also supports wrapping the other way around: an object
1255
      implementing the DOM interface may be wrapped around a Saxon <code>NodeInfo</code>. This is
1256
      done when Java methods expecting a DOM <code>Node</code> are called as extension functions, if
1257
      the <code>NodeInfo</code> is not itself a wrapper for a DOM <code>Node</code>.</p>
1258

    
1259
    <p>You can also send output to a DOM by using a <code>DOMResult</code>, or to a JDOM2 tree by
1260
      using a <code>JDOM2Result</code>, or to a XOM document by using a <code>XOMWriter</code>. In
1261
      such cases it is a good idea to set <code>saxon:require-well-formed="yes"</code> on
1262
        <code>xsl:output</code> to ensure that the transformation or query result is a well-formed
1263
      document (for example, that it does not contain several elements at the top level).</p>
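    <p>As a brief sketch of the <code>DOMResult</code> case, the example below sends the output of an identity transformation to a DOM and returns the resulting <code>Document</code>. It uses only standard JAXP classes, and assumes that the <code>DOMResult</code> is created without a target node, so that the processor allocates a new document. The class and method names are illustrative assumptions.</p>

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class DomResultExample {

    /** Runs an identity transformation and captures the result as a DOM Document. */
    static Document transformToDom(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // No target node is supplied, so the processor creates a new Document.
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation to DOM failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = transformToDom("<doc><item>1</item></doc>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

    <p>The input here is a well-formed document, so the result is too; a stylesheet that could emit several top-level elements is where the <code>saxon:require-well-formed</code> setting mentioned above becomes useful.</p>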
1264

    
1265
    <p>External object models do not in all cases fully support the XDM (the XQuery and XPath Data Model). In
1266
      particular, many of them have restrictions concerning the recognition of <code>ID</code> and
1267
        <code>IDREF</code> attributes. In most cases they do not allow "namespace undeclarations" (so
1268
      a prefix that is in-scope for a parent element will always be in-scope for its child elements).
1269
      None of the external object models support typed
1270
      (schema-validated) data, and none support in-situ update using XQuery updates.</p>
1271
  </section>
1272
  <section id="choosingmodel" title="Choosing a Tree Model">
1273
    <h1>Choosing a Tree Model</h1>
1274

    
1275

    
1276
    <p>Saxon provides several implementations of the internal tree data structure (or tree model).
1277
      The tree model can be chosen by an option on the command line (<code>-tree:tiny</code> for the
1278
      tiny tree, <code>-tree:linked</code> for the linked tree). There is also a variant of the tiny
1279
      tree called a "condensed tiny tree" which saves space (at the expense of build time) by
1280
      recognizing text nodes and attribute nodes whose values appear more than once in the input
1281
      document. The tree model can also be selected from the Java API. The default is to use the
1282
      tiny tree model. The choice should make no difference to the results of a transformation
1283
      (except the order of attributes and namespace declarations); it affects only performance.</p>
1284

    
1285
    <p>
1286
      <i>The "linked tree" is the only model to support in-situ updates, so if you are using XQuery
1287
        Update you must choose this model.</i>
1288
    </p>
1289

    
1290
    <p>Generally speaking, the tiny tree model is both faster to build and faster to navigate than the linked tree. It
1291
      also uses less space.</p>
1292

    
1293
    <p>The tiny tree model gives most benefit when you are processing a large document. It uses a
1294
      lot less memory, so it can prevent thrashing when the size of the document is such that the linked
1295
      tree doesn't fit in real memory. Use the "condensed" variant if you need to save memory, and
1296
      if your source data contains many text or attribute nodes with repeated values.</p>
1297
    
1298
    <p>Saxon also offers the option <code>-tree:condensed</code>. This delivers a TinyTree with
1299
    additional compression. Specifically, when a document contains multiple text nodes or
1300
    attribute nodes with the same string value, the condensed tree will "common up" the storage
1301
    for these nodes. This option gives a further reduction in memory usage, at the cost of slower
1302
    tree construction.</p>
1303

    
1304
    <p>The linked tree is used internally to represent stylesheet and schema modules because of the
1305
      programming convenience it offers: it allows element nodes on the tree to be represented by
1306
      custom classes for each kind of element. The linked tree is also needed when you want to use
1307
      XQuery Update, because unlike the tiny tree, it is mutable.</p>
1308

    
1309
    <p>
1310
      <i>If in doubt, stick with the default.</i>
1311
    </p>
1312
  </section>
1313
  <section id="domino" title="The Domino Tree Model">
1314
    <h1>The Domino Tree Model</h1>
1315
    <p>The Domino tree model was introduced in Saxon 9.8 and is available in Saxon-EE only. It is a new approach
1316
    to the handling of DOM source trees.</p>
1317
    <p>The Domino data structure is essentially a combination of the DOM and parts of the TinyTree. It takes the
1318
    unchanged DOM tree, and indexes it with vectors containing information (for each DOM node) about the node kind,
1319
    node name, and level in the document. These vectors are exactly the same as those used in the TinyTree; the difference
1320
    is that there is no text content, or attributes; these are replaced by references to the DOM nodes. 
1321
    All navigation around the tree is done purely using the index vectors,
1322
    while retrieval of the string value of text and attribute nodes is done by reference to the DOM structure. The effect
1323
    is that navigation is almost as fast as using the TinyTree, but queries are still able to return the original DOM Nodes.</p>
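    <p>The idea of indexing an unchanged DOM with parallel vectors can be illustrated with a toy sketch. The class below is <i>not</i> Saxon's Domino implementation: it is a hypothetical, simplified index that records, in document order, each node's kind and nesting level alongside a reference to the DOM node, and then finds a following sibling using the level vector alone. All names are assumptions made for illustration.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomIndexSketch {
    final Node[] nodes;   // references back to the unchanged DOM nodes
    final short[] kinds;  // DOM node type of each entry
    final int[] depths;   // nesting level, 0 for the indexed root
    private int next = 0;

    DomIndexSketch(Node root) {
        int size = count(root);
        nodes = new Node[size];
        kinds = new short[size];
        depths = new int[size];
        fill(root, 0);
    }

    /** Convenience for experiments: parse XML text and index its document element. */
    static DomIndexSketch fromXml(String xml) {
        try {
            return new DomIndexSketch(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    private static int count(Node node) {
        int total = 1;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            total += count(c);
        }
        return total;
    }

    // Records nodes in document order; attributes are not indexed, only reached via the DOM.
    private void fill(Node node, int depth) {
        nodes[next] = node;
        kinds[next] = node.getNodeType();
        depths[next] = depth;
        next++;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            fill(c, depth + 1);
        }
    }

    /** Index of the following sibling of entry i, or -1, using only the depth vector. */
    int nextSibling(int i) {
        for (int j = i + 1; j < nodes.length; j++) {
            if (depths[j] == depths[i]) return j;
            if (depths[j] < depths[i]) return -1;
        }
        return -1;
    }

    /** String values are not stored in the index; they are fetched from the DOM node. */
    String stringValue(int i) {
        return nodes[i].getTextContent();
    }
}
```

    <p>Because the vectors are built once from a snapshot of the tree, any structural change to the DOM afterwards would leave them out of step with it, which is the hazard that applies to the real Domino model as well.</p>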
1324
    <p>Overall, queries and transformations using the Domino model take about twice as long as the same operations using the
1325
    TinyTree, compared with 5 to 10 times longer using the DOM Wrapper model. There is an initial overhead in building
1326
    the indexes, but this is incurred once only.</p>
1327
    <p>The Domino model must not be used with a DOM tree that is subject to update, other than changes to the values of
1328
    attribute or text nodes, which might work (but are still best avoided). Saxon has no way of preventing or detecting
1329
    updates, so these will generally cause catastrophic failure.</p>
1330
    
1331
  </section>
1332
  <section id="ptree" title="The PTree File Format">
1333
    <h1>The PTree File Format</h1>
1334

    
1335
    <p>The PTree (persistent tree) was a binary XML serialization supported by earlier Saxon
1336
    releases. It has been dropped from the product with effect from Saxon 10.0. Third-party
1337
    offerings such as EXI do the same job better.</p>
1338
 
1339
  </section>
1340
  <section id="validation" title="Validation of Source Documents">
1341
    <h1>Validation of Source Documents</h1>
1342

    
1343

    
1344
    <p>With Saxon-EE, source documents may be validated against a schema. Not only does this perform
1345
      a check that the document is valid, it also adds type information to each element and
1346
      attribute node in the document to identify the schema type against which it was validated. It
1347
      may also expand the source document by adding default values of elements and attributes.</p>
1348

    
1349
    <p>If the option <code>-val:strict</code> is specified on the command line for
1350
        <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code>, then the principal
1351
      source document to the query or transformation is schema-validated, as is every document
1352
      loaded using the <code>doc()</code> or <code>document()</code> function. Saxon will look among
1353
      all the loaded schemas for an element declaration that matches the outermost element of the
1354
      document, and will then check that the document is valid against that element declaration,
1355
      reporting a fatal error if it is not. The loaded schemas include schemas imported statically
1356
      into the query or stylesheet using <code>import schema</code> or
1357
        <code>xsl:import-schema</code>, schemas referenced in the <code>xsi:schemaLocation</code> or
1358
        <code>xsi:noNamespaceSchemaLocation</code> attributes of the source document itself, and
1359
      schemas loaded by the application using the <code>addSchema</code> method of the <a
1360
        class="javalink" href="net.sf.saxon.Configuration">Configuration</a> object.</p>
1361

    
1362
    <p>As an alternative to <code>-val:strict</code>, the option <code>-val:lax</code> may be
1363
      specified. This validates the document if and only if an element declaration can be found. If
1364
      there is no declaration of the outermost element in any loaded schema, then it is left as an
1365
      untyped document.</p>
1366

    
1367
    <p>When invoking transformations or queries from the Java API, the equivalent of the
1368
        <code>-val:strict</code> option is to call the method
1369
        <code>setSchemaValidation(Validation.STRICT)</code> on the <code>Configuration</code>
1370
      object. The equivalent of <code>-val:lax</code> is
1371
        <code>setSchemaValidation(Validation.LAX)</code>.</p>
1372

    
1373
    <p>When documents are built using the <a class="javalink"
1374
        href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a> in the s9api interface, or the
1375
        <a class="javalink" href="Saxon.Api.DocumentBuilder">DocumentBuilder</a> in the Saxon.Api
1376
      interface on .NET, validation may be controlled by setting the appropriate options on the
1377
        <code>DocumentBuilder</code>.</p>
1378

    
1379
    <p>On Java interfaces that expect a JAXP <code>Source</code> object it is possible to request
1380
      validation by supplying an <a class="javalink" href="net.sf.saxon.lib.AugmentedSource"
1381
        >AugmentedSource</a>. This consists of a <code>Source</code> and a set of options, including
1382
      validation options; since <code>AugmentedSource</code> implements the JAXP <code>Source</code>
1383
      interface it is possible to use it anywhere that a <code>Source</code> is expected, including
1384
      as the object returned by a user-written <code>URIResolver</code>.</p>
1385

    
1386
    <p>Saxon's standard <code>URIResolver</code> uses this technique if it has been enabled (for
1387
      example by using <code>-p</code> on the command line). With this option, any URI containing
1388
      the query parameter <code>?val=strict</code> (for example,
1389
        <code>doc('source.xml?val=strict')</code>) causes strict validation to be requested for that
1390
      document, while <code>?val=lax</code> requests lax validation, and <code>?val=strip</code>
1391
      requests no validation.</p>
1392
    
1393
    <p>XSLT 3.0 provides a standard way of requesting validation for individual source documents,
1394
      using the <code>validation</code> and <code>type</code> attributes of the <a class="bodylink
1395
        code" href="/xsl-elements/source-document">xsl:source-document</a> instruction.</p>
1396
    
1397
  </section>
1398
  <section id="whitespace" title="Whitespace Stripping in Source Documents">
1399
    <h1>Whitespace Stripping in Source Documents</h1>
1400

    
1401

    
1402
    <p>A number of factors combine to determine whether whitespace-only text nodes in the source
1403
      document are visible to the user-written XSLT or XQuery code.</p>
1404

    
1405
    <p>By default, if there is a DTD or schema, then <i>ignorable whitespace</i> is stripped from
1406
      any source document loaded from a <code>StreamSource</code> or <code>SAXSource</code>.
1407
      Ignorable whitespace is defined as the whitespace that separates the child elements
1408
      in elements declared to have element-only content. This whitespace is removed regardless of
1409
      any <code>xml:space</code> attributes in the source document.</p>
1410

    
1411
    <p>It is possible to change this default behavior in several ways.</p>
1412
    <ul>
1413
      <li>
1414
        <p>From the <code>com.saxonica.Query</code> or <code>com.saxonica.Transform</code> command
1415
          line, options are available: <code>-strip:all</code> strips all whitespace text nodes,
1416
            <code>-strip:none</code> strips no whitespace text nodes, and
1417
            <code>-strip:ignorable</code> strips ignorable whitespace text nodes only (this is the
1418
          default).</p>
1419
      </li>
1420
      <li>
1421
        <p>If the <code>-p</code> option is used on the command line, then query parameters are
1422
          recognized in the URI passed to the <code>document()</code> or <code>doc()</code>
1423
          function. The parameter <code>strip-space=yes</code> strips all whitespace text nodes,
1424
            <code>strip-space=no</code> strips no whitespace text nodes, and
1425
            <code>strip-space=ignorable</code> strips ignorable whitespace text nodes only. This
1426
          overrides anything specified on the command line.</p>
1427
      </li>
1428
      <li>
1429
        <p>Options corresponding to the above can also be set on the <code>TransformerFactory</code>
1430
          object or on the <a class="javalink" href="net.sf.saxon.Configuration">Configuration</a>.
1431
          These settings are global.</p>
1432
      </li>
1433
    </ul>
1434

    
1435
    <p>Whitespace stripping that is specified in any of the above ways is not limited to the case where the
1436
      source document is parsed under Saxon's control: that is, if it is supplied as a JAXP
1437
        <code>StreamSource</code> or <code>SAXSource</code>. It also applies where the input is
1438
      supplied in the form of a tree (for example, a DOM). In this case Saxon wraps the supplied
1439
      tree in a virtual tree that provides a view of the original tree with whitespace text nodes
1440
      omitted.</p>
1441

    
1442
    <p>This whitespace stripping is additional (and prior) to any stripping carried out as a result
1443
      of the <code>xsl:strip-space</code> declaration in the stylesheet.</p>
1444
    
1445
    <p>Saxon never modifies a supplied tree <i>in situ</i>: if a tree is supplied as input, and the stylesheet
1446
      requests space stripping, then a virtual tree is created and whitespace is stripped on the fly as
1447
      it is navigated. This is expensive (it can add 25% to processing time); it is therefore best to
1448
      supply a <code>SAXSource</code> or <code>StreamSource</code> as input to a transformation, so
1449
      that Saxon can strip unwanted whitespace while the tree is being parsed and built.
1450
    </p>
1451
  </section>
1452
  <section id="streaming" title="Streaming of Large Documents">
1453
    <h1>Streaming of Large Documents</h1>
1454

    
1455
    <aside>Streaming is available only in Saxon-EE.</aside>
1456

    
1457
    <p>Sometimes source documents are too large to hold in memory. Saxon-EE provides a range of
1458
      facilities for processing such documents in <i>streaming mode</i>: that is, processing data as
1459
      it is read by the XML parser, without building a complete tree representation of the document
1460
      in memory.</p>
1461

    
1462
    <p>These facilities are closely aligned with the XSLT 3.0 Recommendation. Some facilities
1463
      are specific to Saxon, and a few facilities are also available in XQuery.</p>
1464

    
1465
    <p>Inevitably there are things that cannot be done in streaming mode: sorting is an obvious
1466
      example. Sometimes, achieving a streaming transformation means rethinking the design of how it
1467
      works (for example, splitting it into multiple phases). So streaming is rarely a case of
1468
      simply taking your existing code and setting a simple switch to request streamed
1469
      implementation.</p>
1470

    
1471
    <p>For more information, see the following sections:</p>
1472

    
1473
    <nav>
1474
      <ul/>
1475
    </nav>
1476

    
1477
    <section id="xslt-streaming" title="Streaming using XSLT 3.0">
1478
      <h1>Streaming using XSLT 3.0</h1>
1479

    
1480
      <aside>Requires Saxon-EE.</aside>
1481

    
1482
      <p>Saxon-EE (from Saxon 9.8) is fully conformant to the final XSLT 3.0 Recommendation in terms of the
1483
        streaming facilities it supports. A few gaps in coverage that were found after release were fixed for Saxon 9.9. 
1484
        There are also some extensions.</p>
1485

    
1486
      <p>There are two main ways to initiate a streaming transformation:</p>
1487

    
1488
      <ol>
1489
        <li><p>Using the <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
1490
          instruction, with the attribute <code>streamable="yes"</code>. 
1491
          Here the source document is identified within the stylesheet itself.
1492
          Typically such a stylesheet will have a named template as its entry point, and will not
1493
          have any principal source document supplied externally.</p></li>
1494
        <li><p>By supplying a source document as input to a stylesheet whose initial mode is declared
1495
          with <code>streamable="yes"</code> in an <a href="/xsl-elements/mode"
1496
            class="bodylink code">xsl:mode</a> declaration. In this case the source document must be
1497
          supplied as a <code>StreamSource</code> or <code>SAXSource</code>, and not as an in-memory
1498
          tree. The details depend on which API is being used:</p>
1499
          <ul>
1500
            <li><p>With the Java s9api API, compile the stylesheet to create an <code>XsltExecutable</code>,
1501
            and then use the <code>load30</code> method to create an <code>Xslt30Transformer</code>.
1502
            Invoke the streamed transformation using the <code>applyTemplates</code> method of
1503
            the <code>Xslt30Transformer</code>, supplying the input as a <code>StreamSource</code>
1504
            or <code>SAXSource</code>.</p></li> 
1505
            <li><p>Similarly with the Saxon.Api interface on .NET, use the method
1506
            <code>Xslt30Transformer.ApplyTemplates()</code>, supplying a <code>Stream</code> 
1507
            as input.</p></li>
1508
            <li><p>With the JAXP API, start by instantiating a <code>com.saxonica.config.StreamingTransformerFactory</code>.
1509
            Invoke the transformation in the usual way by creating a <code>Transformer</code> (optionally via a
1510
            <code>Templates</code> object). When the <code>transform()</code> method is called with a
1511
            <code>StreamSource</code> or <code>SAXSource</code> as input, and when the initial mode
1512
            is a streamable mode, the input will be streamed. In consequence, this approach breaks the
1513
            normal JAXP convention whereby the document supplied as the <code>Source</code> argument to
1514
            the <code>transform()</code> method also becomes the global context item (the value of "." when
1515
            accessed within the initializer of a global variable). Instead such a reference fails with 
1516
            an XPDY0002 dynamic error.</p>
1517
            <p>The <code>StreamingTransformerFactory</code> can also be used to create an <code>XMLFilter</code>
1518
            which takes streamed input and produces streamed output, and a pipeline can be built from a
1519
            sequence of such filters connected end-to-end in the usual JAXP way.</p></li>
1520
          </ul>
1521
        </li>
1522
      </ol>
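      <p>As a sketch of the second approach, the following stylesheet declares its unnamed mode
        streamable and copies the input to the output in a single pass, dropping one kind of
        element along the way. The element name <code>draft-note</code> is a hypothetical
        placeholder, and the sketch assumes that the source is supplied as a
        <code>StreamSource</code> or <code>SAXSource</code>:</p>

      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The unnamed (initial) mode is streamable; unmatched nodes are shallow-copied -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <!-- An empty rule is motionless: it drops each (hypothetical) draft-note element -->
  <xsl:template match="draft-note"/>

</xsl:stylesheet>
]]></samp>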
1523

    
1524
      <p>The <a class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
1525
        function used in previous releases is still supported for the time being. In Saxon 9.8 and later a
1526
        call on <code>saxon:stream</code> is translated at compile time into a call on the XSLT 3.0
1527
          <code>&lt;xsl:source-document&gt;</code> instruction. The original Saxon mechanism for streaming,
1528
        namely the <code>saxon:read-once</code> attribute on <code>xsl:copy-of</code>, was dropped
1529
        in Saxon 9.6.</p>
1530

    
1531
      <p>The rules for whether a construct is streamable or not are largely the same in Saxon as in
1532
        the XSLT 3.0 specification. Saxon applies these rules after doing any optimization
1533
        re-writes, so some constructs end up being streamable in Saxon even though they are not
1534
        guaranteed streamable in the W3C spec, because the Saxon optimizer rewrites the expression
1535
        into a streamable form. An example of this effect is where variables or functions are
1536
        inlined before doing the streamability analysis. In contrast, when streaming is requested,
1537
        the optimizer takes care to avoid rewriting streamable constructs into a non-streamable
1538
        form.</p>
1539

    
1540
      <p>This documentation does not attempt to provide a tutorial introduction to the streaming
1541
        capabilities of XSLT 3.0. The specification itself is not easy to read, especially the
1542
        detailed rules on which constructs are deemed streamable. However, for the most part it is
1543
        not necessary to be familiar with the detailed rules. The main things to remember are:</p>
1544

    
1545
      <ul>
1546
        <li>A construct is "consuming" if it reads a subtree of the source document, that is, if it
1547
          makes a downwards selection from the context item. In general, constructs are not allowed
1548
          to have two operands that are both consuming. Some exceptions to this are: the <a
1549
            class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction; conditional
1550
          expressions such as <a class="bodylink code" href="/xsl-elements/choose">xsl:choose</a> if
1551
          each branch only contains one consuming expression; the map expression
1552
            <code>map{...}</code> in XPath and the <a class="bodylink code" href="/xsl-elements/map"
1553
            >xsl:map</a> instruction in XSLT.</li>
1554
        <li>During a streaming pass, the XSLT processor remembers the ancestors of the context item
1555
          and all the attributes of ancestors. Path expressions that access the ancestors and their
1556
          attributes are therefore allowed. However, such expressions should generally return atomic
1557
          values (for example the values of attributes) rather than returning nodes in the streamed
1558
          document, because if nodes are returned, the system often can't be sure that there is no
1559
          disallowed navigation from those nodes (for example, you can't get all the descendants of
1560
          an ancestor node).</li>
1561
        <li>It's not permitted to bind a streamed node to a variable or parameter, or to pass it to
1562
          a function.</li>
1563
        <li>An expression such as <code>//section</code> is referred to as a crawling expression.
          A crawling expression may select nodes that overlap (one nested inside another), which creates
          problems if you want to make further downward selections from such nodes. The XSLT 3.0
          specification allows this in some circumstances (for example, you can pass such an
          expression to a function that atomizes the result), but other cases (for example, using
          such an expression in <a class="bodylink code" href="/xsl-elements/for-each"
            >xsl:for-each</a> or <a class="bodylink code" href="/xsl-elements/apply-templates"
            >xsl:apply-templates</a>) are forbidden. If you know that the expression will never
          select overlapping nodes (for example, if you know that <code>//title</code> will never
          select one title appearing within another title), then you can rewrite the expression as
            <code>outermost(//title)</code> to avoid the restrictions. Saxon also allows overlapping
          nodes in some contexts where the W3C specification does not, provided streamability
          extensions are enabled.</li>
1576
        <li>When you hit these restrictions, you can often work around them by making a copy of a
1577
          subtree of the streamed document, for example by using the new <a class="bodylink code"
1578
            href="/functions/fn/copy-of">copy-of()</a> or <a class="bodylink code"
1579
            href="/functions/fn/snapshot">snapshot()</a> functions. These are consuming expressions,
1580
          but the result is "grounded" (that is, an ordinary in-memory tree) so it can be used
1581
          without any restrictions. Clearly this only works if the subtrees that you copy are small
1582
          enough to fit in memory.</li>
1583
      </ul>
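      <p>The copying workaround in the last point can be sketched as follows. The file name
        <code>orders.xml</code> and the element and attribute names are hypothetical. The
        expression <code>price - discount</code> would not be streamable if evaluated directly
        against a streamed <code>order</code> element, because both operands select downwards;
        applied to a grounded copy it is unrestricted:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="orders.xml">
  <orders>
    <!-- copy-of(.) is the only consuming operand; each copy is an ordinary in-memory tree -->
    <xsl:for-each select="orders/order/copy-of(.)">
      <!-- Two downward selections are allowed here because the context item is a copy -->
      <order id="{@id}" net="{number(price) - number(discount)}"/>
    </xsl:for-each>
  </orders>
</xsl:source-document>
]]></samp>

      <p>Only one <code>order</code> subtree needs to be held in memory at a time, so this
        approach is suitable when the individual subtrees are small.</p>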
1584

    
1585
      <p>The XSLT 3.0 constructs most relevant to streaming are:</p>
1586

    
1587
      <ul>
1588
        <li><strong>Streamable template rules</strong>. XSLT 3.0 has a new <a class="bodylink code"
1589
            href="/xsl-elements/mode">xsl:mode</a> declaration, and this allows all the template
1590
          rules in a particular mode to be declared streamable (<code>&lt;xsl:mode
1591
            streamable="yes"/&gt;</code>). If a mode is declared streamable, then Saxon checks
1592
          whether all the template rules in that mode are actually streamable, and reports a
1593
          compile-time error if not.</li>
1594
        <li>The <a class="bodylink code"
1595
          href="/xsl-elements/source-document">xsl:source-document</a> instruction.
1596
          This has an <code>href</code> attribute which defines the URI of a streamed input
1597
          document, and the instructions within <code>xsl:source-document</code> are evaluated with this
1598
          document as the context node. When streamed processing is requested using the attribute
1599
          <code>streamable="yes"</code>, the body of the <code>xsl:source-document</code> instruction must
1600
          satisfy the streamability rules; again, any violation is detected at compile time.</li>
1601
        <li>The <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a> instruction.
1602
          This is like an <a class="bodylink code" href="/xsl-elements/for-each">xsl:for-each</a>
1603
          instruction except that it guarantees to process the selected nodes in order, and the
1604
          results of processing one node can be passed as a parameter to the next iteration, so the
1605
          action applied to one node can influence the way in which subsequent nodes are processed.
1606
          This often provides a solution to the problem that when streaming, you can never "look
1607
          backwards" at preceding nodes. Instead of looking backwards, the information that will be
1608
          needed when processing subsequent nodes can be retained in parameters and "passed
1609
          forwards". Note that streamed nodes themselves cannot be contained in parameters, but data
1610
          derived from those nodes (or copies made using the <code>copy-of()</code> function) can.</li>
1611
        <li>The <a class="bodylink code" href="/xsl-elements/merge">xsl:merge</a> instruction allows
1612
          several input sequences to be merged, based on the value of a sort key. Any or all of the
1613
          input sequences can be streamed documents, provided that they are already correctly sorted
1614
          on the sort key value.</li>
1615
        <li><strong>Accumulators</strong> allow values to be computed "in the background" while a
1616
          streamed document is being read; the final value of the <a class="bodylink code"
1617
            href="/xsl-elements/accumulator">accumulator</a> is available by calling the <a
1618
            class="bodylink code" href="/functions/fn/accumulator-after">accumulator-after()</a>
1619
          function at the end of processing, and intermediate values are also available.
1620
          Accumulators are useful if you want to compute several values during a single processing
1621
          pass of a streamed document (for example, a minimum and maximum of some value). When the
1622
          information to be maintained in the accumulator is complex, it can be useful to hold it in
1623
          a map, which is a new data structure introduced in XSLT 3.0.</li>
1624
        <li>Saxon (from 9.9) supports an additional capability: <em>capturing accumulators</em>.
          Adding the attribute <code>saxon:capture="yes"</code> to an accumulator rule with
          <code>phase="end"</code> tells Saxon to make a snapshot copy of the matched
          element (as if by calling the <code>fn:snapshot</code> function). The code that computes
          the next value of the accumulator then has full access to this snapshot, so it is
          no longer constrained to be motionless. You can even keep the snapshot copy directly
          as the value of the accumulator (just write <code>select="."</code>), or retain
          all the matched elements (write <code>select="($value, .)"</code>). One way of writing a
          streamed transformation is therefore to capture all the data you need in accumulators, and
          to process it only when you reach the end of the document.
        </li>
1635
        <li>The <a class="bodylink code" href="/xsl-elements/fork">xsl:fork</a> instruction
1636
          effectively computes several instructions in parallel. In the Saxon implementation, they
1637
          are not actually evaluated in different threads, but they are all executed during a single
1638
          scan of the streamed input document. The outputs produced by each "prong" of the
1639
            <code>xsl:fork</code> instruction are buffered in memory until all prongs have
1640
          completed, and are then assembled in the correct order to form the final result.</li>
1641
        <li><strong>Streamed grouping</strong> is possible using the <a class="bodylink code"
1642
            href="/xsl-elements/for-each-group">xsl:for-each-group</a> instruction, provided that
1643
          one of the options <code>group-adjacent</code>, <code>group-starting-with</code>, or
1644
            <code>group-ending-with</code> is used. There are restrictions on the use of the <a
1645
            class="bodylink code" href="/functions/fn/current-group">current-group()</a> function
1646
          within such an instruction: essentially, it can only be used once, because it is a
1647
          consuming construct.</li>
1648
      </ul>
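      <p>As an illustration of the accumulator facility described above, the following sketch
        computes a minimum and a maximum in one pass, holding both in a map. It assumes an input
        file <code>transactions.xml</code> whose <code>transaction</code> elements carry a numeric
        <code>value</code> attribute, and that the <code>xs</code> prefix is bound to the XML
        Schema namespace; the accumulator name and the output element are hypothetical:</p>

      <samp><![CDATA[<!-- One accumulator holds both values in a map, so a single call on
     accumulator-after() yields the minimum and the maximum -->
<xsl:accumulator name="value-range" as="map(xs:string, xs:decimal)"
                 initial-value="map{}" streamable="yes">
  <!-- The rule is motionless: it reads only an attribute of the matched element -->
  <xsl:accumulator-rule match="transaction"
      select="map{ 'min': min(($value?min, xs:decimal(@value))),
                   'max': max(($value?max, xs:decimal(@value))) }"/>
</xsl:accumulator>

<xsl:template name="main">
  <xsl:source-document streamable="yes" href="transactions.xml"
                       use-accumulators="value-range">
    <!-- The only consuming construct: it reads to the end of the document -->
    <xsl:variable name="range" select="accumulator-after('value-range')"/>
    <range min="{$range?min}" max="{$range?max}"/>
  </xsl:source-document>
</xsl:template>
]]></samp>

      <p>Binding the result to a variable is permitted here because the value of the accumulator
        is a map of atomic values, not a streamed node.</p>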
1649

    
1650

    
1651
      <p>All these facilities are available in Saxon-EE only.</p>
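      <p>The <code>xsl:fork</code> instruction described above can be sketched as follows, again
        assuming a <code>transactions.xml</code> input with <code>account/transaction</code>
        elements carrying a <code>value</code> attribute. The output element names are
        hypothetical. Each prong makes its own downward selection, and both are evaluated during
        the same scan of the input:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="transactions.xml">
  <summary>
    <xsl:fork>
      <!-- First prong: counts the transactions -->
      <xsl:sequence>
        <count><xsl:value-of select="count(account/transaction)"/></count>
      </xsl:sequence>
      <!-- Second prong: totals the value attributes -->
      <xsl:sequence>
        <total><xsl:value-of select="sum(account/transaction/@value)"/></total>
      </xsl:sequence>
    </xsl:fork>
  </summary>
</xsl:source-document>
]]></samp>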
1652

    
1653
    </section>
1654

    
1655
    <section id="streamed-query" title="Streaming in XQuery">
1656
      <h1>Streaming in XQuery</h1>
1657

    
1658
      <aside>Requires Saxon-EE.</aside>
1659

    
1660
      <p>The XQuery specification says nothing on the subject of streamed evaluation; it is left
1661
        entirely to implementations. Saxon-EE supports streaming of XQuery for simple queries, using
1662
        rules similar to those that apply to XSLT.</p>
1663

    
1664
      <p>Simple queries can be streamed by specifying <code>-stream:on</code> on the Saxon-EE
1665
        command line. There is no need to specify anything in the query itself; however, the <a
1666
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> and <a
1667
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions (defined in
1668
        the XSLT 3.0 specification) may be used if streaming is not otherwise possible.</p>
1669

    
1670
      <p>When running a query using the s9api interface, streaming must be requested both when
1671
        compiling the query (<a class="javalink" href="net.sf.saxon.s9api.XQueryCompiler"
1672
          >XQueryCompiler.setStreaming(true)</a>), and when executing it (<a class="javalink"
1673
          href="net.sf.saxon.s9api.XQueryEvaluator">XQueryEvaluator.runStreamed(Source,
1674
          Destination)</a>).</p>
1675

    
1676
      <p>The query should access the streamed input document via the context item, not via the <a
1677
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
1678
          href="/functions/fn/collection">collection()</a> function, nor using external variables.
1679
        The source document should be supplied in the form of a <code>SAXSource</code> or
1680
          <code>StreamSource</code> object.</p>
1681

    
1682
      <p>If the query is not streamable, this will be reported as a compile-time error.</p>
1683

    
1684
      <p>The conditions for streamability are essentially the same as the rules for the body of the
1685
        <a class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
1686
        instruction when streamed processing is requested using the attribute
1687
        <code>streamable="yes"</code>, as in the XSLT 3.0 specification. For example:</p>
1688

    
1689
      <ol>
1690
        <li>
1691
          <p>Path expressions must use downward selection only.</p>
1692
        </li>
1693
        <li>
1694
          <p>Predicates must be motionless, which means they can reference attributes but not child
1695
            elements of the node being filtered.</p>
1696
        </li>
1697
        <li>
1698
          <p>No construct may make two downward selections. For example, the expression <code>price
1699
              - discount</code> fails because both operands use the child axis to select downwards.
1700
            If necessary, use <a class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> to
1701
            copy a subtree, after which arbitrary selections within the copied subtree become
1702
            possible.</p>
1703
        </li>
1704
        <li>
1705
          <p>A streamed node may not be bound to a variable. This rules out many uses of FLWOR
1706
            expressions.</p>
1707
        </li>
1708
        <li>
1709
          <p>A streamed node must not be passed as an argument to a function call, other than
1710
            built-in function calls.</p>
1711
        </li>
1712
        <li>
1713
          <p>Global variables in the query must not reference the context item.</p>
1714
        </li>
1715
      </ol>
1716

    
1717
      <p>As with XSLT, these restrictions can often be overcome by using the <a
1718
          class="bodylink code" href="/functions/fn/copy-of">copy-of()</a> or <a
1719
          class="bodylink code" href="/functions/fn/snapshot">snapshot()</a> functions, which Saxon
1720
        makes available in XQuery as well as XSLT.</p>
1721

    
1722
    </section>
1723

    
1724
    <section id="configuration-streaming" title="Configuration options for streaming">
1725
      <h1>Configuration options for streaming</h1>
1726

    
1727
      <aside>Requires Saxon-EE.</aside>
1728

    
1729
      <p>Saxon attempts streamed evaluation only if it is explicitly requested. Streaming may be
1730
        requested in a number of ways:</p>
1731

    
1732
      <ul>
1733
        <li>
1734
          <p>By use of XSLT 3.0 language constructs that request streaming, for example the <a
1735
            class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
1736
            instruction with attribute <code>streamable="yes"</code>, or by
1737
            specifying <code>streamable="yes"</code> on <a class="bodylink code"
1738
              href="/xsl-elements/mode"> xsl:mode</a> or <a class="bodylink code"
1739
              href="/xsl-elements/accumulator">xsl:accumulator</a>.</p>
1740
        </li>
1741
        <li>
1742
          <p>By use of a Saxon extension that requests streaming, for example <a
1743
              class="bodylink code" href="/functions/saxon/stream">saxon:stream</a>.</p>
1744
        </li>
1745
        <li>
1746
          <p>By setting the option <code>-stream:on</code> in the XQuery command line, or the
1747
            equivalent API option (for example, in s9api, <a class="javalink"
1748
              href="net.sf.saxon.s9api.XQueryCompiler">XQueryCompiler.setStreaming(true)</a>).</p>
1749
        </li>
1750
      </ul>
1751

    
1752
      <p>There are three configuration options that control how these requests for streaming
1753
        are interpreted:</p>
1754
      
1755
      <ul>
1756
        <li>The configuration option <a class="bodylink code" href="/configuration/config-features"
1757
          >Feature.STREAMABILITY</a> may be set to one of the values "off" or "standard".
1758
          (Releases prior to 9.8 supported a third option, "extended".) With a licensed Saxon-EE
1759
        configuration, the default is "standard", which means that streaming will happen if it
1760
        is requested and if it is feasible. Setting the value to "off" causes Saxon to behave
1761
        as if there is no Saxon-EE license: that is, requests for streaming are effectively
1762
        ignored, and the stylesheet is executed in a non-streaming manner (which means that processing
1763
        of a large document may fail if there is insufficient memory).</li>
1764
        
1765
        <li>The configuration option <a class="bodylink code"
1766
          href="/configuration/config-features">Feature.STREAMING_FALLBACK</a> determines what
1767
          Saxon does when streaming is requested, and a construct is found that is deemed
1768
          non-streamable. This is a boolean option. If it is set to <code>true</code>, Saxon attempts a
1769
          non-streaming implementation of the relevant construct. If sufficient memory is available
1770
          for a non-streaming evaluation, this should always give the same result as a streamed
1771
          evaluation. When the option is set to <code>false</code> (the default), the presence of a 
1772
          construct that is deemed non-streamable causes a static (compile-time) error.</li>
1773
        
1774
        <li>The configuration option <a class="bodylink code"
1775
          href="/configuration/config-features">Feature.STRICT_STREAMABILITY</a>
1776
         determines how closely Saxon's streamability analysis follows the rules in the
1777
        W3C specification. This is a boolean value (with the default <code>false</code>): the value <code>true</code> requests
1778
        strict adherence to the W3C rules. In reality this option does not change the rules
        that Saxon applies; rather, it changes when they are applied. By default Saxon first performs
1780
        all its usual compile-time optimizations to the expression tree, and then checks the final result
1781
        for streamability. During the optimization process Saxon takes care to avoid replacing streamable
1782
        constructs with non-streamable equivalents, but it may do the reverse. As a result, constructs
1783
        that are not streamable according to the W3C rules may become streamable after optimization.
1784
        (An example is the non-streamable expression <code>AUTHOR or EDITOR</code>, which Saxon rewrites
1785
          in the streamable form <code>exists(AUTHOR | EDITOR)</code>.)
1786
        For interoperability, the W3C specification requires processors to provide a mode of operation in
1787
        which the W3C streamability rules are enforced rigidly, and this is achieved by setting
1788
        <code>STRICT_STREAMABILITY</code> to <code>true</code>. With this setting, Saxon checks the
1789
        expression tree for streamability <em>before</em> doing any optimizations that change
1790
        the tree.</li>
1791
      </ul>
1792
        
1793
 
1794

    
1795
      <p>When running from the command line these options can be set for example as
1796
          <code>--streamability:off</code> or <code>--streamingFallback:on</code>.</p>
1797
    </section>
1798

    
1799
 
1800

    
1801
    <section id="burst-mode-streaming" title="Burst-mode streaming">
1802
      <h1>Burst-mode streaming</h1>
1803

    
1804
      <aside>Requires Saxon-EE.</aside>
1805

    
1806

    
1807
      <p>Burst-mode streaming takes a streamed document as input, and generates a sequence of small
1808
        subtrees containing the parts of the document that need to be processed. This can be
1809
        achieved using XSLT 3.0 syntax like this:</p>
1810

    
1811
      <samp><![CDATA[<xsl:source-document streamable="yes" href="employees.xml">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:source-document>
]]></samp>
1815

    
1816
      <p>The code that processes an individual <code>employee</code> element does not need to be
        streamable; it can use any XSLT constructs. The only constraint is that it cannot navigate
        outside the <code>employee</code> element: because the <code>employee</code> element is a
        copy of a subtree from the original document, it has no parent or siblings.</p>
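      <p>For instance, a template rule along the following lines could process each copied
        <code>employee</code> element. This is only a sketch: it assumes that the rule belongs to
        a mode that is not declared streamable, and that each <code>employee</code> has
        <code>name</code> and <code>address</code> children; the <code>city</code> element and the
        output element names are hypothetical.</p>

      <samp><![CDATA[<!-- Matches each in-memory copy produced by copy-of(.) -->
<xsl:template match="employee">
  <!-- Several downward selections from one node: fine on a copy, not on a streamed node -->
  <person name="{name}" city="{address/city}">
    <xsl:copy-of select="address"/>
  </person>
</xsl:template>
]]></samp>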
1820

    
1821
      <p>Burst-mode streaming can also be applied to the principal input of the transformation. This
1822
        works if the transformation is run from the command line, and also if it is executed from a
1823
        Java or .NET API provided that the document is supplied as a streamed source object, not as
1824
        a pre-built tree (under Java, this means a <code>StreamSource</code> or
1825
          <code>SAXSource</code>). For example:</p>
1826

    
1827
      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <xsl:apply-templates select="*/employee/copy-of(.)"/>
</xsl:template>
]]></samp>
1832

    
1833
      <p>The same effect can be achieved in XQuery if the document is supplied as the initial
1834
        context item, again in the form of a streamed input source. Although the functions
1835
          <code>copy-of()</code> and <code>snapshot()</code> are defined in the XSLT 3.0
1836
        specification, Saxon also makes them available in XQuery, allowing for example:</p>
1837

    
1838
      <samp><![CDATA[*/employee ! copy-of(.)/(name, address)
]]></samp>
1840

    
1841
      <p>In XQuery there is no need for the query itself to indicate that streamed execution is
1842
        required; rather this can be requested from the command line using the option
1843
          <code>-stream:on</code>. </p>
1844

    
1845
      <p>The same effect can be achieved on external streamed documents using the <a
1846
          class="bodylink code" href="/functions/saxon/stream">saxon:stream</a> extension
1847
        function.</p>
1848

    
1849

    
1850

    
1851
      <h2 class="subtitle">Example: selective copying</h2>
1852

    
1853
      <p>A very simple use of burst-mode streaming is to make a selective copy of parts of
        a document. For example, the following code creates an output document containing all the
          <code>footnote</code> elements from the source document that have the attribute
          <code>@type='endnote'</code>:</p>
1857

    
1858
      <p>
1859
        <strong>XSLT example (named document)</strong>
1860
      </p>
1861
      <samp><![CDATA[<xsl:template name="main">
  <footnotes>
    <xsl:source-document streamable="yes" href="thesis.xml">
      <xsl:copy-of select=".//footnote[@type='endnote']"/>
    </xsl:source-document>
  </footnotes>
</xsl:template>
]]></samp>
1869

    
1870
      <p>
1871
        <strong>XQuery example (named document)</strong>
1872
      </p>
1873
      <samp><![CDATA[  <footnotes>{
     saxon:stream(doc('thesis.xml')//footnote[@type='endnote'])
  }</footnotes>
]]></samp>
1877

    
1878
      <p>
1879
        <strong>XSLT example (principal input document)</strong>
1880
      </p>
1881
      <samp><![CDATA[<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <footnotes>
    <xsl:copy-of select=".//footnote[@type='endnote']"/>
  </footnotes>
</xsl:template>
]]></samp>
1888

    
1889
      <p>
1890
        <strong>XQuery example (principal input document)</strong>
1891
      </p>
1892
      <samp><![CDATA[  <footnotes>{.//footnote[@type='endnote']}</footnotes>
]]></samp>
1894

    
1895

    
1896
      <p>These examples work because the predicate (the expression in square brackets) is
          <i>motionless</i>: evaluating the predicate does not require the source document to be
        repositioned. If the predicate needs access to child elements rather than attributes, it is
        necessary to make a copy of each footnote and then test the copy. The last example then
        becomes:</p>
1901

    
1902
      <samp><![CDATA[  <footnotes>{.//footnote/copy-of(.)[type='endnote']}</footnotes>
]]></samp>
1904
    </section>
1905

    
1906

    
1907

    
1908
    <section id="partial-reading" title="Reading source documents partially">
1909
      <h1>Reading source documents partially</h1>
1910

    
1911
      <aside>Requires Saxon-EE.</aside>
1912

    
1913

    
1914
      <p>As well as allowing a source document to be processed in a single sequential pass, the
1915
        streaming facility in many cases allows the source document to be read only partially. For
1916
        example, the following query will return true as soon as it finds a transaction with a
1917
        negative value, and will then immediately stop processing the input file:</p>
1918
      <samp><![CDATA[some $t in saxon:stream(doc('big-transaction-file.xml')//transaction)
satisfies number($t/@value) lt 0
]]></samp>
1921

    
1922
      <p>This facility is particularly useful for extracting data that appears near the start of a
1923
        large file. It does mean, however, that well-formedness or validity errors appearing later
1924
        in the file will not necessarily be detected.</p>
1925

    
1926
      <p>To exit early from reading a streamed document using pure XSLT 3.0 constructs, use <a
1927
          href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a> like this:</p>
1928

    
1929
      <samp><![CDATA[<xsl:variable name="contains-debit" as="xs:boolean">
  <xsl:source-document streamable="yes" href="big-transaction-file.xml">
    <xsl:iterate select=".//transaction">
      <!-- xsl:on-completion must precede the loop body; it is used only if no xsl:break occurs -->
      <xsl:on-completion select="false()"/>
      <!-- number() avoids comparing an untyped attribute value directly with an integer -->
      <xsl:if test="number(@value) lt 0">
        <xsl:break select="true()"/>
      </xsl:if>
    </xsl:iterate>
  </xsl:source-document>
</xsl:variable>
]]></samp>
1940

    
1941
    </section>
1942

    
1943

    
1944

    
1945
    <section id="stream-with-iterate" title="Streaming with xsl:iterate">
1946
      <h1>Streaming with xsl:iterate</h1>
1947

    
1948
      <aside>Requires Saxon-EE.</aside>
1949

    
1950
      <p>In the examples given above, streaming is used to select a sequence of element nodes from
1951
        the source document, and each of these nodes is then processed independently. In cases where
1952
        the processing of one node depends in some way on previous nodes, it is possible to use <a
1953
          class="bodylink" href="../burst-mode-streaming">burst-mode streaming</a> in conjunction
1954
        with the new <a href="/xsl-elements/iterate" class="bodylink code">xsl:iterate</a>
1955
        instruction in XSLT 3.0.</p>
1956

    
1957
      <p>The following example takes a sequence of <code>&lt;transaction&gt;</code> elements in an
1958
        input document, each one containing the value of a debit or credit from an account. As
1959
        output it copies the transaction elements, adding a current balance.</p>
1960
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <xsl:param name="balance" as="xs:decimal" select="0.00"/>
        <xsl:variable name="new-balance" as="xs:decimal" select="$balance + xs:decimal(@value)"/>
        <transaction balance="{$new-balance}">
           <xsl:copy-of select="@*"/>
        </transaction>
        <xsl:next-iteration>
          <xsl:with-param name="balance" select="$new-balance"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
1973

    
1974
      <p>The following example is similar: this time it copies the account number (contained in a
1975
        separate element at the start of the file) into each transaction element:</p>
1976
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/(account-number|transaction)">
        <xsl:param name="accountNr"/>
        <xsl:choose>
           <xsl:when test="self::account-number">
             <xsl:next-iteration>
                <xsl:with-param name="accountNr" select="string(.)"/>
             </xsl:next-iteration>
           </xsl:when>
           <xsl:otherwise>
             <transaction account-number="{$accountNr}">
               <xsl:copy-of select="@*"/>
             </transaction>
           </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>
1994

    
1995
      <p>Here is a more complex example, one that groups adjacent transaction elements having the
        same date attribute. The two loop parameters are the current group and the current
        date. The contents of a group are accumulated in the <code>group</code> parameter until the
        date changes.</p>
1998
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <!-- copy-of(.) grounds each transaction, so it can be held in the group parameter -->
      <xsl:iterate select="account/transaction/copy-of(.)">
        <xsl:param name="group" as="element(transaction)*" select="()"/>
        <xsl:param name="currentDate" as="xs:date?" select="()"/>
        <!-- xsl:on-completion must precede the loop body; it emits the last group -->
        <xsl:on-completion>
          <final-daily-transactions date="{$currentDate}">
            <xsl:copy-of select="$group"/>
          </final-daily-transactions>
        </xsl:on-completion>
        <xsl:choose>
          <xsl:when test="xs:date(@date) eq $currentDate or empty($group)">
            <xsl:next-iteration>
              <xsl:with-param name="currentDate" select="@date"/>
              <xsl:with-param name="group" select="($group, .)"/>
            </xsl:next-iteration>
          </xsl:when>
          <xsl:otherwise>
            <daily-transactions date="{$currentDate}">
              <xsl:copy-of select="$group"/>
            </daily-transactions>
            <xsl:next-iteration>
              <xsl:with-param name="group" select="."/>
              <xsl:with-param name="currentDate" select="@date"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>

      <p>Note that when an <a class="bodylink code" href="/xsl-elements/iterate">xsl:iterate</a>
        loop is terminated using <a class="bodylink code" href="/xsl-elements/break">xsl:break</a>,
        parsing of the source document will be abandoned. This provides a convenient way to read
        data near the start of a large file without incurring the cost of reading the entire
        file.</p>
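
      <p>As an illustration, the following sketch stops reading as soon as it has found the first
        transaction whose value exceeds a threshold. The <code>value</code> attribute, the
        threshold of 1000, and the <code>first-large-transaction</code> element are assumptions
        made for this example only.</p>
      <samp><![CDATA[    <xsl:source-document streamable="yes" href="transactions.xml">
      <xsl:iterate select="account/transaction">
        <!-- @value is a hypothetical attribute holding the transaction amount -->
        <xsl:if test="xs:decimal(@value) gt 1000">
          <first-large-transaction>
            <xsl:copy-of select="@*"/>
          </first-large-transaction>
          <!-- xsl:break ends the loop, so the rest of the file need not be parsed -->
          <xsl:break/>
        </xsl:if>
      </xsl:iterate>
    </xsl:source-document>
]]></samp>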
    </section>

    <section id="stream-with-merge" title="Streaming with xsl:merge">
      <h1>Streaming with xsl:merge</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Saxon (since 9.6) allows several streamed inputs to be merged using the new XSLT 3.0 <a
          href="/xsl-elements/merge" class="bodylink code">xsl:merge</a> instruction. For this to
        work, there are a number of rules to follow:</p>

      <ol>
        <li>
          <p>Streaming must be requested by specifying <code>streamable="yes"</code> on the <a
              class="bodylink code" href="/xsl-elements/merge-source">xsl:merge-source</a>
            element.</p>
        </li>
        <li>
          <p>When streaming is requested, the <code>for-each-source</code> attribute of
              <code>xsl:merge-source</code> must be present, and must evaluate to a single
            string.</p>
        </li>
        <li>
          <p>The <code>select</code> attribute on the <code>xsl:merge-source</code> element must
            take the form of a motionless pattern.</p>
        </li>
      </ol>

      <p>For each node selected by the <code>select</code> expression, Saxon takes an implicit
        snapshot (in the sense of the XSLT 3.0 <a class="bodylink code"
          href="/functions/fn/snapshot">fn:snapshot()</a> function). The merge keys are evaluated in
        relation to this snapshot, and it is this snapshot that is presented within the
          <code>xsl:merge-action</code> construct as the result of the <a class="bodylink code"
          href="/functions/fn/current-merge-group">fn:current-merge-group()</a> function.</p>

      <p>Here is an example of streamed merging of two log files:</p>

      <samp><![CDATA[<xsl:merge>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-1.xml'" select="events/event">
    <xsl:merge-key select="xs:dateTime(@timestamp)"/>
  </xsl:merge-source>
  <xsl:merge-source streamable="yes"
       for-each-source="'log-file-2.xml'" select="log/day/record">
    <xsl:merge-key select="dateTime(../@date, time)"/>
  </xsl:merge-source>
  <xsl:merge-action>
    <group>
      <xsl:copy-of select="current-merge-group()" />
    </group>
  </xsl:merge-action>
</xsl:merge>]]></samp>
    </section>

    <section id="streaming-templates" title="Streaming Templates">
      <h1>Streaming Templates</h1>

      <aside>Requires Saxon-EE.</aside>

      <p>Streaming templates allow a document to be processed hierarchically in the classical XSLT
        style, applying template rules to each element (or other nodes) in a top-down manner, while
        scanning the source document in a pure streaming fashion, without building the source tree
        in memory. Saxon-EE allows streamed processing of a document using template rules, provided
        the templates conform to a set of strict guidelines.</p>

      <p>Streaming in this way is a property of a <strong>mode</strong>; a mode can be declared to
        be streamable, and if it is so declared, then all template rules using that mode must obey
        the rules for streamability. A mode is declared to be streamable using the top-level
        stylesheet declaration:</p>

      <samp><![CDATA[<xsl:mode name="s" streamable="yes"/>]]></samp>

      <p>The <code>name</code> attribute is optional; if omitted, the declaration applies to the
        default (unnamed) mode.</p>

      <p>Streamed processing of a source document can be applied either to the principal source
        document of the transformation, or to a secondary source document read using the <a
          class="bodylink code" href="/xsl-elements/source-document">xsl:source-document</a>
        instruction.</p>

      <p>To use streaming on the principal source document, the input to the transformation must be
        supplied in the form of a <code>StreamSource</code> or <code>SAXSource</code>, and the
        initial mode selected on entry to the transformation must be a streamable mode. In this case
        there must be no references to the context item in the initializer of any global
        variable.</p>
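
      <p>A minimal sketch of a stylesheet that streams its principal source document might look
        like this. The unnamed mode is declared streamable, and the only global parameter has an
        initializer that does not refer to the context item. The element name <code>price</code>
        and the parameter <code>currency</code> are illustrative assumptions.</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The unnamed mode is streamable; unmatched nodes are copied through -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <!-- Hypothetical parameter: its default does not reference the context item -->
  <xsl:param name="currency" select="'EUR'"/>

  <!-- One downward selection only: the string value of the matched element -->
  <xsl:template match="price">
    <price currency="{$currency}">
      <xsl:value-of select="."/>
    </price>
  </xsl:template>

</xsl:stylesheet>]]></samp>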

      <p>Streamed processing of a secondary document is initiated using the instruction:</p>

      <samp><![CDATA[<xsl:source-document streamable="yes" href="abc.xml">
  <xsl:apply-templates mode="s"/>
</xsl:source-document>]]></samp>

      <p>Saxon will also recognize an instruction of the form:</p>

      <samp><![CDATA[<xsl:apply-templates select="doc('abc.xml')" mode="s"/>]]></samp>

      <p>Here the <code>select</code> attribute must contain a simple call on the <a
          class="bodylink code" href="/functions/fn/doc">doc()</a> or <a class="bodylink code"
          href="/functions/fn/document">document()</a> function, and the mode (explicit or implicit)
        must be declared as streamable. The call on <code>doc()</code> or <code>document()</code>
        can be extended with a streamable selection path, for example
          <code>select="doc('employee.xml')/*/employee"</code>.</p>

      <p>If a mode is declared as streamable, then it must ONLY be used in streaming mode; it is not
        possible to apply templates using a streaming mode if the selected nodes are ordinary
        non-streamed nodes.</p>
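
      <p>One way to work within this restriction, sketched below, is to keep streamed and
        in-memory processing in separate modes, and to hand over a copy of a streamed element when
        free navigation is needed. The mode name <code>t</code> and the element names
        <code>employee</code>, <code>name</code>, and <code>salary</code> are assumptions made for
        illustration.</p>
      <samp><![CDATA[<xsl:mode name="s" streamable="yes"/>
<xsl:mode name="t"/>

<!-- Streamed: the single downward access takes an in-memory copy of the element -->
<xsl:template match="employee" mode="s">
  <xsl:apply-templates select="copy-of(.)" mode="t"/>
</xsl:template>

<!-- Not streamed: operates on the copy, so it can read several children freely -->
<xsl:template match="employee" mode="t">
  <employee name="{name}" salary="{salary}"/>
</xsl:template>]]></samp>
      <p>Because each copy is held in memory, this approach suits cases where the copied elements
        are individually small.</p>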

      <p>Every template rule within a streamable mode must follow strict rules to ensure it can be
        processed in a streaming manner. The essence of these rules is:</p>
      <ol>
        <li>
          <p>The match pattern for the template rule must be a simple pattern that can be evaluated
            when positioned at the start tag of an element, without repositioning the stream (but
            information about the ancestors of the element and their attributes is available,
            together with some limited information about their position relative to their siblings).
            Examples of acceptable patterns are <code>*</code>, <code>para</code>,
              <code>para[1]</code>, or <code>para/*</code>.</p>
          <p>If the match pattern includes a boolean predicate, then the predicate must be
            "motionless", which means that it can be evaluated while the input stream is positioned
            at the start tag. This means it can reference properties such as <code>name()</code> and
              <code>base-uri()</code>, and can reference attributes of the element, but cannot
            reference its children or content.</p>
          <p>If the match pattern includes a numeric predicate, then it must be possible to evaluate
            this by counting either the total number of preceding-sibling elements, or the number of
            preceding siblings with a given name. Examples of permitted patterns include
              <code>*[1]</code>, <code>p[3]</code>, and <code>*:p[2][@class='bold']</code>;
            disallowed patterns include <code>(descendant::fig)[1]</code>,
              <code>p[@class='bold'][2]</code>, and <code>p[last()]</code>.</p>
        </li>
        <li>
          <p>The body of the template rule must contain at most one expression or instruction that
            reads the contents below the matched element (that is, children or descendants), and it
            must process the contents in document order. This expression or instruction will often
            be one of the following:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="."/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>string(.)</code>
              </p>
            </li>
            <li>
              <p><code>data(.)</code> (explicitly or implicitly)</p>
            </li>
          </ul>
          <p>but this list is not exhaustive. It is possible to process the contents selectively by
            using a streamable path expression, for example:</p>
          <ul>
            <li>
              <p>
                <code>&lt;xsl:apply-templates select="foo"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:value-of select="a/b/c"/&gt;</code>
              </p>
            </li>
            <li>
              <p>
                <code>&lt;xsl:copy-of select="x/y"/&gt;</code>
              </p>
            </li>
          </ul>
          <p>but this effectively means that the content not selected by this path is skipped
            entirely; the transformation ignores it.</p>
          <p>The template can access attributes of the context item without restriction, as well as
            properties such as its <code>name()</code>, <code>local-name()</code>, and
              <code>base-uri()</code>. It can also access the ancestors of the context item, the
            attributes of the ancestors, and properties such as the name of an ancestor; but having
            navigated to an ancestor, it cannot then navigate downwards or sideways, since the
            siblings and the other descendants of the ancestor are not available while
            streaming.</p>
          <p>The restriction that only one downwards access is allowed makes it an error to use an
            expression such as <code>price - discount</code> in a streamable template. This problem
            can often be circumvented by making a copy of the context item. This can be done using
            the <code>copy-of()</code> function: for example <code>&lt;xsl:value-of
              select="copy-of(.)/(price - discount)"/&gt;</code>. Taking a copy of the context node
            requires memory, of course, and should be avoided unless the contents of the node are
            small.</p>

          <p>Certain constructs using positional filters can be evaluated in streaming mode. For
            example, it is possible to use <code>&lt;xsl:apply-templates select="*[1]"/&gt;</code>.
            The filter must be on a node test that uses the child axis and selects element nodes.
            The forms accepted are expressions that can be expressed as <code>x[position() op
              N]</code> where <code>N</code> is an expression that is independent of the focus and
            is statically known to evaluate to a number, <code>x</code> is a node test using the
            child axis, and <code>op</code> is one of the operators <code>eq</code>,
            <code>le</code>, <code>lt</code>, <code>gt</code>, or <code>ge</code>. Alternative forms
            of this construct such as <code>x[N]</code>, <code>remove(x, 1)</code>,
              <code>head(x)</code>, <code>tail(x)</code>, and <code>subsequence(x, 1, N)</code> are
            also accepted.</p>
        </li>
      </ol>
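
      <p>The following sketch pulls these rules together. It assumes a hypothetical input in which
        an <code>order</code> element carrying a <code>number</code> attribute contains
        <code>item</code> elements, each with <code>id</code> and <code>status</code> attributes
        and <code>price</code> and <code>discount</code> children; none of these names comes from
        Saxon itself.</p>
      <samp><![CDATA[<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <!-- One downward selection, using a streamable path; other children of order are skipped -->
  <xsl:template match="order">
    <order number="{@number}">
      <xsl:apply-templates select="item"/>
    </order>
  </xsl:template>

  <!-- Motionless predicate: it tests only an attribute of the matched element -->
  <xsl:template match="item[@status = 'shipped']">
    <item id="{@id}" order="{../@number}">
      <!-- price - discount would need two downward accesses, so work on a copy -->
      <xsl:value-of select="copy-of(.)/(price - discount)"/>
    </item>
  </xsl:template>

  <!-- Items that have not been shipped produce no output -->
  <xsl:template match="item"/>

</xsl:stylesheet>]]></samp>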

    </section>
  </section>
  <section id="projection" title="Document Projection">
    <h1>Document Projection</h1>

    <aside>Document projection is available only in Saxon-EE.</aside>

    <p>Document Projection is a mechanism that analyzes a query to determine what parts of a
      document it can potentially access, and then while building a tree to represent the document,
      leaves out those parts of the tree that cannot make any difference to the result of the
      query.</p>

    <p>Document projection can be enabled as an option on the XQuery command line interface: set
        <code>-projection:on</code>. It is only used if requested. The command line option affects
      both the primary source document supplied on the command line, and any calls on the
        <code>doc()</code> function within the body of the query that use a literal string argument
      for the document URI.</p>

    <p>For feedback on the impact of document projection in terms of reducing the size of the source
      document in memory, use the <code>-t</code> option on the command line, which shows for each
      document loaded how many nodes from the input document were retained and how many
      discarded.</p>

    <p>From the s9api API, document projection can be invoked as an option on the <a
        class="javalink" href="net.sf.saxon.s9api.DocumentBuilder">DocumentBuilder</a>. The call
        <code>setDocumentProjectionQuery()</code> supplies as its argument a compiled query (an
        <code>XQueryExecutable</code>), and the document built by the document builder is then
      projected to retain only the parts of the document that are accessed by this query, when it
      operates on this document as the initial context item. For example, if the supplied query is
        <code>count(//ITEM)</code>, then only the <code>ITEM</code> elements will be retained.</p>

    <p>It is also possible to request that a query should perform document projection on documents
      that it reads using the <code>doc()</code> function, provided this has a string-literal
      argument. This can be requested using the option <code>setAllowDocumentProjection(true)</code>
      on the <code>XQueryExpression</code> object. This is not available directly in the s9api
      interface, but the <code>XQueryExpression</code> is reachable from the
        <code>XQueryExecutable</code> using the accessor method
        <code>getUnderlyingCompiledQuery()</code>.</p>
    <aside>It is best to avoid supplying a query that actually returns nodes from the document
      supplied as the context item, since the analysis cannot know what the invoker of the query
      will want to do with these nodes. For example, the query
        <code>&lt;out&gt;{//ITEM}&lt;/out&gt;</code> works better than <code>//ITEM</code>, since it
      is clear that all descendants of the <code>ITEM</code> elements must be retained, but not
      their ancestors. If the supplied query selects nodes from the input document, then Saxon
      assumes that the application will need access to the entire subtree rooted at these nodes, but
      that it will not attempt to navigate upwards or outwards from these nodes. On the other hand,
      nodes that are atomized (for example in a filter) will be retained without their descendants,
      except as needed to compute the filter.</aside>

    <p>The more complex the query, the less likely it is that Saxon will be able to analyze it to
      determine the subset of the document required. If precise analysis is not possible, document
      projection has no effect. Currently Saxon makes no attempt to analyze accesses made within
      user-defined functions. Also, of course, Saxon cannot analyze the expectations of external
      (Java) functions called from the query.</p>

    <p>Document projection is supported only for XQuery, and it works only when a document
      is parsed and loaded for the purpose of executing a single query. It is possible, however, to
      use the mechanism to create a manual filter for source documents if the required subset of the
      document is known. To achieve this, create a query that selects the required parts of the
      document supplied as the context item, and compile it to an s9api
      <code>XQueryExecutable</code>. The query does not have to do anything useful: the only
      requirement is that the result of the query on the subset document must be the same as the
      result on the original document. Then supply this <code>XQueryExecutable</code> to the s9api
        <code>DocumentBuilder</code> used to build the document.</p>

    <p>Of course, when document projection is used manually like this, it is entirely the user's
      responsibility to ensure that the selected part of the document contains all the nodes
      required.</p>
  </section>
  <section id="w3c-dtds" title="References to W3C DTDs">
    <h1>References to W3C DTDs</h1>

    <p>During 2010-11, W3C took steps to reduce the burden of meeting requests for
      commonly-referenced documents such as the DTD for XHTML. The W3C web server routinely
      adds an artificial 30-second time delay for such requests. In response to this, Saxon now includes
      copies of these documents within the issued JAR file, and recognizes requests for these
      documents, satisfying the request using the local copy.</p>

    <p>This is done only in cases where Saxon itself instantiates the XML parser. In cases where the
      user application instantiates an XML parser, the same effect can be achieved by setting the <a
        class="javalink" href="net.sf.saxon.lib.StandardEntityResolver">StandardEntityResolver</a>
      as a property of the <code>XMLReader</code> (parser).</p>

    <p>The documents recognized by the <code>StandardEntityResolver</code> are:</p>

    <table>
      <thead>
        <tr>
          <td>
            <p>Public ID</p>
          </td>
          <td>
            <p>System ID</p>
          </td>
          <td>
            <p>Saxon resource name</p>
          </td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Latin 1 for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-lat1.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Symbols for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-symbol.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES Special for XHTML//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent</p>
          </td>
          <td>
            <p>w3c/xhtml-special.ent</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Transitional//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-transitional.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Strict//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-strict.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.0 Frameset//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml1-frameset.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Basic 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml10/xhtml-basic10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Basic 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11.dtd</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Access Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-access-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Access Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-access-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-access-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Java Applets 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-applet-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-applet-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Base Architecture 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-arch-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-arch-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Common Attributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-attribs-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Base Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-base-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Basic Forms 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-form-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic-form-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Basic Tables 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic-table-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic-table-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Basic 1.0 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic10-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic10-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Basic 1.1 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-basic11-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-basic11-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML BDO Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-bdo-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkphras-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkpres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Block Structural 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-blkstruct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Character Entities 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-charent-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-csismap-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Datatypes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-datatypes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-edit-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-events-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Forms 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-form-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Frames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-frames-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-frames-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Modular Framework 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-framework-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML HyperAttributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hyperAttributes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-hyperAttributes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Hypertext 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-hypertext-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Frame Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-iframe-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-iframe-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Images 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-image-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlphras-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlpres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstruct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Inline Style 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inlstyle-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Inputmode 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-inputmode-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-inputmode-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Legacy Redeclarations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-legacy-redecl-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-legacy-redecl-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Link Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-link-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Lists 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-list-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Metainformation 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-meta-2.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-meta-2.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML MetaAttributes 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-metaAttributes-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-metaAttributes-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Name Identifier 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-nameident-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-nameident-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//NOTATIONS XHTML Notations 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-notations-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-notations-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-object-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Param Element 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-param-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Presentation 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-pres-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML-Print 1.0 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-print10-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-print10-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Qualified Names 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML+RDFa Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML RDFa Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-rdfa-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-rdfa-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML Role Attribute Qnames 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-role-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-role-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Ruby 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/ruby/xhtml-ruby-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ruby-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Scripting 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-script-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-ssismap-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Document Structure 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-struct-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XHTML Style Sheets 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-style-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Tables 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-table-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Target 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-target-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-target-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ELEMENTS XHTML Text 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml-text-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod</p>
          </td>
          <td>
            <p>w3c/xhtml11/xhtml11-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//MathML 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml1/mathml.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml1/mathml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 2.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml2/mathml2.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml2/mathml2.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD MathML 3.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Math/DTD/mathml3/mathml3.dtd</p>
          </td>
          <td>
            <p>w3c/mathml/mathml3/mathml3.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.0//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd</p>
          </td>
          <td>
            <p>w3c/svg10/svg10.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Tiny//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-tiny.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD SVG 1.1 Basic//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd</p>
          </td>
          <td>
            <p>w3c/svg11/svg11-basic.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Document Model 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/xhtml-rddl-model-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/xhtml-rddl-model-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//DTD XHTML RDDL 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-xhtml.dtd</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-xhtml.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL QName Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-qname-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-qname-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//XML-DEV//ENTITIES RDDL Resource Module 1.0//EN</p>
          </td>
          <td>
            <p>http://www.rddl.org/rddl-resource-1.mod</p>
          </td>
          <td>
            <p>w3c/rddl/rddl-resource-1.mod</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD Specification V2.10//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd</p>
          </td>
          <td>
            <p>w3c/xmlspec/xmlspec.dtd</p>
          </td>
        </tr>
        <tr>
          <td>
            <p>-//W3C//DTD XMLSCHEMA 200102//EN</p>
          </td>
          <td>
            <p>http://www.w3.org/2001/XMLSchema.dtd</p>
          </td>
          <td>
            <p>w3c/xmlschema/XMLSchema.dtd</p>
          </td>
        </tr>

      </tbody>
    </table>
    <p>This Saxon feature can be disabled by setting the configuration property <a
        class="bodylink code" href="/configuration/config-features"
        >Feature.ENTITY_RESOLVER_CLASS</a> to null; it is also possible to set it to a different
        <code>EntityResolver</code> class (perhaps a subclass of Saxon's
        <code>StandardEntityResolver</code>) that varies the behavior. If an
        <code>EntityResolver</code> is set in the relevant <code>ParseOptions</code> or in an
        <code>AugmentedSource</code> then this will override any <code>EntityResolver</code> set at
      the configuration level.</p>
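    <p>As an illustration of what a replacement <code>EntityResolver</code> might look like, the
      following is a minimal sketch that uses only the standard <code>org.xml.sax</code> interfaces.
      The class name <code>LocalDtdResolver</code> and its <code>register</code> and
      <code>localPathFor</code> methods are hypothetical names chosen for this example; they are not
      part of the Saxon API, and the sketch does not claim to reproduce how
      <code>StandardEntityResolver</code> is implemented. It assumes that the mapped resources are
      available on the application's classpath.</p>

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that maps public identifiers to classpath resources. */
class LocalDtdResolver implements EntityResolver {

    // Public identifier -> classpath-relative resource path (assumed to be bundled by the application).
    private final Map<String, String> localPaths = new HashMap<>();

    /** Registers one mapping from a public identifier to a local resource path. */
    void register(String publicId, String resourcePath) {
        localPaths.put(publicId, resourcePath);
    }

    /** Returns the registered path, or null if the identifier is unknown. */
    String localPathFor(String publicId) {
        return publicId == null ? null : localPaths.get(publicId);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String path = localPathFor(publicId);
        if (path == null) {
            // Unknown identifier: returning null lets the parser open the system identifier itself.
            return null;
        }
        InputStream in = LocalDtdResolver.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            // Mapping exists but the resource is not on the classpath: fall back to the parser default.
            return null;
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        // Keep the original system identifier so relative references inside the DTD still resolve.
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver = new LocalDtdResolver();
        // Mapping taken from the table above.
        resolver.register("-//W3C//DTD SVG 1.1//EN", "w3c/svg11/svg11.dtd");
        System.out.println(resolver.localPathFor("-//W3C//DTD SVG 1.1//EN"));
    }
}
```

    <p>Returning <code>null</code> for unrecognized identifiers means that documents referring to
      other DTDs are still handled by the parser in the usual way. An instance of such a class can be
      passed to any SAX <code>XMLReader</code> through <code>setEntityResolver</code>; how it is
      supplied to Saxon depends on which of the mechanisms described above is used.</p>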
  </section>

</article>