Streamed processing of not well-formed XML: output on console isn't output in result file if -o option is used

Added by Martin Honnen almost 3 years ago

I am trying to understand whether streamed processing allows me to process an input document that is not well-formed, catch the parse error, and still have the nodes processed before the error produce a result.

For that I wrote some code along the lines of

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:err="http://www.w3.org/2005/xqt-errors"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:param name="input-uri" as="xs:string" select="'sample1.xml'"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:mode on-no-match="shallow-copy" streamable="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:try rollback-output="no">
      <xsl:source-document href="{$input-uri}" streamable="yes">
        <xsl:apply-templates/>
      </xsl:source-document>
      <xsl:catch errors="*">
        <xsl:message terminate="no">
           <error code="{$err:code}" message="{$err:description}"/>
        </xsl:message>
      </xsl:catch>
    </xsl:try>
  </xsl:template>

</xsl:stylesheet>

When I set the input-uri parameter to a file that is not well-formed (e.g. one that lacks the end tag of the root element), SaxonCS 11 interestingly outputs all nodes before the missing end tag when I don't specify the -o option (i.e. the output goes to the console), but when I use the -o option the created file is always empty.

So with a sample of

<?xml version="1.0" encoding="utf-8"?>
<root><section><title>s1</title></section><section><title>s2</title></section>

SaxonCS without using the -o option outputs

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <section>
      <title>s1</title>
   </section>
   <section>
      <title>s2</title>
   </section>Error on line 2 column 79 of sample2.xml:
  XTDE3530  Error reported by XML parser processing file:///C:/SomePath/streamed-indenting/sample2.xml:
  Unexpected end of file has occurred. The following elements are not closed: root. Line 2,
  position 79.. The error could not be caught, because rollback-output=no was specified, and
  output was already written to the result tree
Error reported by XML parser processing file:///C:/SomePath/streamed-indenting/sample2.xml: Unexpected end of file has occurred. The following elements are not closed: root. Line 2, position 79.. The error could not be caught, because rollback-output=no was specified, and output was already written to the result tree
Exiting with code 2

to the console, but if I use the -o option in the hope of capturing the part

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <section>
      <title>s1</title>
   </section>
   <section>
      <title>s2</title>
   </section>

in the output file, that file is created with length 0 and no content at all.

When the spec shows an example similar to the above (using an explicit xsl:result-document instead of the primary result), it says "the state of the file out.xml will be unpredictable", but I still wonder why the console shows the output while the -o option produces nothing.

Is that what is meant by "unpredictable"?
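The behaviour being asked about, a streaming parser that has already delivered nodes before it discovers the input is not well-formed, can be seen outside Saxon with Python's standard-library `xml.etree.ElementTree.XMLPullParser`. This is only an illustration of the general idea, not of how SaxonCS works internally; the function name `titles_before_error` is hypothetical, and the input is the sample document from this post.

```python
import xml.etree.ElementTree as ET

# The not-well-formed sample from the question: the root end tag is missing.
SAMPLE = (b'<?xml version="1.0" encoding="utf-8"?>\n'
          b'<root><section><title>s1</title></section>'
          b'<section><title>s2</title></section>\n')


def titles_before_error(data: bytes):
    """Hypothetical helper: collect <title> text delivered before the parse
    error, and return it together with the error message (or None)."""
    parser = ET.XMLPullParser(events=("end",))
    titles = []
    try:
        parser.feed(data)
        # Drain the events produced so far; these arrive before the parser
        # has reached the (missing) end of the document.
        for _event, elem in parser.read_events():
            if elem.tag == "title":
                titles.append(elem.text)
        # Signal end of input: this raises ParseError for the unclosed root.
        parser.close()
    except ET.ParseError as exc:
        return titles, str(exc)
    return titles, None


if __name__ == "__main__":
    print(titles_before_error(SAMPLE))
```

Whether such partially delivered results survive into a serialized output file is a separate question, which is what the reply below addresses.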


Replies (1)

RE: Streamed processing of not well-formed XML: output on console isn't output in result file if -o option is used - Added by Michael Kay almost 3 years ago

The spec should probably say "implementation-dependent" rather than "unpredictable" but it basically means the same thing.

I would think the most likely reason that the output file is empty is that it hasn't been flushed/closed.

I wouldn't expect a parse error to activate the try/catch. The streaming code works bottom-up in push mode, so consuming subexpressions trigger evaluation of their parent expressions, not the other way around. Try/catch is therefore implemented by pushing an error outcome down the pipeline from the child expression to the parent, in much the same way as an ordinary result.
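To make the push-mode idea concrete, here is a small hypothetical model in Python. It is not Saxon's code: the classes `ResultTree` and `TryCatchStage` and the function `run` are invented for illustration. The child pushes ordinary items and then an error into the try/catch stage through the same channel, and the stage can only recover if it has not already forwarded output downstream, which is the situation described by the "could not be caught, because rollback-output=no was specified" message in the console log.

```python
from typing import List, Tuple


class ResultTree:
    """Hypothetical final destination, standing in for the serializer."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def item(self, value: str) -> None:
        self.items.append(value)


class TryCatchStage:
    """Hypothetical push-mode try/catch stage (not Saxon's actual classes).

    The child expression pushes both items and errors into this stage,
    which decides what reaches the downstream result tree."""

    def __init__(self, downstream: ResultTree, rollback_output: bool) -> None:
        self.downstream = downstream
        self.rollback_output = rollback_output
        self.pending: List[str] = []  # only used when rollback_output is True
        self.wrote_output = False

    def item(self, value: str) -> None:
        if self.rollback_output:
            self.pending.append(value)  # hold back until the try body succeeds
        else:
            self.downstream.item(value)  # forwarded at once, cannot be undone
            self.wrote_output = True

    def error(self, message: str) -> None:
        if self.rollback_output:
            self.pending.clear()  # discard partial output, then run the catch
        elif self.wrote_output:
            raise RuntimeError(message + " (not caught: output already written)")
        self.downstream.item("caught: " + message)

    def end(self) -> None:
        # Try body finished without error: release any held-back output.
        for value in self.pending:
            self.downstream.item(value)
        self.pending.clear()


def run(rollback_output: bool) -> Tuple[ResultTree, bool]:
    """Push two items and then a parse error through the stage.
    Returns the result tree and whether the error escaped the try/catch."""
    tree = ResultTree()
    stage = TryCatchStage(tree, rollback_output)
    for title in ["s1", "s2"]:
        stage.item(title)  # the streamed child pushes results as it goes...
    try:
        stage.error("unexpected end of file")  # ...then pushes the error the same way
    except RuntimeError:
        return tree, True
    return tree, False
```

In this model, holding output back is what makes recovery possible; with rollback disabled, the stage forwards items immediately and has nothing to roll back once an error arrives.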

I would expect the output file to be closed, but it's certainly possible that it isn't flushed.
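The flushing point can be illustrated with ordinary buffered file output. The Python sketch below is only an analogy, under the assumption that the result file is written through a user-space buffer; it says nothing definite about SaxonCS, which runs on .NET. The function name `size_seen_at_error` is hypothetical. It writes part of a result, hits a simulated parse error, and reports how many bytes are actually in the file at that moment while the handle is still open.

```python
import os
import tempfile


def size_seen_at_error(path: str, flush_on_error: bool) -> int:
    """Hypothetical demo: write part of a result, raise a simulated parse
    error, and return the on-disk size observed while the file is still open."""
    out = open(path, "wb", buffering=8192)  # explicit user-space buffer
    try:
        out.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<root>\n')
        raise ValueError("simulated XML parse error")  # stand-in for the parser
    except ValueError:
        if flush_on_error:
            out.flush()  # hand the buffered bytes to the OS before reporting
        return os.path.getsize(path)
    finally:
        # Closing also flushes; a process that terminates abruptly before
        # this point loses whatever is still in the buffer.
        out.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.xml")
        print(size_seen_at_error(target, flush_on_error=False))  # prints 0
        print(size_seen_at_error(target, flush_on_error=True))
```

Without the explicit flush, the partial result exists only in memory when the error is handled, so an exit path that skips closing the stream would leave an empty file, consistent with what the question observed.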
