Streamed processing of not well-formed XML: output on console isn't output in result file if -o option is used
Added by Martin Honnen almost 3 years ago
I am trying to understand whether streamed processing allows me to process an input document that is not well-formed, catch the parse error, and still have the nodes processed before the parse error produce a result.
For that I wrote a stylesheet along the lines of
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <xsl:param name="input-uri" as="xs:string" select="'sample1.xml'"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:mode on-no-match="shallow-copy" streamable="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:try rollback-output="no">
      <xsl:source-document href="{$input-uri}" streamable="yes">
        <xsl:apply-templates/>
      </xsl:source-document>
      <xsl:catch errors="*">
        <xsl:message terminate="no">
          <error code="{$err:code}" message="{$err:description}"/>
        </xsl:message>
      </xsl:catch>
    </xsl:try>
  </xsl:template>

</xsl:stylesheet>
When I set the input-uri parameter to a file that is not well-formed (e.g. one that lacks the end tag of the root element), SaxonCS 11 interestingly outputs all nodes before the missing end tag when I don't specify the -o option (i.e. when the output goes to the console), but when I use the -o option the created file is always empty.
So with a sample of
<?xml version="1.0" encoding="utf-8"?>
<root><section><title>s1</title></section><section><title>s2</title></section>
SaxonCS without the -o option outputs
<?xml version="1.0" encoding="UTF-8"?>
<root>
<section>
<title>s1</title>
</section>
<section>
<title>s2</title>
</section>Error on line 2 column 79 of sample2.xml:
XTDE3530 Error reported by XML parser processing file:///C:/SomePath/streamed-indenting/sample2.xml:
Unexpected end of file has occurred. The following elements are not closed: root. Line 2,
position 79.. The error could not be caught, because rollback-output=no was specified, and
output was already written to the result tree
Error reported by XML parser processing file:///C:/SomePath/streamed-indenting/sample2.xml: Unexpected end of file has occurred. The following elements are not closed: root. Line 2, position 79.. The error could not be caught, because rollback-output=no was specified, and output was already written to the result tree
Exiting with code 2
to the console, but if I use the -o option in the hope of capturing the part
<?xml version="1.0" encoding="UTF-8"?>
<root>
<section>
<title>s1</title>
</section>
<section>
<title>s2</title>
</section>
in that file, the file that gets created has length 0, with no content at all.
When the spec shows an example similar to the above (using an explicit xsl:result-document instead of the primary result), it says "the state of the file out.xml will be unpredictable". Nevertheless, I wonder why the console shows the output while the -o option produces nothing.
Is that what is meant with unpredictable?
Replies (1)
RE: Streamed processing of not well-formed XML: output on console isn't output in result file if -o option is used - Added by Michael Kay almost 3 years ago
The spec should probably say "implementation-dependent" rather than "unpredictable" but it basically means the same thing.
I would think the most likely reason that the output file is empty is that it hasn't been flushed/closed.
I wouldn't expect that a parse error activates the try/catch. The streaming code works bottom up in push mode, so consuming subexpressions trigger evaluation of their parent expressions, not the other way around. Try/Catch is therefore implemented by pushing an error outcome down the pipeline from the child expression to the parent, in much the same way as an ordinary result.
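To make the push-mode point concrete, here is a toy model in Python (XSLT itself cannot express a processor's internals). Everything in it (`Serializer`, `TryStage`, `parse` and the event strings) is hypothetical and is not SaxonCS code. It only mirrors the shape of the explanation: the parser drives the pipeline by pushing events downstream, and the parse error arrives as one more pushed outcome after content has already passed through. With rollback-output="no", that means the error can no longer be caught.

```python
# Toy model of push-mode streaming (hypothetical; NOT Saxon's implementation).
# The parser drives the pipeline: it pushes each event to the next stage,
# and a parse error is pushed the same way, as an "error outcome".
from typing import List, Optional


class Serializer:
    """Last stage: collects serialized markup (stands in for the result)."""

    def __init__(self) -> None:
        self.out: List[str] = []

    def event(self, text: str) -> None:
        self.out.append(text)


class TryStage:
    """Hypothetical stand-in for xsl:try with rollback-output="no".

    It never pulls from the parser; it only reacts to what is pushed to it:
    ordinary events, or an error outcome."""

    def __init__(self, downstream: Serializer) -> None:
        self.downstream = downstream
        self.written = False
        self.caught: Optional[str] = None

    def event(self, text: str) -> None:
        self.written = True
        self.downstream.event(text)

    def error(self, message: str) -> None:
        if self.written:
            # Events already passed downstream cannot be taken back,
            # so the error cannot be turned into a catch.
            raise RuntimeError(message + " (not caught: output already written)")
        self.caught = message


def parse(events: List[str], well_formed: bool, sink: TryStage) -> None:
    """Push each event to the sink; if the input is truncated, push an
    error outcome after the last event instead of an end-of-document."""
    for e in events:
        sink.event(e)
    if not well_formed:
        sink.error("unexpected end of file")


if __name__ == "__main__":
    result = Serializer()
    try:
        parse(["<root>", "<section>", "<title>s1</title>", "</section>"],
              well_formed=False, sink=TryStage(result))
    except RuntimeError as exc:
        print("error:", exc)
    print("partial output:", "".join(result.out))
```

In this model, the catch branch could only fire if the error were pushed before any event had reached the serializer, for example on an empty or immediately broken input.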
I would expect the output file to be closed, but it's certainly possible that it isn't flushed.
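The flushing explanation can be illustrated with generic buffered I/O. This is an assumption about the general mechanism, not a description of how SaxonCS writes its -o file. The Python sketch below (the function name `sizes_around_flush` is hypothetical) writes a small piece of XML through an ordinary buffered file stream. It then checks the size on disk before and after an explicit `flush()`. Small writes sit in an in-memory buffer until the stream is flushed or closed.

```python
# Illustration of unflushed buffered output (generic Python behaviour;
# an assumption about the mechanism, not a description of SaxonCS).
import os
import tempfile
from typing import Tuple


def sizes_around_flush(path: str, text: str) -> Tuple[int, int]:
    """Write `text` through a buffered text stream and return the on-disk
    file size before and after an explicit flush()."""
    f = open(path, "w", encoding="utf-8")
    try:
        f.write(text)
        before = os.path.getsize(path)  # small writes are still in memory
        f.flush()                        # hand the buffer to the OS
        after = os.path.getsize(path)
    finally:
        f.close()
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "result.xml")
        print(sizes_around_flush(target, "<root>\n  <section/>\n"))
```

If the process ends before `flush()` or `close()` runs, through something like `os._exit()` that skips Python's cleanup, the buffered bytes never reach the file. That is one way an output file can end up with length 0 even though the content was already produced.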