Bug #3883
closed
XSLT 3 using xsl:merge to merge some files gives desired result with streaming and Saxon 9.8 EE but wrong result with duplicated elements with HE or with EE and streaming turned off
Description
I have the following XSLT 3 program
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">
  <xsl:param name="input-uri" as="xs:string" select="'.'"/>
  <xsl:param name="file-pattern" as="xs:string" select="'input*.xml'"/>
  <xsl:param name="merge-select-expression" as="xs:string" static="yes" select="'*/*/*'"/>
  <xsl:param name="xslt-pattern-to-add-file-name" as="xs:string" static="yes" select="'item'"/>
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:template name="xsl:initial-template">
    <xsl:merge>
      <xsl:merge-source
          for-each-source="uri-collection($input-uri || '?select=' || $file-pattern)"
          _select="{$merge-select-expression}"
          streamable="yes">
        <xsl:merge-key select="true()"/>
      </xsl:merge-source>
      <xsl:merge-action>
        <xsl:sequence select="let $group-tail := tail(current-merge-group()) return mf:construct-doc(., $group-tail)"/>
      </xsl:merge-action>
    </xsl:merge>
  </xsl:template>
  <xsl:template _match="{$xslt-pattern-to-add-file-name}">
    <xsl:comment select="'Copied from ' || tokenize(document-uri(/), '/')[last()]"/>
    <xsl:next-match/>
  </xsl:template>
  <xsl:function name="mf:construct-doc" as="document-node()">
    <xsl:param name="first-node" as="node()"/>
    <xsl:param name="nodes" as="node()*"/>
    <xsl:apply-templates select="root($first-node)" mode="construct">
      <xsl:with-param name="nodes" select="$nodes"/>
    </xsl:apply-templates>
  </xsl:function>
  <xsl:mode name="construct" on-no-match="shallow-copy"/>
  <xsl:template _match="{$merge-select-expression}" mode="construct">
    <xsl:param name="nodes"/>
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="$nodes"/>
  </xsl:template>
</xsl:stylesheet>
that is meant to be run with the -it option. It uses xsl:merge with a merge source taken from a uri-collection() of some XML files in a folder, simply to create a new result file containing all nodes selected by _select="{$merge-select-expression}", and additionally to mark result elements with a comment saying which file they come from.
Given two sample files, input1.xml and input2.xml, of the form
<root>
  <items>
    <item>
      <foo>foo 1, file 1</foo>
      <name>name 1, file 1</name>
    </item>
    <item>
      <foo>foo 2, file 1</foo>
      <name>name 2, file 1</name>
    </item>
  </items>
</root>
and
<root>
  <items>
    <item>
      <foo>foo 1, file 2</foo>
      <name>name 1, file 2</name>
    </item>
    <item>
      <foo>foo 2, file 2</foo>
      <name>name 2, file 2</name>
    </item>
  </items>
</root>
when I run it with Saxon 9.8.0.12 EE from the command line I get the wanted output:
<root>
<items><!--Copied from input1.xml-->
<item>
<foo>foo 1, file 1</foo>
<name>name 1, file 1</name>
</item>
<!--Copied from input1.xml-->
<item>
<foo>foo 2, file 1</foo>
<name>name 2, file 1</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 1, file 2</foo>
<name>name 1, file 2</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 2, file 2</foo>
<name>name 2, file 2</name>
</item>
</items>
</root>
However, when run with Saxon 9.8.0.14 HE from the command line, the output contains duplicated elements:
<root>
<items>
<!--Copied from input1.xml--> <item>
<foo>foo 1, file 1</foo>
<name>name 1, file 1</name>
</item>
<!--Copied from input1.xml-->
<item>
<foo>foo 2, file 1</foo>
<name>name 2, file 1</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 1, file 2</foo>
<name>name 1, file 2</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 2, file 2</foo>
<name>name 2, file 2</name>
</item>
<!--Copied from input1.xml--> <item>
<foo>foo 2, file 1</foo>
<name>name 2, file 1</name>
</item>
<!--Copied from input1.xml-->
<item>
<foo>foo 2, file 1</foo>
<name>name 2, file 1</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 1, file 2</foo>
<name>name 1, file 2</name>
</item>
<!--Copied from input2.xml-->
<item>
<foo>foo 2, file 2</foo>
<name>name 2, file 2</name>
</item>
</items>
</root>
Updated by Michael Kay about 6 years ago
Problem reproduced as test cases merge-097, merge-097s.
Updated by Michael Kay about 6 years ago
I believe that the non-streamed output is correct.
The four item elements form a single merge group. The xsl:merge-action is being processed once, with this sequence of four elements as the current merge group. The construct-doc() function is being called once, with the first node as the first argument and the other three nodes as the second argument. The function applies templates in mode "construct" to the root of the first input document. There is no matching template, so it shallow-copies until it gets to the item elements. There are two item elements in the first input document, and for each one it does two things:
(a) output that item element
(b) output the tail of the current merge group (three item elements)
leading to a total of 8 item elements in the output in the sequence A1, A2, B1, B2, A2, A2, B1, B2.
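The walk-through above can be reproduced without xsl:merge. The following is a minimal sketch, not the test case from this report: the in-memory documents $doc-a and $doc-b, the shortened item content (A1, A2, B1, B2) and the use of xsl:copy-of in place of the default-mode templates are all assumptions made for illustration. It applies mode "construct" to the root of the first document and passes the tail of the four-item group as a parameter, so each of the two items in that document is followed by the same three tail items.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical stand-ins for input1.xml and input2.xml -->
  <xsl:variable name="doc-a" as="document-node()">
    <xsl:document>
      <root><items><item>A1</item><item>A2</item></items></root>
    </xsl:document>
  </xsl:variable>
  <xsl:variable name="doc-b" as="document-node()">
    <xsl:document>
      <root><items><item>B1</item><item>B2</item></items></root>
    </xsl:document>
  </xsl:variable>

  <!-- The built-in shallow-copy rule passes the "nodes" parameter down unchanged -->
  <xsl:mode name="construct" on-no-match="shallow-copy"/>

  <!-- Fires once for every item in the document being walked -->
  <xsl:template match="*/*/*" mode="construct">
    <xsl:param name="nodes" as="node()*"/>
    <!-- (a) output this item -->
    <xsl:copy-of select="."/>
    <!-- (b) output the tail of the group -->
    <xsl:copy-of select="$nodes"/>
  </xsl:template>

  <xsl:template name="xsl:initial-template">
    <!-- The single merge group: all four items, no snapshot taken -->
    <xsl:variable name="group" select="($doc-a//item, $doc-b//item)"/>
    <xsl:apply-templates select="root(head($group))" mode="construct">
      <xsl:with-param name="nodes" select="tail($group)"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>
```

Run with -it, this sketch is expected to emit the items in the order A1, A2, B1, B2, A2, A2, B1, B2, because the root of the first item is the full first document with both of its items.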
So the question is, why is the output different in the streaming case?
Updated by Martin Honnen about 6 years ago
The spec https://www.w3.org/TR/xslt-30/#streamable-merging says
When streamable="yes" is specified on an xsl:merge-source element, then (whether or not streamed processing is actually used, and whether or not the processor supports streaming) the expression appearing in the select attribute is implicitly used as the argument of a call on the snapshot function
So, based on your explanation, Saxon HE is not taking the snapshot, while EE does when it streams. But according to the spec, even HE, which does not support streaming, should be working with snapshots, so processing the root of the snapshot of the first item would not output two items but only the single one in the snapshot, plus the three items in the tail.
Updated by Michael Kay about 6 years ago
- Status changed from New to Rejected
I think the output for the streamed case is also correct. With streamed merging, we take a snapshot of the nodes in the merge population. When we navigate to the root of the first node, we are therefore in a document that contains only one "item" element, and so the implicit shallow-copy from this root only finds one item, and therefore only copies the tail of the current merge group once.
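The effect of the snapshot can be sketched in isolation, again without xsl:merge. The in-memory documents $doc-a and $doc-b and the shortened item content are hypothetical, and the explicit call to snapshot() stands in for the implicit one that the spec applies for streamable="yes"; this is an illustration of the semantics, not of Saxon's internals.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical stand-ins for input1.xml and input2.xml -->
  <xsl:variable name="doc-a" as="document-node()">
    <xsl:document>
      <root><items><item>A1</item><item>A2</item></items></root>
    </xsl:document>
  </xsl:variable>
  <xsl:variable name="doc-b" as="document-node()">
    <xsl:document>
      <root><items><item>B1</item><item>B2</item></items></root>
    </xsl:document>
  </xsl:variable>

  <xsl:mode name="construct" on-no-match="shallow-copy"/>

  <xsl:template match="*/*/*" mode="construct">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:copy-of select="."/>
    <xsl:copy-of select="$nodes"/>
  </xsl:template>

  <xsl:template name="xsl:initial-template">
    <!-- Each item becomes a copy in its own tree: ancestors are kept, siblings are not -->
    <xsl:variable name="group" select="snapshot(($doc-a//item, $doc-b//item))"/>
    <!-- root(head($group)) is now a document holding only the first item -->
    <xsl:apply-templates select="root(head($group))" mode="construct">
      <xsl:with-param name="nodes" select="tail($group)"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>
```

Because the snapshot of the first item has no sibling item, the "construct" template should fire only once, giving the order A1, A2, B1, B2.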
I seem to remember some debate in the WG as to whether the "snapshot" semantics should also apply in the non-streaming case, to ensure that both cases produced the same output. I don't remember the reasoning that resolved this, but the spec is clear that snapshot() is performed only when streamable="yes".
I'm therefore closing the report as invalid.
Updated by Michael Kay about 6 years ago
- Status changed from Rejected to In Progress
- Assignee set to Michael Kay
Updated by Michael Kay about 6 years ago
OK. I was running the non-streamed case by setting streamable="no", not by setting streamable="yes" and falling back to a non-streaming processor.
Saxon, I think, does not implement the rule that a snapshot is taken in the "streamable with fallback" case.
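One way to avoid depending on the fallback rule is to call snapshot() explicitly in the select attribute and leave out streamable="yes". The following is a hedged sketch only: it has not been verified against Saxon 9.8 HE, it uses for-each-item over hypothetical in-memory documents ($doc-a, $doc-b) instead of uri-collection() so that it is self-contained, and it uses xsl:copy-of in place of the default-mode templates of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical stand-ins for input1.xml and input2.xml -->
  <xsl:variable name="doc-a" as="document-node()">
    <xsl:document>
      <root><items><item>A1</item><item>A2</item></items></root>
    </xsl:document>
  </xsl:variable>
  <xsl:variable name="doc-b" as="document-node()">
    <xsl:document>
      <root><items><item>B1</item><item>B2</item></items></root>
    </xsl:document>
  </xsl:variable>

  <xsl:mode name="construct" on-no-match="shallow-copy"/>

  <xsl:template match="*/*/*" mode="construct">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:copy-of select="."/>
    <xsl:copy-of select="$nodes"/>
  </xsl:template>

  <xsl:template name="xsl:initial-template">
    <xsl:merge>
      <!-- Explicit snapshot() in place of the implicit one tied to streamable="yes" -->
      <xsl:merge-source for-each-item="($doc-a, $doc-b)"
          select="snapshot(*/*/*)">
        <!-- Constant key: all items fall into a single merge group -->
        <xsl:merge-key select="true()"/>
      </xsl:merge-source>
      <xsl:merge-action>
        <!-- The context item is the first item of the current merge group -->
        <xsl:apply-templates select="root(.)" mode="construct">
          <xsl:with-param name="nodes" select="tail(current-merge-group())"/>
        </xsl:apply-templates>
      </xsl:merge-action>
    </xsl:merge>
  </xsl:template>
</xsl:stylesheet>
```

In the stylesheet from this report, the corresponding change would presumably be _select="snapshot({$merge-select-expression})" on xsl:merge-source, since the static parameter is expanded into the shadow attribute before the expression is compiled.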
Updated by Martin Honnen about 6 years ago
Based on the previous comment, though, I have to retract my judgement "it exhibits the same duplication of result nodes filed in https://saxonica.plan.io/issues/3883 as a separate bug" at the end of the https://saxonica.plan.io/issues/3884 report: there, in an attempt to reduce the code to a minimum, I did not use streamable="yes", and did not notice that this changes the semantics of the merge. So I think for that code your analysis is right that the code produces a duplication.
In the code of this bug report, however, I think that, if the implicit snapshot is used, the result should contain the merged items only once.
Updated by Michael Kay about 6 years ago
Hitting a test driver issue trying to test merge-067sf which requests
<dependencies>
  <feature value="streaming" satisfied="false"/>
  <feature value="streaming-fallback" satisfied="true"/>
</dependencies>
Both the HE and EE test drivers decline to run this test, whether or not the -streaming option is present on the test driver command line, despite the fact that Saxon does have configuration options to set streaming-fallback on.
Updated by Michael Kay about 6 years ago
Bug #3584 says that we're not supporting streaming fallback until 9.9. So I think this will be a "won't fix" as far as 9.8 is concerned. But we still need to investigate it on the 9.9 branch.
Updated by Michael Kay about 6 years ago
- Status changed from In Progress to Won't fix
This has been fixed on the 9.9 branch, and we have decided not to fix it on the 9.8 branch, because 9.8 generally does not handle "streaming fallback" properly.