Bug #4561
closedXSLT 3 with recursive, streamable function passes the streamability analysis but output is different from non-streamed execution
100%
Description
I tried myself again on a streamable stylesheet function, it passes the streamability analysis with both Saxon EE 9.9.17 and 10.1 but doesn't output the same result as when turning streaming off.
The code is
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
xmlns:mf="http://example.com/mf"
version="3.0">
<xsl:output indent="yes"/>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:function name="mf:nest" as="element()?" streamability="absorbing">
<xsl:param name="elements" as="element()*"/>
<xsl:copy select="head($elements)">
<xsl:attribute name="type">data</xsl:attribute>
<xsl:sequence select="mf:nest(tail($elements))"/>
</xsl:copy>
</xsl:function>
<xsl:template match="person">
<xsl:copy>
<xsl:sequence select="mf:nest(*)"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
an input sample is
<?xml version="1.0" encoding="UTF-8"?>
<root>
<person>
<a/>
<b/>
<c/>
<d/>
</person>
<person>
<a/>
<b/>
<c/>
<d/>
</person>
</root>
without streaming I get the intended complete nesting of the siblings of the input person
elements
<root>
<person>
<a type="data">
<b type="data">
<c type="data">
<d type="data"/>
</c>
</b>
</a>
</person>
<person>
<a type="data">
<b type="data">
<c type="data">
<d type="data"/>
</c>
</b>
</a>
</person>
</root>
with streaming the deeper nested elements in the result are missing:
<root>
<person>
<a type="data"/>
</person>
<person>
<a type="data"/>
</person>
</root>
Files
Updated by Michael Kay about 4 years ago
- Status changed from New to In Progress
- Assignee set to Michael Kay
Sorry that this one somehow slipped under the radar.
I've now turned it into XSLT3 test case su-absorbing-205
and have confirmed the failure.
Updated by Michael Kay about 4 years ago
This isn't going to be easy, though a lot of the machinery does seem to be in place.
The a
element is processed correctly. The WatchManager fires off the AbsorbingFunction, which correctly detects that the a
element matches the condition head($elements)
, and it creates a new Watch (B) that should match nodes in tail(@elements)
.
When the b
element is encountered, the WatchManager first notifies it to the initial invocation of the absorbing function, which correctly detects that b
doesn't match head($elements)
and therefore does nothing.
The WatchManager should now match b
against Watch B, representing the second function invocation. But for some reason, Watch B is not on the active watch list, so nothing happens: we never get a second function invocation.
Updated by Michael Kay about 4 years ago
I think there are things going wrong while the a
element is being processed.
Firstly: FixedAttribute.evaluateItem(
), which constructs the attribute node, is called twice while we're still positioned on the a
element. That can't be right.
Secondly: we create a Watch
to handle the recursive call, and it's added to the WatchManager
, but by the time the startElement
event for b
comes along, the Watch
has been removed.
Updated by Michael Kay about 4 years ago
I think the problem is probably that AbsorbingFunctionCallAdjunct.close()
calls start()
. The intent of this is probably to handle the empty sequence case; we call start() when the first matching node is found, or at the end, if no matching nodes were found. If the sequence isn't empty the call on start() should do nothing, but that's not happening.
The flag started
is set to false when open() is called, so this suggests that open() is being called more than once.
The first call on open() for the function call happens while processing the person
start tag. This ensures that the inversion of the function is created, initialises the function's stack frame, and sets started=false
, all while adding a Watch for the function call's argument (child::*
) to the WatchManager.
When the a
element start tag is reached, we correctly call AbsorbingFunctionCallAction.startSelectedParentNode()
. This calls start()
, which sets innerWatch
to a Watch [Trigger3116] for head($elements)
, and sets started=true
; it adds this Watch
to the WatchManager
. The WatchManager
, still positioned on the a
start tag, finds that this new Watch
matches, and initiates a ForEachAction
(because the xsl:copy
with @select
is compiled as xsl:for-each
with an inner xsl:copy
applied to the context item). The for-each
has the call on tail($elements) as its consuming expression, so it creates a Watch
for this expression [Trigger3190], which gets added to the WatchManager
. While it is being added, the WatchManager
calls open(), which causes the pre-consumption instructions to be executed, specifically, the xsl:copy start action and the attribute construction. This also has the effect of creating and opening a new AbsorbingFunctionCallAction
for the recursive function call, with started=false
.
Still processing the a
startElement event, the WatchManager
finds this new Watch
and recognises it as matching, but the filter representing the tail() call correctly recognises that the a
element is not part of the tail(), and therefore ignores it.
Now we get to the endElement()
event for a
. It turns out this is where the problems start. The WatchManager
calls endSelectedParentNode()
on [Trigger3190], that is, the watch for the tail()
expression. This does nothing, because the node doesn't match the filter. But it then goes on to close()
the Watch
and remove it from the watch list. This is why the startElement
event for b
does not trigger the recursive function call.
Updated by Michael Kay about 4 years ago
I'm thinking this function should probably be rejected as non-streamable. The rules (ยง19.8.5.2) say: If the declared type of the streaming parameter permits more than one node, then a variable reference referring to the streaming parameter is striding and consuming. We have two references to $elements
, so surely we have two consuming subexpressions, even though one of them appears in an inspection context?
The reason we allow it is that the xsl:copy
has been rewritten into an xsl:for-each
, and the rules for xsl:for-each
allow both operands to be striding and consuming. But this is on the assumption that the for-each body will select downwards from the context item in the value of the select expression. This assumption ceases to be valid in the presence of variable references whose values are consuming. Note that we could hit the same problem if the construct written, say, as
<xsl:for-each select="$elements">
<xsl:for-each select="$elements">
..
</
</
I think there's a rule missing in the spec: the function body must only contain one reference to the streaming parameter. This rule was thought to be unnecessary, because it was assumed that with more than one such reference, it would fail under the "two consuming operands" rule. But that rule only comes into play when the general streamability rules apply, and not when special streamability rules such as those for xsl:for-each
come into force.
Updated by Michael Kay about 4 years ago
Raised as an issue on the spec: see https://github.com/w3c/qtspecs/issues/15
I decided to implement the suggested rule: if the streaming parameter allows multiple nodes, then a reference to the parameter that appears within a higher-order operand of any construct is roaming and free ranging.
This makes the test non-streamable, and I am marking the test accordingly.
Updated by Michael Kay about 4 years ago
- Status changed from In Progress to Resolved
- Applies to branch trunk added
- Applies to branch deleted (
9.9) - Fix Committed on Branch 10, trunk added
Updated by O'Neil Delpratt about 4 years ago
Bug fix applied in the Saxon 10.3 maintenance release
Updated by O'Neil Delpratt about 4 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 10.3 added
Please register to edit this issue