XSLT 3 with recursive, streamable function passes the streamability analysis but output is different from non-streamed execution
I tried myself again on a streamable stylesheet function, it passes the streamability analysis with both Saxon EE 9.9.17 and 10.1 but doesn't output the same result as when turning streaming off.
The code is
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all" xmlns:mf="http://example.com/mf" version="3.0"> <xsl:output indent="yes"/> <xsl:mode on-no-match="shallow-copy" streamable="yes"/> <xsl:function name="mf:nest" as="element()?" streamability="absorbing"> <xsl:param name="elements" as="element()*"/> <xsl:copy select="head($elements)"> <xsl:attribute name="type">data</xsl:attribute> <xsl:sequence select="mf:nest(tail($elements))"/> </xsl:copy> </xsl:function> <xsl:template match="person"> <xsl:copy> <xsl:sequence select="mf:nest(*)"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
an input sample is
<?xml version="1.0" encoding="UTF-8"?> <root> <person> <a/> <b/> <c/> <d/> </person> <person> <a/> <b/> <c/> <d/> </person> </root>
without streaming I get the intended complete nesting of the siblings of the input
<root> <person> <a type="data"> <b type="data"> <c type="data"> <d type="data"/> </c> </b> </a> </person> <person> <a type="data"> <b type="data"> <c type="data"> <d type="data"/> </c> </b> </a> </person> </root>
with streaming the deeper nested elements in the result are missing:
<root> <person> <a type="data"/> </person> <person> <a type="data"/> </person> </root>
#2 Updated by Michael Kay about 1 year ago
This isn't going to be easy, though a lot of the machinery does seem to be in place.
a element is processed correctly. The WatchManager fires off the AbsorbingFunction, which correctly detects that the
a element matches the condition
head($elements), and it creates a new Watch (B) that should match nodes in
b element is encountered, the WatchManager first notifies it to the initial invocation of the absorbing function, which correctly detects that
b doesn't match
head($elements) and therefore does nothing.
The WatchManager should now match
b against Watch B, representing the second function invocation. But for some reason, Watch B is not on the active watch list, so nothing happens: we never get a second function invocation.
#3 Updated by Michael Kay about 1 year ago
I think there are things going wrong while the
a element is being processed.
FixedAttribute.evaluateItem(), which constructs the attribute node, is called twice while we're still positioned on the
a element. That can't be right.
Secondly: we create a
Watch to handle the recursive call, and it's added to the
WatchManager, but by the time the
startElement event for
b comes along, the
Watch has been removed.
#4 Updated by Michael Kay about 1 year ago
I think the problem is probably that
start(). The intent of this is probably to handle the empty sequence case; we call start() when the first matching node is found, or at the end, if no matching nodes were found. If the sequence isn't empty the call on start() should do nothing, but that's not happening.
started is set to false when open() is called, so this suggests that open() is being called more than once.
The first call on open() for the function call happens while processing the
person start tag. This ensures that the inversion of the function is created, initialises the function's stack frame, and sets
started=false, all while adding a Watch for the function call's argument (
child::*) to the WatchManager.
a element start tag is reached, we correctly call
AbsorbingFunctionCallAction.startSelectedParentNode(). This calls
start(), which sets
innerWatch to a Watch [Trigger3116] for
head($elements), and sets
started=true; it adds this
Watch to the
WatchManager, still positioned on the
a start tag, finds that this new
Watch matches, and initiates a
ForEachAction (because the
@select is compiled as
xsl:for-each with an inner
xsl:copy applied to the context item). The
for-each has the call on tail($elements) as its consuming expression, so it creates a
Watch for this expression [Trigger3190], which gets added to the
WatchManager. While it is being added, the
WatchManager calls open(), which causes the pre-consumption instructions to be executed, specifically, the xsl:copy start action and the attribute construction. This also has the effect of creating and opening a new
AbsorbingFunctionCallAction for the recursive function call, with
Still processing the
a startElement event, the
WatchManager finds this new
Watch and recognises it as matching, but the filter representing the tail() call correctly recognises that the
a element is not part of the tail(), and therefore ignores it.
Now we get to the
endElement() event for
a. It turns out this is where the problems start. The
endSelectedParentNode() on [Trigger3190], that is, the watch for the
tail() expression. This does nothing, because the node doesn't match the filter. But it then goes on to
Watch and remove it from the watch list. This is why the
startElement event for
b does not trigger the recursive function call.
#5 Updated by Michael Kay about 1 year ago
I'm thinking this function should probably be rejected as non-streamable. The rules (§126.96.36.199) say: If the declared type of the streaming parameter permits more than one node, then a variable reference referring to the streaming parameter is striding and consuming. We have two references to
$elements, so surely we have two consuming subexpressions, even though one of them appears in an inspection context?
The reason we allow it is that the
xsl:copy has been rewritten into an
xsl:for-each, and the rules for
xsl:for-each allow both operands to be striding and consuming. But this is on the assumption that the for-each body will select downwards from the context item in the value of the select expression. This assumption ceases to be valid in the presence of variable references whose values are consuming. Note that we could hit the same problem if the construct written, say, as
<xsl:for-each select="$elements"> <xsl:for-each select="$elements"> .. </ </
I think there's a rule missing in the spec: the function body must only contain one reference to the streaming parameter. This rule was thought to be unnecessary, because it was assumed that with more than one such reference, it would fail under the "two consuming operands" rule. But that rule only comes into play when the general streamability rules apply, and not when special streamability rules such as those for
xsl:for-each come into force.
#6 Updated by Michael Kay about 1 year ago
Raised as an issue on the spec: see https://github.com/w3c/qtspecs/issues/15
I decided to implement the suggested rule: if the streaming parameter allows multiple nodes, then a reference to the parameter that appears within a higher-order operand of any construct is roaming and free ranging.
This makes the test non-streamable, and I am marking the test accordingly.
Please register to edit this issue