Bug #4561: XSLT 3 with recursive, streamable function passes the streamability analysis but output is different from non-streamed execution - Saxon - Saxonica Developer Community

Actions

Send by e-mail Copy link

Bug #4561

closed

XSLT 3 with recursive, streamable function passes the streamability analysis but output is different from non-streamed execution

Added by Martin Honnen over 4 years ago. Updated about 4 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Michael Kay

Category:

Streaming

Sprint/Milestone:

Start date:

2020-05-26

Due date:

% Done:

100%

Estimated time:

Legacy ID:

Applies to branch:

10, trunk

Fix Committed on Branch:

10, trunk

Fixed in Maintenance Release:

10.3

Platforms:

Description

I tried myself again on a streamable stylesheet function, it passes the streamability analysis with both Saxon EE 9.9.17 and 10.1 but doesn't output the same result as when turning streaming off.

The code is

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    xmlns:mf="http://example.com/mf"
    version="3.0">
    
    <xsl:output indent="yes"/>
    
    <xsl:mode on-no-match="shallow-copy" streamable="yes"/>
    
    <xsl:function name="mf:nest" as="element()?" streamability="absorbing">
        <xsl:param name="elements" as="element()*"/>
        <xsl:copy select="head($elements)">
            <xsl:attribute name="type">data</xsl:attribute>
            <xsl:sequence select="mf:nest(tail($elements))"/>
        </xsl:copy>
    </xsl:function>
    
    <xsl:template match="person">
        <xsl:copy>
            <xsl:sequence select="mf:nest(*)"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>

an input sample is

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <person>
        <a/>
        <b/>
        <c/>
        <d/>
    </person>

    <person>
        <a/>
        <b/>
        <c/>
        <d/>
    </person>
</root>

without streaming I get the intended complete nesting of the siblings of the input person elements

<root>
   <person>
      <a type="data">
         <b type="data">
            <c type="data">
               <d type="data"/>
            </c>
         </b>
      </a>
   </person>
   <person>
      <a type="data">
         <b type="data">
            <c type="data">
               <d type="data"/>
            </c>
         </b>
      </a>
   </person>
</root>

with streaming the deeper nested elements in the result are missing:

<root>
   <person>
      <a type="data"/>
   </person>
   <person>
      <a type="data"/>
   </person>
</root>

Files

Download all files

head-tail-recursive-function1-streaming.xsl (852 Bytes) head-tail-recursive-function1-streaming.xsl		Martin Honnen, 2020-05-26 14:08
input1.xml (217 Bytes) input1.xml		Martin Honnen, 2020-05-26 14:08

Actions

Copy link

Updated by Michael Kay about 4 years ago

Status changed from New to In Progress
Assignee set to Michael Kay

Sorry that this one somehow slipped under the radar.

I've now turned it into XSLT3 test case su-absorbing-205 and have confirmed the failure.

Actions

Copy link

Updated by Michael Kay about 4 years ago

This isn't going to be easy, though a lot of the machinery does seem to be in place.

The a element is processed correctly. The WatchManager fires off the AbsorbingFunction, which correctly detects that the a element matches the condition head($elements), and it creates a new Watch (B) that should match nodes in tail(@elements).

When the b element is encountered, the WatchManager first notifies it to the initial invocation of the absorbing function, which correctly detects that b doesn't match head($elements) and therefore does nothing.

The WatchManager should now match b against Watch B, representing the second function invocation. But for some reason, Watch B is not on the active watch list, so nothing happens: we never get a second function invocation.

Actions

Copy link

Updated by Michael Kay about 4 years ago

I think there are things going wrong while the a element is being processed.

Firstly: FixedAttribute.evaluateItem(), which constructs the attribute node, is called twice while we're still positioned on the a element. That can't be right.

Secondly: we create a Watch to handle the recursive call, and it's added to the WatchManager, but by the time the startElement event for b comes along, the Watch has been removed.

Actions

Copy link

Updated by Michael Kay about 4 years ago

I think the problem is probably that AbsorbingFunctionCallAdjunct.close() calls start(). The intent of this is probably to handle the empty sequence case; we call start() when the first matching node is found, or at the end, if no matching nodes were found. If the sequence isn't empty the call on start() should do nothing, but that's not happening.

The flag started is set to false when open() is called, so this suggests that open() is being called more than once.

The first call on open() for the function call happens while processing the person start tag. This ensures that the inversion of the function is created, initialises the function's stack frame, and sets started=false, all while adding a Watch for the function call's argument (child::*) to the WatchManager.

When the a element start tag is reached, we correctly call AbsorbingFunctionCallAction.startSelectedParentNode(). This calls start(), which sets innerWatch to a Watch [Trigger3116] for head($elements), and sets started=true; it adds this Watch to the WatchManager. The WatchManager, still positioned on the a start tag, finds that this new Watch matches, and initiates a ForEachAction (because the xsl:copy with @select is compiled as xsl:for-each with an inner xsl:copy applied to the context item). The for-each has the call on tail($elements) as its consuming expression, so it creates a Watch for this expression [Trigger3190], which gets added to the WatchManager. While it is being added, the WatchManager calls open(), which causes the pre-consumption instructions to be executed, specifically, the xsl:copy start action and the attribute construction. This also has the effect of creating and opening a new AbsorbingFunctionCallAction for the recursive function call, with started=false.

Still processing the a startElement event, the WatchManager finds this new Watch and recognises it as matching, but the filter representing the tail() call correctly recognises that the a element is not part of the tail(), and therefore ignores it.

Now we get to the endElement() event for a. It turns out this is where the problems start. The WatchManager calls endSelectedParentNode() on [Trigger3190], that is, the watch for the tail() expression. This does nothing, because the node doesn't match the filter. But it then goes on to close() the Watch and remove it from the watch list. This is why the startElement event for b does not trigger the recursive function call.

Actions

Copy link

Updated by Michael Kay about 4 years ago

I'm thinking this function should probably be rejected as non-streamable. The rules (§19.8.5.2) say: If the declared type of the streaming parameter permits more than one node, then a variable reference referring to the streaming parameter is striding and consuming. We have two references to $elements, so surely we have two consuming subexpressions, even though one of them appears in an inspection context?

The reason we allow it is that the xsl:copy has been rewritten into an xsl:for-each, and the rules for xsl:for-each allow both operands to be striding and consuming. But this is on the assumption that the for-each body will select downwards from the context item in the value of the select expression. This assumption ceases to be valid in the presence of variable references whose values are consuming. Note that we could hit the same problem if the construct written, say, as

<xsl:for-each select="$elements">
  <xsl:for-each select="$elements">
   ..
 </
</

I think there's a rule missing in the spec: the function body must only contain one reference to the streaming parameter. This rule was thought to be unnecessary, because it was assumed that with more than one such reference, it would fail under the "two consuming operands" rule. But that rule only comes into play when the general streamability rules apply, and not when special streamability rules such as those for xsl:for-each come into force.

Actions

Copy link

Updated by Michael Kay about 4 years ago

Raised as an issue on the spec: see https://github.com/w3c/qtspecs/issues/15

I decided to implement the suggested rule: if the streaming parameter allows multiple nodes, then a reference to the parameter that appears within a higher-order operand of any construct is roaming and free ranging.

This makes the test non-streamable, and I am marking the test accordingly.

Actions

Copy link