Bug #4493: end tag not written when using xsl:iterate/xsl:break with early exit - Saxon - Saxonica Developer Community

Actions

Send by e-mail Copy link

Bug #4493

closed

end tag not written when using xsl:iterate/xsl:break with early exit

Added by Martin Honnen over 4 years ago. Updated over 4 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Michael Kay

Category:

Streaming

Sprint/Milestone:

Start date:

2020-03-22

Due date:

% Done:

100%

Estimated time:

Legacy ID:

Applies to branch:

Fix Committed on Branch:

Fixed in Maintenance Release:

10.1

Platforms:

Description

I have a stylesheet using streaming and xsl:iterate with xsl:break doing some early exit, in Saxon 9.9 the result is well-formed document but with 10 the end tag of the root element (created as a literal result element) is missing.

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="#all" version="3.0">
    
    <xsl:output indent="yes"/>
    
    <xsl:mode streamable="yes" on-no-match="shallow-skip"/>
    
    <xsl:template match="/">
        <items>
            <xsl:iterate select="root/items/item">
                <xsl:copy-of select=".[@cat != 'bar']"/>
                <xsl:if test="@cat = 'bar'">
                    <xsl:break/>
                </xsl:if>
            </xsl:iterate>
        </items>
    </xsl:template>
    
</xsl:stylesheet>

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <items>
        <item cat="foo">
            <value>item 1</value>
        </item>        
        <item cat="foo">
            <value>item 2</value>
        </item>
        <item cat="foo">
            <value>item 3</value>
        </item>
        <item cat="foo">
            <value>item 4</value>
        </item>
        <item cat="foo">
            <value>item 5</value>
        </item>
        <item cat="bar">
            <value>item 6</value>
        </item>
        <item cat="bar">
            <value>item 7</value>
        </item>
        <item cat="bar">
            <value>item 8</value>
        </item>
        <item cat="bar">
            <value>item 9</value>
        </item>
    </items>
</root>

Output with Saxon-EE 10.0J:

<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item cat="foo">
      <value>item 1</value>
   </item>
   <item cat="foo">
      <value>item 2</value>
   </item>
   <item cat="foo">
      <value>item 3</value>
   </item>
   <item cat="foo">
      <value>item 4</value>
   </item>
   <item cat="foo">
      <value>item 5</value>
   </item>

Output with Saxon-EE 9.9.1.7J:

<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item cat="foo">
      <value>item 1</value>
   </item>
   <item cat="foo">
      <value>item 2</value>
   </item>
   <item cat="foo">
      <value>item 3</value>
   </item>
   <item cat="foo">
      <value>item 4</value>
   </item>
   <item cat="foo">
      <value>item 5</value>
   </item>
</items>

Files

Download all files

input1.xml (762 Bytes) input1.xml		Martin Honnen, 2020-03-22 14:01
iterate-break2.xsl (686 Bytes) iterate-break2.xsl		Martin Honnen, 2020-03-22 14:01

Actions

Copy link

Updated by Michael Kay over 4 years ago

Thanks for reporting it. The logic for deciding how many endElement() calls to issue on early exit during streaming has indeed changed in 10.0.

Actions

Copy link

Updated by Michael Kay over 4 years ago

The test (added as si-iterate-140) is working when run in the test driver, but failing when run from the command line.

The difference is probably that in the test driver, we are writing the result to a document builder, while from the command line, we are writing to a serializer; the document builder can probably recover from a missing endElement() event, whereas in the serializer, this will result in a missing end tag.

Actions

Copy link

Updated by Michael Kay over 4 years ago

In 9.9 the code in WatchManager.endElement() (on catching a QuitParsingException) did:

            while (!activeWatchStack.isEmpty()) {
                endElement();
            }

In 10.0 it does

           while (!namespaceStack.isEmpty()) {
                endElement();
            }

The difference is that activeWatchStack has one more entry (an entry is created for the startDocument/endDocument event pair, as well as for startElement/endElement).

Simply reverting the change causes a failure with an EmptyStackException; the change must have been made for a reason.

The change was made in commit 8806 on 2019-02-06, and it was part of the general change to the handling of namespaces on the Receiver pipeline; therefore not a change that was specifically concerned with streaming, and therefore it's likely that the effect on streaming wasn't thoroughly explored at the time. The fact that we run nearly all our streaming tests to a document builder rather that to a serializer might well have masked the problem.

Actions

Copy link

Updated by Michael Kay over 4 years ago

I can fix the problem by reverting to using the activeWatchStack as the control for how many endElement() calls to issue, and then defending against the EmptyStackException by testing for namespaceStack.isEmpty() before doing the pop().

This now raises the question of testing. How should we ensure that a problem like this doesn't happen again (or indeed, that it isn't present in other test cases?

I've been running the tests with Java assertions enabled, and in that situation we actually have a RegularSequenceChecker on the pipeline, which ought to detect if there is a missing end tag. But it seems that we're allowing a close() event to occur in any state, even with unclosed element tags -- and that's probably necessary for error scenarios. I shall try changing that.

Actions

Copy link

Updated by Michael Kay over 4 years ago

The extra check inserted into RegularSequenceChecker.close() seems to cause no problems. However, the change to WatchManager.endElement() to fix the original bug causes regression to a number of other tests.

I'll focus on one of these: sf-boolean-117. This test, instead of the expected result <out>true</out>, is outputting <out>true</out>false. If I revert the fix, this test runs OK. Note: many of the tests that are now failing are tests that construct new unstreamed nodes on the streamed pipeline, which is a situation that wasn't working at all in 9.9.

For the moment, I'm reverting the patch. I think the whole issue of tidying up after an early exit needs some careful design thought, rather than just hacking the code until the tests all pass.

Actions

Copy link

Updated by Michael Kay over 4 years ago

Let's look closely at the failing test.

  <xsl:template match="/">
    <items>
      <xsl:iterate select="root/items/item">
        <xsl:copy-of select=".[@cat != ('foo','bar')]"/>
        <xsl:if test="@cat = 'bar'">
          <xsl:break/>
        </xsl:if>
      </xsl:iterate>
    </items>
  </xsl:template>

In the deferredStartDocument() call we identify the Watch for firing the applyTemplates action. The startSelectedParentNode() method (with the document node as argument) finds the template rule, forms its inversion, and adds a new Watch for this rule. We add a closedown action for this watch (2147), and open the watch. This calls IterateAction.open(), which sends the open() call up the pipeline, causing the start tag to be emitted,

Later we get to the start tag for an item element. This triggers the IterateAction watch. We add a closedown action (2260), form the inversion of the xsl:iterate body, and add a new Watch for the CopyOf. This adds another closedown action (2305) and opens the CopyOf watch. We call startSelectedParentNode() on the CopyOfWatch, which returns a CatchingReceiver to be notified of all events for making the copy.

Later we get to the end tag of the item element. The event is notified to the CatchingReceiver, which closes the TinyBuilder constructing the copy. We now pop the ActiveWatchStack to find a list of four closedown actions. The first handles accumulators, which we're not concerned with. The next gets the TinyTree built by the CopyOf watch and emits it to the output pipeline. The third is a close action to close the copy process,. This triggers the execution of the rest of the sequence constructor, causing execution of the xsl:choose and xsl:break instructions. The xsl:break throws a QuitParsingException. WatchManager.endElement() catches this and sets the quit flag.

WatchManager.endElement(), having completed all the closedown actions, now sees the quit flag is set, and calls endElement() twice - once to close the outstanding <items> element and once for the <root> element. Neither of these does anything interesting.

Crucially, at this point there is still an entry on the activeWatchStack, corresponding to the startDocument event, and this links to closedown actions associated with the document node, most notably the writing of the </items> end tag, These are not getting actioned. The 9.9 code actioned them by invoking an extra call on endElement(), but that's spurious: it should really be a call on endDocument.

So I've added a call on endDocument() and the test now passes. The logic seems correct. But three (newish) tests in sf-boolean now fail: -117, -118, and -119. We'll look at them next.

Actions

Copy link

Updated by Michael Kay over 4 years ago

looking at sf-boolean-117, it produces incorrect results. Instead of the expected results

<out>true</out>

we get

<out>true</out>false

This seems to be a problem specific to the BooleanFnFeed (streaming implementation of fn:boolean). It contains the instructions:


                getTerminator().terminate();
                done = true;

but this is the wrong way round; the terminate() call triggers a QuitParsingException, so done is left as false; and the close() call therefore sends the current result (false) down the output pipeline. This was only working before because we failed to send the close() event.

Actions

Copy link

Updated by Michael Kay over 4 years ago

After this change, we have no remaining unexplained failures in the streaming tests.

Having added an endDocument() on the early-exit recovery path for a quit occurring during endElement(), it also makes equal sense to have one for the corresponding path when the quit occurs during other events including startElement, characters(), etc. I confirmed that this causes no regression, though it would be nice to have a case that demonstrates it makes a difference.

Actions

Copy link

Updated by Michael Kay over 4 years ago

Status changed from New to Resolved
Assignee set to Michael Kay
Fix Committed on Branch 10 added

Actions

Copy link

#10

Updated by Michael Kay over 4 years ago

Status changed from Resolved to In Progress

The extra check added to RegularSequenceChecker.close() causes a failure in QT3 test

-s:prod-OptionDecl.serialization -t:Serialization-024

Actions

Copy link

#11

Updated by Michael Kay over 4 years ago

Status changed from In Progress to Resolved

I have enhanced the RegularSequenceChecker to introduce a new state "Failed" which is set if an exception occurs. A call on close() is allowed in Failed state; if close() is called when not in failed state, it checks that all outstanding startDocument/startElement events have been closed by a matching endDocument/endElement event.

Actions

Copy link

#12

Updated by O'Neil Delpratt over 4 years ago

% Done changed from 0 to 100
Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

Actions

Copy link

#13

Updated by O'Neil Delpratt over 4 years ago

Status changed from Resolved to Closed

Please register to edit this issue

Actions

Send by e-mail Copy link

Also available in: Atom PDF

Project

Profile

Help

Saxon

Bug #4493

end tag not written when using xsl:iterate/xsl:break with early exit

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by O'Neil Delpratt over 4 years ago

Updated by O'Neil Delpratt over 4 years ago