Project

Profile

Help

Bug #4493

end tag not written when using xsl:iterate/xsl:break with early exit

Added by Martin Honnen 5 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Streaming
Sprint/Milestone:
-
Start date:
2020-03-22
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
10
Fix Committed on Branch:
10
Fixed in Maintenance Release:

Description

I have a stylesheet using streaming and xsl:iterate with xsl:break doing some early exit, in Saxon 9.9 the result is well-formed document but with 10 the end tag of the root element (created as a literal result element) is missing.

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="#all" version="3.0">
    
    <xsl:output indent="yes"/>
    
    <xsl:mode streamable="yes" on-no-match="shallow-skip"/>
    
    <xsl:template match="/">
        <items>
            <xsl:iterate select="root/items/item">
                <xsl:copy-of select=".[@cat != 'bar']"/>
                <xsl:if test="@cat = 'bar'">
                    <xsl:break/>
                </xsl:if>
            </xsl:iterate>
        </items>
    </xsl:template>
    
</xsl:stylesheet>

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <items>
        <item cat="foo">
            <value>item 1</value>
        </item>        
        <item cat="foo">
            <value>item 2</value>
        </item>
        <item cat="foo">
            <value>item 3</value>
        </item>
        <item cat="foo">
            <value>item 4</value>
        </item>
        <item cat="foo">
            <value>item 5</value>
        </item>
        <item cat="bar">
            <value>item 6</value>
        </item>
        <item cat="bar">
            <value>item 7</value>
        </item>
        <item cat="bar">
            <value>item 8</value>
        </item>
        <item cat="bar">
            <value>item 9</value>
        </item>
    </items>
</root>

Output with Saxon-EE 10.0J:

<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item cat="foo">
      <value>item 1</value>
   </item>
   <item cat="foo">
      <value>item 2</value>
   </item>
   <item cat="foo">
      <value>item 3</value>
   </item>
   <item cat="foo">
      <value>item 4</value>
   </item>
   <item cat="foo">
      <value>item 5</value>
   </item>

Output with Saxon-EE 9.9.1.7J:

<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item cat="foo">
      <value>item 1</value>
   </item>
   <item cat="foo">
      <value>item 2</value>
   </item>
   <item cat="foo">
      <value>item 3</value>
   </item>
   <item cat="foo">
      <value>item 4</value>
   </item>
   <item cat="foo">
      <value>item 5</value>
   </item>
</items>
input1.xml (762 Bytes) input1.xml Martin Honnen, 2020-03-22 14:01
iterate-break2.xsl (686 Bytes) iterate-break2.xsl Martin Honnen, 2020-03-22 14:01

History

#1 Updated by Michael Kay 5 months ago

Thanks for reporting it. The logic for deciding how many endElement() calls to issue on early exit during streaming has indeed changed in 10.0.

#2 Updated by Michael Kay 5 months ago

The test (added as si-iterate-140) is working when run in the test driver, but failing when run from the command line.

The difference is probably that in the test driver, we are writing the result to a document builder, while from the command line, we are writing to a serializer; the document builder can probably recover from a missing endElement() event, whereas in the serializer, this will result in a missing end tag.

#3 Updated by Michael Kay 5 months ago

In 9.9 the code in WatchManager.endElement() (on catching a QuitParsingException) did:

            while (!activeWatchStack.isEmpty()) {
                endElement();
            }

In 10.0 it does

           while (!namespaceStack.isEmpty()) {
                endElement();
            }

The difference is that activeWatchStack has one more entry (an entry is created for the startDocument/endDocument event pair, as well as for startElement/endElement).

Simply reverting the change causes a failure with an EmptyStackException; the change must have been made for a reason.

The change was made in commit 8806 on 2019-02-06, and it was part of the general change to the handling of namespaces on the Receiver pipeline; therefore not a change that was specifically concerned with streaming, and therefore it's likely that the effect on streaming wasn't thoroughly explored at the time. The fact that we run nearly all our streaming tests to a document builder rather that to a serializer might well have masked the problem.

#4 Updated by Michael Kay 5 months ago

I can fix the problem by reverting to using the activeWatchStack as the control for how many endElement() calls to issue, and then defending against the EmptyStackException by testing for namespaceStack.isEmpty() before doing the pop().

This now raises the question of testing. How should we ensure that a problem like this doesn't happen again (or indeed, that it isn't present in other test cases?

I've been running the tests with Java assertions enabled, and in that situation we actually have a RegularSequenceChecker on the pipeline, which ought to detect if there is a missing end tag. But it seems that we're allowing a close() event to occur in any state, even with unclosed element tags -- and that's probably necessary for error scenarios. I shall try changing that.

#5 Updated by Michael Kay 5 months ago

The extra check inserted into RegularSequenceChecker.close() seems to cause no problems. However, the change to WatchManager.endElement() to fix the original bug causes regression to a number of other tests.

I'll focus on one of these: sf-boolean-117. This test, instead of the expected result <out>true</out>, is outputting <out>true</out>false. If I revert the fix, this test runs OK. Note: many of the tests that are now failing are tests that construct new unstreamed nodes on the streamed pipeline, which is a situation that wasn't working at all in 9.9.

For the moment, I'm reverting the patch. I think the whole issue of tidying up after an early exit needs some careful design thought, rather than just hacking the code until the tests all pass.

#6 Updated by Michael Kay 5 months ago

Let's look closely at the failing test.

  <xsl:template match="/">
    <items>
      <xsl:iterate select="root/items/item">
        <xsl:copy-of select=".[@cat != ('foo','bar')]"/>
        <xsl:if test="@cat = 'bar'">
          <xsl:break/>
        </xsl:if>
      </xsl:iterate>
    </items>
  </xsl:template>

In the deferredStartDocument() call we identify the Watch for firing the applyTemplates action. The startSelectedParentNode() method (with the document node as argument) finds the template rule, forms its inversion, and adds a new Watch for this rule. We add a closedown action for this watch (2147), and open the watch. This calls IterateAction.open(), which sends the open() call up the pipeline, causing the start tag to be emitted,

Later we get to the start tag for an item element. This triggers the IterateAction watch. We add a closedown action (2260), form the inversion of the xsl:iterate body, and add a new Watch for the CopyOf. This adds another closedown action (2305) and opens the CopyOf watch. We call startSelectedParentNode() on the CopyOfWatch, which returns a CatchingReceiver to be notified of all events for making the copy.

Later we get to the end tag of the item element. The event is notified to the CatchingReceiver, which closes the TinyBuilder constructing the copy. We now pop the ActiveWatchStack to find a list of four closedown actions. The first handles accumulators, which we're not concerned with. The next gets the TinyTree built by the CopyOf watch and emits it to the output pipeline. The third is a close action to close the copy process,. This triggers the execution of the rest of the sequence constructor, causing execution of the xsl:choose and xsl:break instructions. The xsl:break throws a QuitParsingException. WatchManager.endElement() catches this and sets the quit flag.

WatchManager.endElement(), having completed all the closedown actions, now sees the quit flag is set, and calls endElement() twice - once to close the outstanding <items> element and once for the <root> element. Neither of these does anything interesting.

Crucially, at this point there is still an entry on the activeWatchStack, corresponding to the startDocument event, and this links to closedown actions associated with the document node, most notably the writing of the </items> end tag, These are not getting actioned. The 9.9 code actioned them by invoking an extra call on endElement(), but that's spurious: it should really be a call on endDocument.

So I've added a call on endDocument() and the test now passes. The logic seems correct. But three (newish) tests in sf-boolean now fail: -117, -118, and -119. We'll look at them next.

#7 Updated by Michael Kay 5 months ago

looking at sf-boolean-117, it produces incorrect results. Instead of the expected results

<out>true</out>

we get

<out>true</out>false

This seems to be a problem specific to the BooleanFnFeed (streaming implementation of fn:boolean). It contains the instructions:


                getTerminator().terminate();
                done = true;

but this is the wrong way round; the terminate() call triggers a QuitParsingException, so done is left as false; and the close() call therefore sends the current result (false) down the output pipeline. This was only working before because we failed to send the close() event.

#8 Updated by Michael Kay 5 months ago

After this change, we have no remaining unexplained failures in the streaming tests.

Having added an endDocument() on the early-exit recovery path for a quit occurring during endElement(), it also makes equal sense to have one for the corresponding path when the quit occurs during other events including startElement, characters(), etc. I confirmed that this causes no regression, though it would be nice to have a case that demonstrates it makes a difference.

#9 Updated by Michael Kay 5 months ago

  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Fix Committed on Branch 10 added

#10 Updated by Michael Kay 5 months ago

  • Status changed from Resolved to In Progress

The extra check added to RegularSequenceChecker.close() causes a failure in QT3 test

-s:prod-OptionDecl.serialization -t:Serialization-024

#11 Updated by Michael Kay 5 months ago

  • Status changed from In Progress to Resolved

I have enhanced the RegularSequenceChecker to introduce a new state "Failed" which is set if an exception occurs. A call on close() is allowed in Failed state; if close() is called when not in failed state, it checks that all outstanding startDocument/startElement events have been closed by a matching endDocument/endElement event.

#12 Updated by O'Neil Delpratt 3 months ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

#13 Updated by O'Neil Delpratt 3 months ago

  • Status changed from Resolved to Closed

Please register to edit this issue

Also available in: Atom PDF