Project

Profile

Help

Bug #3122

closed

last() not computed correctly when processing sequence constructed from variable

Added by Martin Honnen about 7 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
XSLT conformance
Sprint/Milestone:
-
Start date:
2017-02-02
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.7
Fix Committed on Branch:
9.7
Fixed in Maintenance Release:
Platforms:

Description

I have run into a problem with Saxon 9.7 (occurs with both Saxon-EE 9.7.0.14J and Saxon-HE 9.7.0.14J) where the value of last() when processing a sequence constructed in a variable as element()* with <xsl:apply-templates select="$var"/> is wrong. Here is a diagnostic test case:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	exclude-result-prefixes="xs"
	version="2.0">
	
	<xsl:output method="text"/> 
	
	<xsl:template match="/"> 
		<xsl:variable name="duplicates" as="element()*">      
			<xsl:for-each-group select="/root/items/item" group-by="@key">
				<xsl:sequence select="current-group()[current-group()[2]]"/>
			</xsl:for-each-group>
		</xsl:variable>
		<xsl:apply-templates select="$duplicates"/>
	</xsl:template>
	
	<xsl:template match="item">
		<xsl:message select="'position(): ', position(), '; last(): ', last()"/>
		{
		
		"value" : "<xsl:value-of select="@value"/>"
		
		}<xsl:if test="position() &lt; last()">,</xsl:if>
	</xsl:template>
	
</xsl:stylesheet>

When I run that against the input

<?xml version="1.0" encoding="UTF-8"?>
<root>
	<items>
		<item key="a" value="a1"/>
		<item key="b" value="b1"/>
		<item key="a" value="a2"/>
		<item key="c" value="c1"/>
		<item key="a" value="a3"/>
		<item key="c" value="c2"/>
	</items>
</root>

the output is

position():  1 ; last():  9
position():  2 ; last():  9
position():  3 ; last():  9
position():  4 ; last():  9
position():  5 ; last():  9

                {

                "value" : "a1"

                },
                {

                "value" : "a2"

                },
                {

                "value" : "a3"

                },
                {

                "value" : "c1"

                },
                {

                "value" : "c2"

                },

So although only (and correctly) 5 item elements are processed, last() is computed as 9@. And that way the @5th item in the output has a unwanted comma appended.

The problem does not occur with Saxon 9.6 (tested with Saxon-EE 9.6.0.9J).

Actions #1

Updated by Michael Kay about 7 years ago

  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

I have added this as XSLT3 test case for-each-group-080. I have reproduced the failure on 9.7, but it is working fine on 9.8.

Because the variable containing the for-each-group instruction is inlined, we are evaluating the for-each-group in pull mode (that is, as an iterator), which is a little unusual. Furthermore, the call on last() is implemented in this case by calling getAnother() on this iterator. The implementation of last() in such situations is to evaluate the underlying construct twice, once to determine how many items there are, and then subsequently to get the items.

This has been the design since time immemorial, but for Saxon-JS I introduced a new design which has been retrofitted to 9.8: instead of getting a new iterator and counting items, the FocusTrackingIterator reads the remaining items in the list and buffers them. This turns out to be an awful lot simpler; there's a time/memory trade-off but there's no way of knowing whether the new trade-off is better or worse than the old on average.

I haven't found out exactly what's going wrong, but I suspect that when we get the new iterator using getAnother(), it's not completely clean in the case where grouping iterators are involved.

Implementing the new design for last() on the 9.7 branch seems a bit radical and destabilizing; I think we should try to fix the code rather than re-designing it, but it may prove difficult.

Actions #2

Updated by Michael Kay about 7 years ago

The reason it works in 9.6 is that the variable isn't inlined. You would probably get the same bug if the for-each-group were placed in a function rather than a variable.

Actions #3

Updated by Michael Kay about 7 years ago

Adding diagnostic trace to look at the items read off and counted by last(), we see

0: <item key="a" value="a1"/>
1: <item key="a" value="a2"/>
2: <item key="a" value="a3"/>
3: <item key="a" value="a1"/>
4: <item key="a" value="a2"/>
5: <item key="a" value="a3"/>
6: <item key="a" value="a1"/>
7: <item key="a" value="a2"/>
8: <item key="a" value="a3"/>

So it's clear that for some reason current-group() is always the first group, thus accounting for the count of 9.

Actions #4

Updated by Michael Kay about 7 years ago

Some further tracing shows that we are calling currentGroup() on a different GroupByIterator than the one we are advancing using next():

new GroupByIterator net.sf.saxon.expr.sort.GroupByIterator@2a098129
GroupByIterator.next() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=0
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
new GroupByIterator net.sf.saxon.expr.sort.GroupByIterator@198e2867
GroupByIterator.next() net.sf.saxon.expr.sort.GroupByIterator@198e2867 position=0
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
0: <item key="a" value="a1"/>
1: <item key="a" value="a2"/>
2: <item key="a" value="a3"/>
GroupByIterator.next() net.sf.saxon.expr.sort.GroupByIterator@198e2867 position=1
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
GroupByIterator.iterateCurrentGroup() net.sf.saxon.expr.sort.GroupByIterator@2a098129 position=1
3: <item key="a" value="a1"/>
4: <item key="a" value="a2"/>
5: <item key="a" value="a3"/>

I have tried moving the GroupByIterator from XPathContextMajor to XPathContextMinor so that ContextMappingIterator can put the new grouping iterator in the next context, but this isn't working. The problem is that when we do GroupByIterator.getAnother(), we don't have a context in hand, so we can't make it the current grouping iterator in that context; and in ContextMappingIterator.getAnother(), where we create the new context, we don't know that we've created a new GroupByIterator, because it's buried deeply within a stack of other iterators.

Actions #5

Updated by Michael Kay about 7 years ago

I have experimentally changed 9.7 to use the new 9.8 design for computing last(). Most things are working fine, but test position-0102 is failing because of bug #3124.

Actions #6

Updated by Michael Kay about 7 years ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 9.7 added

I have decided to implement this fix. The bug regarding position-0102 is a separate issue.

Actions #7

Updated by Michael Kay about 7 years ago

  • Status changed from Resolved to In Progress

The patch seems to have adverse consequences on anaylze-string when last() is used - see test analyze-string-033. The problem is that analyze-string wraps a FocusTrackingIterator around a RegexIterator, and while it reads off the FocusTrackingIterator, it accesses the underlying RegexIterator to see if it's positioned on a match or a non-match. The call to last() gets the two iterators out of sync.

And I suspect there may be a similar problem with for-each-group, where we wrap a FocusTrackingIterator around a GroupIterator and expect the two to stay in sync.

I have fixed this by making (all the implementations of) RegexIterator and GroupIterator implement the LastPositionFinder method getLength(); this means that the FocusTrackingIterator, when last() is called, will delegate the call to the underlying iterator rather than consuming it. This is fairly straightforward to do in most cases (we just get a new iterator and count its items, which is essentially the old logic). It's a bit trickier to do in 9.8 for the positional grouping iterators, where we no longer have a getAnother() method.

Actions #8

Updated by Michael Kay about 7 years ago

  • Status changed from In Progress to Resolved

A further patch has been applied to fix these issues.

Actions #9

Updated by O'Neil Delpratt about 7 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.15 added

Bug fix applied to the Saxon 9.7.0.15 maintenance release

Please register to edit this issue

Also available in: Atom PDF