Bug #2714

Streamed grouping using xsl:for-each-group group-starting-with and snapshot(current-group())/.. gives different result than unstreamed grouping

Added by Martin Honnen over 4 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Streaming
Sprint/Milestone:
Start date: 2016-04-16
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.6, 9.7, trunk
Fix Committed on Branch: 9.6, 9.7, trunk
Fixed in Maintenance Release:

Description

I have the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math"
	version="3.0">
	
	<xsl:mode streamable="yes"/>
	
	<xsl:output indent="yes"/>
	
	<xsl:template match="note">
		<xsl:copy>
			<xsl:for-each-group select="*/text()" group-starting-with="text()[ends-with(., ':')]">
				<group>
					<xsl:copy-of select="snapshot(current-group())/.."/>
				</group>
			</xsl:for-each-group>
		</xsl:copy>
	</xsl:template>
	
</xsl:stylesheet>

When run with Saxon 9.7.0.4 EE against the input sample

<?xml version="1.0" encoding="UTF-8"?>
<note>
	<para>customer name :</para>
	<para>mr. Joe Someone</para>
	<para>calling from :</para>
	<para>1234567</para>
	<para>device model :</para>
	<para>ABC-123</para>
	<para>issue:</para>
	<para>some info</para>
	<para>some more info</para>
	<para>and even more info</para>
	<para>solution :</para>
	<para>some solutions</para>
	<para>and some more solutions</para>
</note>

Saxon compiles and runs the XSLT but produces the following result:

<?xml version="1.0" encoding="UTF-8"?>
<note>
   <group>
      <para>mr. Joe Someone</para>
      <para>calling from :</para>
   </group>
   <group>
      <para>1234567</para>
      <para>device model :</para>
   </group>
   <group>
      <para>ABC-123</para>
      <para>issue:</para>
   </group>
   <group>
      <para>some info</para>
      <para>some more info</para>
      <para>and even more info</para>
      <para>solution :</para>
   </group>
   <group>
      <para>some solutions</para>
      <para>and some more solutions</para>
      <para>
</para>
   </group>
</note>

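So some grouping has happened, but the contents of the result groups are wrong: each group is missing its starting "label" para and instead ends with the label that should start the next group. For comparison, the same code without the <xsl:mode streamable="yes"/> declaration produces the wanted result: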

<?xml version="1.0" encoding="UTF-8"?>
<note>
   <group>
      <para>customer name :</para>
      <para>mr. Joe Someone</para>
   </group>
   <group>
      <para>calling from :</para>
      <para>1234567</para>
   </group>
   <group>
      <para>device model :</para>
      <para>ABC-123</para>
   </group>
   <group>
      <para>issue:</para>
      <para>some info</para>
      <para>some more info</para>
      <para>and even more info</para>
   </group>
   <group>
      <para>solution :</para>
      <para>some solutions</para>
      <para>and some more solutions</para>
   </group>
</note>

This is also the result I get with Exselt for both versions of the XSLT (with <xsl:mode streamable="yes"/> and without it).

History

#1 Updated by Michael Kay over 4 years ago

  • Category set to Streaming
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal
  • Applies to branch 9.7 added

Thanks. I have reproduced the results and have added the test case to the XSLT 3.0 test suite as test si-group-033.

#2 Updated by Michael Kay over 4 years ago

What seems to be happening (at first glance) is that the DocumentSorterAdjunct, which is responsible primarily for eliminating duplicate ancestor nodes from the expression snapshot(current-group())/.., is doing some buffering of nodes, and this is somehow causing the first node of each group to be either dropped (in the case of the first group) or treated as the last node in the previous group (in other cases).

I guess if we were smarter we would notice that snapshot(...)/.. can never contain duplicate nodes because each snapshot is a distinct tree.

#3 Updated by Michael Kay over 4 years ago

  • Status changed from In Progress to Resolved
  • Found in version set to 9.7
  • Applies to branch 9.6, 9.8 added
  • Fix Committed on Branch 9.6, 9.7, 9.8 added

The problem is that WatchManager.characters(), when it creates a streamed text node, does not take a copy of the CharSequence passed to it. This CharSequence is a CharSlice object representing a buffer that contains the accumulated characters() events from the SAX parser, and the buffer is reused for subsequent text nodes. Because eliminating duplicates from the parent axis involves a one-item lookahead, by the time the text node is copied to the output, the character buffer has already been overwritten with the content of the next text node.

The solution is a simple patch to WatchManager.characters() that calls toString() on the supplied CharSequence when creating the streamed text node.

Patch committed on the 9.6, 9.7, and 9.8 branches.
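
For readers who want to see the aliasing effect in isolation, here is a minimal, self-contained Java sketch. It is not Saxon code: the class BufferReuseDemo, the SharedSlice type (a stand-in for a CharSlice-like view), and the process() method are all hypothetical names invented for this illustration. It assumes only what is described above: a single character buffer reused for every text event, a consumer that emits each item one step late, and a fix that copies the characters with toString().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferReuseDemo {

    /** Hypothetical stand-in for a CharSlice-like view: it reads through to a shared, mutable buffer. */
    static final class SharedSlice implements CharSequence {
        private final StringBuilder buffer;
        SharedSlice(StringBuilder buffer) { this.buffer = buffer; }
        @Override public int length() { return buffer.length(); }
        @Override public char charAt(int index) { return buffer.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) { return buffer.substring(start, end); }
        @Override public String toString() { return buffer.toString(); }
    }

    /**
     * Feeds text events through one reused buffer and emits each value one step late,
     * imitating a consumer with a one-item lookahead.
     * With copy == false the held item still aliases the buffer when it is overwritten;
     * with copy == true the characters are snapshotted via toString() first.
     */
    static List<String> process(List<String> events, boolean copy) {
        StringBuilder buffer = new StringBuilder();   // reused for every event
        List<String> out = new ArrayList<>();
        CharSequence pending = null;                  // the item held back by the lookahead
        for (String event : events) {
            buffer.setLength(0);                      // the next event overwrites the buffer
            buffer.append(event);
            if (pending != null) {
                out.add(pending.toString());          // emitted only after the overwrite
            }
            CharSequence slice = new SharedSlice(buffer);
            pending = copy ? slice.toString() : slice;
        }
        if (pending != null) {
            out.add(pending.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("customer name :", "mr. Joe Someone", "calling from :");
        System.out.println("without copy: " + process(events, false));
        System.out.println("with copy:    " + process(events, true));
    }
}
```

Without the copy, each emitted value shows the text of the following event, which is the same shift seen in the streamed result above. With the copy, the values come out unchanged and in order.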

#4 Updated by O'Neil Delpratt over 4 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.5 added

Bug fix applied in the 9.7.0.5 maintenance release. Leaving this bug open until fix is applied in the 9.6 maintenance release.

#5 Updated by O'Neil Delpratt over 4 years ago

  • Sprint/Milestone set to 9.7.0.5

#6 Updated by O'Neil Delpratt about 4 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.6.0.9 added
  • Fixed in Maintenance Release deleted (9.7.0.5)

Bug fix applied in the Saxon 9.6.0.9 maintenance release.

#7 Updated by O'Neil Delpratt about 4 years ago

  • Fixed in Maintenance Release 9.7.0.5 added

#8 Updated by O'Neil Delpratt about 3 years ago

  • Applies to branch trunk added
  • Applies to branch deleted (9.8)

#9 Updated by O'Neil Delpratt about 3 years ago

  • Fix Committed on Branch trunk added
  • Fix Committed on Branch deleted (9.8)
