Project

Profile

Help

Bug #2788

closed

xsl:strip-space is not applied to the results of collection()

Added by Michael Kay over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
XSLT conformance
Sprint/Milestone:
-
Start date:
2016-06-09
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.7
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Reported by Eliot Kimber on saxon-help list on SourceForge.

I have code like this:

<xsl:if test="not(position() = last())">

         <xsl:text>,</xsl:text>

       </xsl:if>

When I run this transform under Saxon 9.6.0.7 (the version of 9.6 I had

lying about) I get the expected result from my data set.

However, when I run with 9.7.0.1 or 9.7.0.5 I appear to get the position

and next for the parent of the context element.

The source XML is:

   <ref name="title"/>

   <optional>

     <ref name="titlealts"/>

   </optional>

   <optional>

     <choice>

       <ref name="abstract"/>

       <ref name="shortdesc"/>

     </choice>

   </optional>

   <optional>

     <ref name="prolog"/>

   </optional>

   <optional>

     <ref name="body"/>

   </optional>

   <optional>

     <ref name="related-links"/>

   </optional>

   <zeroOrMore>

     <ref name="mytopic-info-types"/>

   </zeroOrMore>

 </define>

And my template is matching on elements, using the position() =

last() check to determine when to omit commas

When I run with 9.6 I get the correct result, and these position() and

last() values:

[java]  + [DEBUG] element-decls: rng:ref(), name="info-types",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="titlealts",

position="1", last="1"

[java] + [DEBUG] element-decls: rng:ref(), name="abstract", position="1",

last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="shortdesc",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="prolog",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="body",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="related-links",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(),

name="mytopic-info-types", position="1", last="1"

(Although I notice looking at these results that abstract and shortdesc,

which are siblings, do not report the expected position() or last()

values, so something is not right here.

With 9.7 I get these results:

[java]  + [DEBUG] element-decls: rng:ref(), name="info-types",

position="2", last="3"

[java]  + [DEBUG] element-decls: rng:ref(), name="titlealts",

position="2", last="3"

[java]  + [DEBUG] element-decls: rng:ref(), name="abstract",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="shortdesc",

position="1", last="1"

[java]  + [DEBUG] element-decls: rng:ref(), name="prolog",

position="2", last="3"

[java]  + [DEBUG] element-decls: rng:ref(), name="body",

position="2", last="3"

[java]  + [DEBUG] element-decls: rng:ref(), name="related-links",

position="2", last="3"

[java]  + [DEBUG] element-decls: rng:ref(),

name="mytopic-info-types", position="2", last="3"

For all but abstract and shortdesc, the position() and last() values are

different and also not what was expected.

The transform and source I'm using is available here:

https://github.com/oasis-open/dita-rng-converter/commit/c5dbc3379e0b7b6684e

38fa63439e391fbb34a47

The command line I'm running is:

ant generate-dtd

-DdoctypesDir=test/specializations/org.example/1.3/rng/mytopic

-Doutdir=test/specializations/org.example -DgenerateCatalogs=true

-DgenerateModules=true -Ddebug=false

I think the abstract and shortdesc results are explained by this code:

<xsl:for-each select="rng:*">

 <xsl:if test="not(position()=1)">

   <xsl:text> |&#x0a;</xsl:text>

 </xsl:if>

 <xsl:apply-templates select="." mode="#current" />

</xsl:for-each>

Which is used for choice groups, which abstract and shortdesc are children

of. In that case, each group has size 1 so the 1/1 result makes sense.

But the other results make no sense that I can see and there is definitely

a difference from 9.6 to 9.7.

It also looks like my code works by accident with 9.6 but definitely fails

with 9.7.

Cheers,

Eliot


Eliot Kimber, Owner

Contrext, LLC

http://contrext.com


Files

saxon-issue-2788.zip (11.3 MB) saxon-issue-2788.zip Eliot Kimber, 2016-06-10 16:25
Actions #1

Updated by Eliot Kimber over 8 years ago

The project in this zip file should run the transform against the myTopic.rng file if you simply unzip and run ant in the root directory. I created a single-file version of the transform, xsl/rng2ditadtd_resolved.xsl

The symptom of the position() issue is build/1.3/dtd/myTopic/dtd/myTopic.mod, which shows incorrect commas within the generated model groups, making the DTD syntax invalid.

The Zip includes the Saxon jars for convenience.

Actions #2

Updated by Michael Kay over 8 years ago

I'm running the transformation successfully in the IDE (intelliJ). Currently I'm getting the same output (with incorrectly positioned commas) under both 9.6 and 9.7. I'm running under EE for convenience, I'll try changing some parameters or try running from the command line to see if that changes anything.

Actions #3

Updated by Eliot Kimber over 8 years ago

I had added debugging messages to the transform to report the values for position() and last() and that's where I was seeing differences between 9.6 and 9.7, but I was definitely getting different results from both versions.

It's possible I didn't provide the exact right version of the transform but the behavior definitely demonstrates the problem. I noticed this because the XSLT that had worked before 9.7 was now not working.

Actions #4

Updated by Michael Kay over 8 years ago

If I've got a configuration that demonstrates the bug, that's all I need for the moment. Working out why a different configuration doesn't show the bug is secondary.

Actions #5

Updated by Eliot Kimber over 8 years ago

Based on my original analysis, both 9.6 and 9.7 gave incorrect values for position() and last(), but they gave different incorrect values. The 9.6 incorrect values happened to work with my business logic.

Actions #6

Updated by Michael Kay over 8 years ago

I think the problem is that XSLT-defined space-stripping rules are not being applied to documents read using the collection() function. The stylesheet generally uses xsl:apply-templates with no select attribute, which selects all children including whitespace text nodes, and this means that when processing element nodes, position()=last() is always false.

Actions #7

Updated by Michael Kay over 8 years ago

  • Subject changed from position() and last() differ between 9.6 and 9.7 to xsl:strip-space is not applied to the results of collection()
  • Status changed from New to In Progress

I have added XSLT3 test case collection-005 which confirms that Saxon 9.7 is not applying xsl:strip-space rules to documents read using the collection() function. The specification is clear that this is incorrect.

Actions #8

Updated by Michael Kay over 8 years ago

Solving this is not easy. The machinery for the collection() function is carefully constructed to allow user customisation, and it's not clear where strip-space should fit in. Do we pass the strip-space information to the CollectionFinder to take note of as it chooses? Or do we apply strip-space to whatever it returns? The difficulty with doing the latter is that it is much more efficient to apply space stripping as a filter in the XML parsing pipeline than to retrofit it to an already-built document tree.

An added complication is the XSLT 3.0 rule that the space-stripping rules can vary from one package to another, so we need to know which package contains the collection() call. (But we don't yet do this for the doc() and document() functions...)

And then there's a problem with testing, because the XSLT3 test driver has its own machinery for defining collections which is not necessarily the same as the machinery used in other applications.

And we have to try and preserve API compatibility if we can, though the CollectionFinder apparatus is still fairly new so we can make changes if there's no alternative.

Actions #9

Updated by Michael Kay over 8 years ago

Oh dear: getting this working in the test driver is particularly difficult because it does all the work of building a collection up front, long before the collection() function is called. This might need a complete redesign of the way collections work in the test driver. Which is largely a distraction from getting the bug fixed in the actual product.

Actions #10

Updated by Eliot Kimber over 8 years ago

At least you identified the root cause--it seemed unlikely that there was actually a bug in calculating position().

I see the strip-space problem. If I was implementing my own collection handler I would probably be surprised to learn I had to do my own space stripping but on the other hand I would probably also be upset if there was a performance hit because of post-processing to remove spaces.

I would think a strong warning in the customization documentation and a readily-available SAX space filter would be appropriate.

Actions #11

Updated by Michael Kay over 8 years ago

I've added code to CollectionFn which in effect post-processes the results of the user-written (or system default) CollectionFinder to do the space stripping from returned documents. (It does this by wrapping the returned tree in a wrapper that performs the space stripping while the tree is being navigated).

This meets the conformance rules but it would be nice to add an optimization whereby the CollectionFinder can elect to take responsibility for space stripping itself. (It then has the option to be non-conformant if it chooses; so long as it declares that it's done all that needs to be done, Saxon will ask no questions.)

Actions #12

Updated by Eliot Kimber over 8 years ago

Seems reasonable.

Actions #13

Updated by Michael Kay over 8 years ago

I've added a method to AbstractResourceCollection (perhaps it can be promoted to the ResourceCollection interface in 9.8) whereby (a) Saxon tells the ResourceCollection what the whitespace stripping rules are, and (b) the ResourceCollection responds saying whether it agrees to take responsibility for implementing them (default is NO). Next step is that standard collection implementations like the DirectoryCollection will use this information to do whitespace stripping during parsing, and tell the CollectionFn code that they have taken care of it.

Actions #14

Updated by Michael Kay over 8 years ago

  • Status changed from In Progress to Resolved

Patch applied to the 9.7 and 9.8 branches.

As well as the changes previously described, the various subclasses of AbstractResourceCollection now act upon the supplied whitespace stripping rules. The ParseOptions object is extended to contain the space stripping rules, and the Sender class is extended to use these rules if supplied by inserting a Stripper into the parsing pipeline. (On reflection, we could have used the general FilterFactory mechanism.)

In the 9.8 version of the patch, the new method on AbstractResourceCollection to set the whitespace stripping rules is promoted to the ResourceCollection interface, and is also implemented in the ResourceCollection that wraps an old-style CollectionUriResolver.

Actions #15

Updated by Michael Kay over 8 years ago

I have extended the patch so that it now implements the XSLT 3.0 rule that allows different whitespace stripping rules to apply to the same collection in different packages

Actions #16

Updated by O'Neil Delpratt over 8 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.6 added

Bug fixed in maintenance release 9.7.0.6.

Please register to edit this issue

Also available in: Atom PDF