
Bug #5368

closed

In XSLT patterns, (A/(B except C)) is incorrectly rewritten as (A/B except A/C)

Added by Michael Kay about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Category: XSLT conformance
Start date: 2022-03-03
% Done: 100%
Applies to branch: 10, 11, trunk
Fix Committed on Branch: 10, 11, trunk
Fixed in Maintenance Release: 10.8, 11.3
Platforms: .NET, Java

Description

The new test case match-273 demonstrates an example where this rewrite is incorrect. The specific pattern is

x/(descendant::a except child::a)

and the problem arises when there is an a element that is a descendant of one x element and a child of another. The original pattern should match such an element, but the rewritten pattern does not.
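To make the difference concrete, here is a minimal Python sketch over a toy tree model. The `Node` class and all helper functions are hypothetical illustrations, not Saxon code. `spec_matches` assumes the spec's definition that a node N matches if `root(.)//(EE)`, evaluated with N as context, includes N, so it tries EE with every node of the tree as the context node. `rewritten_naive` models the rewritten pattern read as "matches x/descendant::a and does not match x/child::a".

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass(eq=False)  # eq=False: nodes hash and compare by identity
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, node: "Node") -> "Node":
        node.parent = self
        self.children.append(node)
        return node


def descendants(n: Node) -> Iterator[Node]:
    for c in n.children:
        yield c
        yield from descendants(c)


def child(n: Node, name: str) -> Set[Node]:
    return {c for c in n.children if c.name == name}


def descendant(n: Node, name: str) -> Set[Node]:
    return {d for d in descendants(n) if d.name == name}


def spec_matches(n: Node, ee: Callable[[Node], Set[Node]]) -> bool:
    # N matches if root(N)//(EE) includes N: evaluate EE with the root and
    # each of its descendants as the context node.
    r = n
    while r.parent is not None:
        r = r.parent
    return any(n in ee(c) for c in [r, *descendants(r)])


def original(c: Node) -> Set[Node]:
    # x/(descendant::a except child::a): the except is applied separately for each x
    return {e for x in child(c, "x") for e in descendant(x, "a") - child(x, "a")}


def rewritten_naive(n: Node) -> bool:
    # x/descendant::a except x/child::a, read as
    # "matches the left operand and does not match the right operand"
    left = spec_matches(n, lambda c: {e for x in child(c, "x") for e in descendant(x, "a")})
    right = spec_matches(n, lambda c: {e for x in child(c, "x") for e in child(x, "a")})
    return left and not right


# document node > x > x > a: the a is a child of the inner x
# and a (non-child) descendant of the outer x
doc = Node("#document")
outer = doc.add(Node("x"))
inner = outer.add(Node("x"))
a = inner.add(Node("a"))

print(spec_matches(a, original))  # True
print(rewritten_naive(a))         # False
```

With the document node as context, the outer x yields the a (descendant but not child), so the original pattern matches; the naive reading rejects it because the inner x makes it match x/child::a.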

Also affects SaxonJS.

Actions #1

Updated by Michael Kay about 2 years ago

I think that we're getting it wrong even for much simpler patterns of the form descendant::x except child::x. We're interpreting this to mean "match elements that match descendant::x and that don't match child::x", which can never match anything, because every element that matches descendant::x also matches child::x. But the correct semantics are "match any x element E provided there is an ancestor A such that E is a descendant of A and is not a child of A". This will match any x element that has a grandparent.
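A minimal Python sketch of the two readings, using a toy tree model (the `Node` class and helpers are hypothetical, not Saxon code). `spec_matches` assumes the spec rule that N matches if evaluating EE with some node of N's tree as context yields N; `naive` is the "matches descendant::x and does not match child::x" interpretation. On a chain of three nested x elements, only the two that have a grandparent (the document node counts) match under the spec rule, and the naive reading matches none.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass(eq=False)  # eq=False: nodes hash and compare by identity
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, node: "Node") -> "Node":
        node.parent = self
        self.children.append(node)
        return node


def descendants(n: Node) -> Iterator[Node]:
    for c in n.children:
        yield c
        yield from descendants(c)


def child(n: Node, name: str) -> Set[Node]:
    return {c for c in n.children if c.name == name}


def descendant(n: Node, name: str) -> Set[Node]:
    return {d for d in descendants(n) if d.name == name}


def spec_matches(n: Node, ee: Callable[[Node], Set[Node]]) -> bool:
    # N matches if root(N)//(EE) includes N
    r = n
    while r.parent is not None:
        r = r.parent
    return any(n in ee(c) for c in [r, *descendants(r)])


def except_expr(c: Node) -> Set[Node]:
    # descendant::x except child::x, evaluated with c as the context node
    return descendant(c, "x") - child(c, "x")


def naive(n: Node) -> bool:
    # "matches descendant::x and does not match child::x"
    return spec_matches(n, lambda c: descendant(c, "x")) and not spec_matches(
        n, lambda c: child(c, "x")
    )


# document node > x (top) > x (mid) > x (low)
doc = Node("#document")
top = doc.add(Node("x"))
mid = top.add(Node("x"))
low = mid.add(Node("x"))

print([spec_matches(n, except_expr) for n in (top, mid, low)])  # [False, True, True]
print([naive(n) for n in (top, mid, low)])                      # [False, False, False]
```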

Actions #2

Updated by Michael Kay about 2 years ago

So what about intersect?

Consider div[title='John']//para intersect div[title='Jane']//para.

The rule says that it matches a para $N that is present in the result of the expression

root($N)/descendant-or-self::node()/(child-or-top::div[title='John']//para intersect child-or-top::div[title='Jane']//para)

For a para to be present in the intersection, it must have a containing div with title="John" and a containing div with title="Jane", and these must be children of the same parent, which means in practice they must be the same div element, since the subtrees of two sibling elements don't intersect.

We're currently treating this pattern as matching any node that has an ancestor div with title="John" and an ancestor div with title="Jane", regardless of the relationship of these two ancestors.
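A small Python sketch of the intersect case with nested divs (Jane containing John containing a para), using a toy tree model. The `Node` class, its `title` field (standing in for the value tested by the predicate `[title='...']`), and the helpers are hypothetical illustrations. Under the spec rule no single context node has both divs as children, so the para does not match; the current "has both kinds of ancestor div" reading does match it.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass(eq=False)  # eq=False: nodes hash and compare by identity
class Node:
    name: str
    title: Optional[str] = None  # hypothetical stand-in for the div's title value
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, node: "Node") -> "Node":
        node.parent = self
        self.children.append(node)
        return node


def descendants(n: Node) -> Iterator[Node]:
    for c in n.children:
        yield c
        yield from descendants(c)


def spec_matches(n: Node, ee: Callable[[Node], Set[Node]]) -> bool:
    # N matches if root(N)//(EE) includes N
    r = n
    while r.parent is not None:
        r = r.parent
    return any(n in ee(c) for c in [r, *descendants(r)])


def paras_in_div(c: Node, title: str) -> Set[Node]:
    # div[title=...]//para with c as context: child divs with that title,
    # then every para descendant of those divs
    return {
        p
        for d in c.children
        if d.name == "div" and d.title == title
        for p in descendants(d)
        if p.name == "para"
    }


def intersect_expr(c: Node) -> Set[Node]:
    # both operands are evaluated against the same context node c
    return paras_in_div(c, "John") & paras_in_div(c, "Jane")


def naive(n: Node) -> bool:
    # "matches the left operand and matches the right operand"
    return spec_matches(n, lambda c: paras_in_div(c, "John")) and spec_matches(
        n, lambda c: paras_in_div(c, "Jane")
    )


doc = Node("#document")
body = doc.add(Node("body"))
jane = body.add(Node("div", title="Jane"))
john = jane.add(Node("div", title="John"))
para = john.add(Node("para"))

print(spec_matches(para, intersect_expr))  # False
print(naive(para))                         # True
```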

Actions #3

Updated by Michael Kay about 2 years ago

So, are there special cases where we can treat X except Y as meaning "matches X and does not match Y"?

An obvious candidate is @* except @code. Similarly, I think * except title is safe. These are the cases that are likely to arise in practice, and it's important that they perform well. So I think I'll keep the rewrite for simple cases where both operands are axis expressions using the attribute or child axes. "intersect" is less likely to occur but we might as well apply the same rules. It should handle multiple operands, e.g. @* except (@code | @status) or @* except @code except @status.

But in fact we have another optimization for such expressions: we generate a NodeTest with a composite condition, for example child::(author|editor). So we'll have to see how these rewrites interact.
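Why the child-axis case is safe can be checked on a toy tree: when both operands select only children of the context node, the only context that can return a node N is N's parent, so "N is in X(parent) but not in Y(parent)" coincides with "N matches X and does not match Y". The Python sketch below (hypothetical `Node` class and helpers, all nodes modelled as elements) compares the two readings of `* except title` on every node of one small tree. It is an illustration on one example, not a proof.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass(eq=False)  # eq=False: nodes hash and compare by identity
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, node: "Node") -> "Node":
        node.parent = self
        self.children.append(node)
        return node


def descendants(n: Node) -> Iterator[Node]:
    for c in n.children:
        yield c
        yield from descendants(c)


def child(n: Node, name: str) -> Set[Node]:
    return {c for c in n.children if c.name == name}


def spec_matches(n: Node, ee: Callable[[Node], Set[Node]]) -> bool:
    # N matches if root(N)//(EE) includes N
    r = n
    while r.parent is not None:
        r = r.parent
    return any(n in ee(c) for c in [r, *descendants(r)])


def star_except_title(c: Node) -> Set[Node]:
    # child::* except child::title (every child in this model is an element)
    return set(c.children) - child(c, "title")


def naive(n: Node) -> bool:
    # "matches * and does not match title"
    return spec_matches(n, lambda c: set(c.children)) and not spec_matches(
        n, lambda c: child(c, "title")
    )


doc = Node("#document")
book = doc.add(Node("book"))
book.add(Node("title"))
chapter = book.add(Node("chapter"))
chapter.add(Node("title"))
chapter.add(Node("para"))

nodes = list(descendants(doc))
print(all(spec_matches(n, star_except_title) == naive(n) for n in nodes))  # True
print([n.name for n in nodes if spec_matches(n, star_except_title)])  # ['book', 'chapter', 'para']
```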

Actions #4

Updated by Vladimir Nesterovsky about 2 years ago

Can you please point me to the part of the spec that says how intersect and except patterns should work?

I can see the grammar production IntersectExceptExprP, but I cannot find its interpretation.

Actions #5

Updated by Michael Kay about 2 years ago

See §5.5.3 of the XSLT 3.0 specification.

Specifically, an item N matches a pattern P if the following applies, where EE is the equivalent expression to P:

N is a node, and the result of evaluating the expression root(.)//(EE) with a singleton focus based on N is a sequence that includes the node N

Actions #6

Updated by Michael Kay about 2 years ago

Another case where we can use the current IntersectPattern and ExceptPattern logic is if one or both of the operands of except/intersect is independent of the context item: for example para except $specialParas. This includes patterns starting //, for example para except //appendix//para.

But not para except appendix//para, counter-intuitively.
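A Python sketch of the two para examples, using a toy tree model (hypothetical `Node` class and helpers, not Saxon code) with one para inside an appendix and one inside a chapter. With the context-independent `//appendix//para`, the spec rule gives the same answer as "matches para and does not match //appendix//para". With the relative `appendix//para`, the spec rule also matches the appendix para, because with the appendix itself as context the right operand is empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set


@dataclass(eq=False)  # eq=False: nodes hash and compare by identity
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, node: "Node") -> "Node":
        node.parent = self
        self.children.append(node)
        return node


def descendants(n: Node) -> Iterator[Node]:
    for c in n.children:
        yield c
        yield from descendants(c)


def child(n: Node, name: str) -> Set[Node]:
    return {c for c in n.children if c.name == name}


def descendant(n: Node, name: str) -> Set[Node]:
    return {d for d in descendants(n) if d.name == name}


def root(n: Node) -> Node:
    while n.parent is not None:
        n = n.parent
    return n


def spec_matches(n: Node, ee: Callable[[Node], Set[Node]]) -> bool:
    # N matches if root(N)//(EE) includes N
    r = root(n)
    return any(n in ee(c) for c in [r, *descendants(r)])


def appendix_paras(c: Node) -> Set[Node]:
    # appendix//para relative to c
    return {p for ap in child(c, "appendix") for p in descendant(ap, "para")}


def relative(c: Node) -> Set[Node]:
    # para except appendix//para: the right operand depends on c
    return child(c, "para") - appendix_paras(c)


def absolute(c: Node) -> Set[Node]:
    # para except //appendix//para: the right operand is the same for every c
    everywhere = {p for ap in descendant(root(c), "appendix") for p in descendant(ap, "para")}
    return child(c, "para") - everywhere


def naive_relative(n: Node) -> bool:
    # "matches para and does not match appendix//para"
    return spec_matches(n, lambda c: child(c, "para")) and not spec_matches(n, appendix_paras)


doc = Node("#document")
book = doc.add(Node("book"))
p1 = book.add(Node("appendix")).add(Node("para"))
p2 = book.add(Node("chapter")).add(Node("para"))

print([spec_matches(p, relative) for p in (p1, p2)])  # [True, True]
print([spec_matches(p, absolute) for p in (p1, p2)])  # [False, True]
print([naive_relative(p) for p in (p1, p2)])          # [False, True]
```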

Actions #7

Updated by Michael Kay about 2 years ago

  • Status changed from New to Resolved
  • Applies to branch 10, 11, trunk added
  • Fix Committed on Branch 10, 11, trunk added
Actions #8

Updated by Michael Kay about 2 years ago

More tests have been created in the xslt3 match test set, and further code changes have been applied to make sure they pass.

Actions #9

Updated by O'Neil Delpratt about 2 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.8 added

Bug fix applied in the Saxon 10.8 maintenance release. (Leaving open awaiting Saxon 11 maintenance release.)

Actions #10

Updated by O'Neil Delpratt about 2 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 11.3 added
  • Fixed in Maintenance Release deleted (10.8)

Bug fix applied in the Saxon 11.3 maintenance release.

Actions #11

Updated by O'Neil Delpratt about 2 years ago

  • Fixed in Maintenance Release 10.8 added
