Error during transformation in Saxon 9.7

Added by Vladimir Nesterovsky almost 2 years ago

If I were writing the spec and wanted to allow such an optimization, then I'd change the rules a little to treat a failed filter expression as false.

I agree with you. I felt at the time that the rules allowed optimizers too much freedom. The pressure to make the rules liberal came from people implementing XQuery over relational databases, where finding the right index to support a query can make the query run a million times faster. For a long time I resisted taking advantage of this freedom. But we found some use cases last year, involving widely used stylesheets, where the performance benefit from re-arranging predicates was so great that we decided to go this way.


Replies (6)


RE: Error during transformation in Saxon 9.7 - Added by Michael Kay almost 2 years ago

The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:

The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.

$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

Saxon has changed in 9.7 to take more advantage of this freedom. We have found cases (particularly in the case of XSLT match patterns) where reordering predicates can result in dramatic performance improvements.

I'm not personally convinced that the spec here has got it right, but it is very explicit on the point.

To prevent this casting error, you need to write something like

<xsl:sequence select="$elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()]"/>
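A variant of the same idea, offered only as a sketch, is to guard on the value itself rather than on the element name. It relies on the fact that a conditional expression must not raise a dynamic error from the branch that is not selected. The sample data in $elements (including the extra d element) is an assumption made for illustration, and this form is not equivalent to the expression above if elements other than d or e carry integer values.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">

  <!-- Illustrative data (an assumption): one element without @value,
       one with a non-numeric @value, one with an integer @value. -->
  <xsl:variable name="elements" as="element()+"><a/><b value="c"/><d value="1"/></xsl:variable>

  <xsl:template match="/">
    <!-- The cast sits in the "then" branch, so it is only evaluated for
         items whose @value is castable. A missing @value is not castable
         to xs:integer (no "?" on the type), so such items take the
         "else" branch as well. -->
    <xsl:sequence select="$elements[
        if (@value castable as xs:integer)
        then xs:integer(@value) = 1
        else false()]"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the test and the cast live in one conditional, there is no pair of predicates left for an optimizer to reorder.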

The next question is, why has Saxon reordered the predicates in this particular case? In general, it does so if the estimated cost of the first predicate is more than twice the estimated cost of the second. The estimated costs for the two predicates in this case are 13 and 4 respectively. The cost calculations are crude, and in this particular case I think they could be improved. The cost of the predicate [self::d or self::e] is being overestimated because it is parsed into [exists(self::d) or exists(self::e)], which includes two function calls, and function calls are assumed to be expensive. But the call on exists() is actually very cheap - especially after the change resulting from https://saxonica.plan.io/issues/2565 (!). So we could certainly refine the optimization rules here; but that does not make Saxon's current behaviour incorrect.
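The rule of thumb described above can be written out as a small stylesheet function. This is only a sketch of the stated heuristic, not Saxon's actual code: the function name local:should-swap, its parameters, and the urn:example:local namespace are all hypothetical.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:local="urn:example:local"
  exclude-result-prefixes="xs local">

  <!-- Hypothetical helper: true when the first predicate's estimated cost
       is more than twice the second predicate's estimated cost. -->
  <xsl:function name="local:should-swap" as="xs:boolean">
    <xsl:param name="first-cost" as="xs:integer"/>
    <xsl:param name="second-cost" as="xs:integer"/>
    <xsl:sequence select="$first-cost gt 2 * $second-cost"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Costs quoted in this thread: 13 for the first predicate,
         4 for the second. 13 gt 2 * 4 is 13 gt 8, which is true. -->
    <xsl:value-of select="local:should-swap(13, 4)"/>
  </xsl:template>

</xsl:stylesheet>
```

With the quoted costs the threshold is 8, so the first predicate would have to be estimated at 8 or less for the written order to be kept.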

RE: Error during transformation in Saxon 9.7 - Added by Michael Kay almost 2 years ago

For the next maintenance release, I have made some refinements to the cost calculations to give a more accurate answer (and thus prevent the predicate reordering) in this particular case.

RE: Error during transformation in Saxon 9.7 - Added by Vladimir Nesterovsky almost 2 years ago

Michael Kay wrote:

The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:

I was expecting that this would be the answer. I had just failed to find a reference in the spec.

In my opinion, such ramifications make it hard to reason about and to teach XPath.
Even though I know now that these two expressions can produce different results:

a) $elements[self::d or self::e][xs:integer(@value) = 1];
b) $elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()];

I doubt many people will spot the difference immediately, and I doubt I shall recall the difference in half a year.

If I were writing the spec and wanted to allow such an optimization, then I'd change the rules a little to treat a failed filter expression as false. In fact, something similar already exists with templates, where a failed evaluation of a pattern is treated as a non-match.

RE: Error during transformation in Saxon 9.7 - Added by Vladimir Nesterovsky almost 2 years ago

Well, I can somehow accept the spec's arguments, and allow the following to raise an error:
$elements[self::d or self::e][xs:integer(@value) = 1];

But now I'm looking at this code:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:variable name="a" as="element()*" select="$elements[self::d or self::e]"/>
    <xsl:variable name="b" as="element()*" select="$a[xs:integer(@value) = 1]"/>

    <xsl:sequence select="$b"/>
  </xsl:template>

</xsl:stylesheet>

and I am getting the very same error:

Error at char 20 in xsl:variable/@select on line 8 column 81 of Saxon9.7-filter_speculation.xslt:
  FORG0001: Cannot convert string "c" to an integer

I don't think an XSLT developer will accept arguments about internal machinery here.

P.S. My problem is that I should build a rule for a code review to fix the original problem, and now I see that the problem is wider?

RE: Error during transformation in Saxon 9.7 - Added by Vladimir Nesterovsky almost 2 years ago

Michael Kay wrote:

We have found cases (particularly in the case of XSLT match patterns) where reordering predicates can result in dramatic performance improvements.

In the context of match patterns this reordering is perfectly harmless, since according to the XSLT 2.0 spec:

5.5.4 Errors in Patterns

Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.
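A minimal sketch of what that rule means in practice, assuming a processor that takes the optional recovery action (an XSLT 2.0 processor is also permitted to signal the error instead). The matched result element and the sample input are hypothetical.

```xml
<!-- Assumed input document:
     <root><b value="c"/><d value="1"/></root> -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">

  <!-- For <b value="c"/> the cast in the pattern fails. Under the recovery
       action the pattern simply does not match that node, so the
       fallback template below is chosen for it. -->
  <xsl:template match="*[xs:integer(@value) = 1]">
    <matched name="{name()}"/>
  </xsl:template>

  <!-- Fallback for every other element: keep walking the tree. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Inside a pattern the order in which predicates are tested therefore cannot turn a non-match into a fatal error, which is not true of the same predicates used in an ordinary filter expression.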

RE: Error during transformation in Saxon 9.7 - Added by Michael Kay almost 2 years ago

I think your argument that the reordering is inappropriate when the expression is written using variables is very powerful. I shall raise the question with my WG colleagues.
