Project

Profile

Help

Bug #4385

closed

Infinite recursion in optimizer

Added by Michael Kay over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Internals
Sprint/Milestone:
-
Start date:
2019-11-14
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
9.8, 9.9, trunk
Fix Committed on Branch:
9.9, trunk
Fixed in Maintenance Release:
Platforms:

Description

Test case submitted privately by email (code is to be kept confidential).

Actions #1

Updated by Michael Kay over 4 years ago

  • Status changed from New to AwaitingInfo

The stylesheet supplied isn't well-formed, so we have asked for a revised one that enables us to repro the problem.

Actions #2

Updated by Michael Kay over 4 years ago

The critical part of the stack trace is

at com.saxonica.ee.optim.OptimizerEE.reorderPredicates(OptimizerEE.java:560)

at net.sf.saxon.expr.FilterExpression.optimize(FilterExpression.java:449)

at com.saxonica.ee.optim.OptimizerEE.reorderPredicates(OptimizerEE.java:560)

at net.sf.saxon.expr.FilterExpression.optimize(FilterExpression.java:449)

at com.saxonica.ee.optim.OptimizerEE.reorderPredicates(OptimizerEE.java:560)

Suggesting that we are changing the order of predicates in a filter expression, re-optimizing the result, which possibly reverts back to the original order and therefore recurses indefinitely.

Actions #3

Updated by Preethi Devaraj over 4 years ago

Hi Mike,

I have responded over the email with the latest XSLT and waiting for your reply for long time. Could you please help with the solution.

Thanks, Preethi Devaraj

Actions #4

Updated by Michael Kay over 4 years ago

It's becoming a bit of a tangled story....

The failure doesn't occur in my working copy of Saxon 9.9.1.5 (with patches applied preparing for the next maintenantce release.

Looking at the stack trace and the current code in comparison, the reason it doesn't fail is that the code at at com.saxonica.ee.optim.OptimizerEE.reorderPredicates(OptimizerEE.java:560) has been commented out. It looks as if it was commented out during the investigation of preicate reordering in issue

Actions #5

Updated by Preethi Devaraj over 4 years ago

Mike,

What could me the interim solution? And when can we expect the next release?.

Also as I mentioned previously, we have migrated from the old net.sourceforge.saxon9PE to the latest Saxon9EE. As all the xslt worked fine with the old jar, do you recommend any suitable versions (compatible with Java8) other than the latest net.sf.saxon Saxon9EE. Such that the xslt would not require any changes as we consume it from a different system.

Interim solution is to switch off optimization using the -opt flag. Maintenance releases for a mature product like 9.9 are generally about every two months, but it depends on the urgency. - Michael Kay**

Actions #6

Updated by Michael Kay over 4 years ago

  • Category set to Internals
  • Status changed from AwaitingInfo to In Progress
  • Priority changed from Low to Normal

I have now reproduced the problem, but it involved reverting my code to an earlier state, as it was at the time of the 9.9.1.5 maintenance release. Normally this would mean that the bug has been fixed and no further action is needed. However, the change to the code was made for an unrelated reason, and I am not convinced it really solves this problem except, by chance, for this particular example.

The expression in question is on line 637:

(//InsuredOrPrincipal 
    [not(exists(InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd))]
  /GeneralPartyInfo/NameInfo/CommlName [CommercialName!=''] / CommercialName)[1]"

and it's complicated by the fact that it has already been extensively rewritten before the failing optimization is attempted. We're working with the expression:

(descendant::element(Q{}CommlName)
   [parent::element(Q{}NameInfo)
      [exists(parent::element(Q{}GeneralPartyInfo)
         [exists(parent::element(Q{}InsuredOrPrincipal)
            [empty(child::element(Q{}InsuredOrPrincipalInfo)/child::element(Q{}InsuredOrPrincipalRoleCd))
      ] ) ] ) ] ] )
  [(data(child::element(Q{}CommercialName))) != ""]

This rewrite is to take advantage of quick access to descendant CommlName elements from the document root; and it's now trying to decide whether to evaluate the predicate that looks at children first (the test on CommercialName) or whether to look first at the parent element.

Actions #7

Updated by Michael Kay over 4 years ago

It turns out that the reason the reordering of predicates doesn't converge is integer overflow in the cost calculations. We estimate the cost of evaluating both predicates, say C1 and C2, and if C1 > 2C2 then we reorder them. But if 2C2 overflows and delivers a negative integer, then (C1 > 2C2) and (C2 > 2C1) will both return true. We could do the calculation using double or long arithmetic, or we could cap the cost estimate at some maximum (it's a very crude calculation anyway).

However, we should consider implementing some of the improvements made to the development branch following the investigation described in bug #4340 (see comment 10 et seq). These improvements consider all the candidate predicates collectively, rather than comparing them pairwise, and they also take into account the indexability of each predicate, which is a more important factor than the raw evaluation cost.

Actions #8

Updated by Michael Kay over 4 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.8, 9.9, trunk added
  • Fix Committed on Branch 9.9, trunk added

I have fixed the problem by changing the expression cost calculations to use double rather than int arithmetic. (I have also capped the cost, so for complex expressions the cost is cheaper to calculate; it's enough to know that it's high without trying to get an exact figure).

Actions #9

Updated by Preethi Devaraj over 4 years ago

Hi Mike,

Should we expect any patches? or could you suggest how to use -opt flag in a Java Web application, where we refer the saxon jar as a library.

Thanks, Preethi Devaraj

Actions #10

Updated by Michael Kay over 4 years ago

We mark a bug as resolved when we have committed a patch (that's not particularly useful for EE users who don't have access to source code, however), and then we mark it closed when we ship a maintenance release containing that patch, which is usually within 6 weeks or so.

You can set optimizer options programmatically using Configuration.setConfigurationProperty() or Processor.setConfigurationProperty() with Feature.OPTIMIZATION_LEVEL

Actions #11

Updated by Vladimir Nesterovsky over 4 years ago

I have fixed the problem by changing the expression cost calculations to use double rather than int arithmetic.

Another approach to fix such overflow would be to switch to logarithmic scale, so instead of cost C it's possible to use c = log2(C). This way:

C2 = C1 * 2 would be expressed with c2 = c1 + 1 C3 = C1 * C2 would be expressed with c3 = c1 + c2 C3 = C1 + C2 would be approximated with c3 = max(c1, c2) + 1 C(n+1) = C1 + C2 + … + Cn would be approximated with c(n+1) = max(c1, c2, …, cn) + log2(N), where log2(N) is approximated with Java.lang.Integer.highestOneBit().

Actions #12

Updated by Michael Kay over 4 years ago

Isn't a logarithmic scale exactly what double floating point gives us? But in fact there's no need. The calculations are very crude (we really don't know what a//b will cost without any volumetrics on the instance data), we're just trying to distinguish simple expressions from complex ones.

Actions #13

Updated by Vladimir Nesterovsky over 4 years ago

I agree, it is the same. the only the difference is that you can stay with integer arithmetic and use + instead *.

Actions #14

Updated by O'Neil Delpratt over 4 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.9.1.6 added

Patch committed to the Saxon 9.9.1.6 maintenance release.

Please register to edit this issue

Also available in: Atom PDF