Project

Profile

Help

Bug #3843

closed

Compiling an XPath expression takes 23 minutes

Added by Michael Kay almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2018-07-16
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.8, trunk
Fix Committed on Branch:
9.8, trunk
Fixed in Maintenance Release:
Platforms:

Description

David Thielen (in private email) has reported a chronic performance problem where compiling an XPath expression takes 23 minutes. The time seems to be spent in an optimization step.

Actions #1

Updated by Michael Kay almost 6 years ago

Actually the script is compiling and running a set of 43,000 path expressions (a 5% sample of the original).

I found by accident that running these under XPath, the execution time is around 50s, but under XQuery, it is around 5s.

The difference between the two cases appears to be that when many XPath expressions are processed using the same XPathCompiler, the same KeyManager is used each time, whereas under XQuery, a new KeyManager is used each time.

In principle using the same KeyManager ought to be better, because it means that when two searches search on the same thing, the index that is constructed to support the first search can be reused for the second. The problem appears to be that discovering that the index is reusable is taking longer than rebuilding it from scratch. The time is spent comparing expressions for equality, which involves a deep walk of the expression tree.

Actions #2

Updated by Michael Kay almost 6 years ago

The main reason that the cost of searching for an existing index exceeds the benefit is that the search usually fails (11540001 out of 11540877 searches are unsuccessful).

The other reason is that comparing expressions is expensive even when it fails.

Finally, it's not at all clear why we are doing quite so many searches.

Actions #3

Updated by Michael Kay almost 6 years ago

Two expressions that are actually equal are comparing not equal: the culprit is SubscriptExpression which uses "==" rather than "equals()" on one of the operands.

The effect is that rather than using an existing index, a new one is created. When this is done 1000 times, we have a list of 1000 indexes to search before creating the next one, so the effect is quadratic.

Fixing the equality test on SubscriptExpression brings the XPath execution time down to 9.5s, which is still slower than the XQuery time, but much more acceptable. We are still getting a lot of Expression.equals() activity, but nothing like so much. We now have 74K pattern comparisons of which 66K are unsuccessful, and the number of indexes built is just 28.

It looks as if the search for an index using a particular pattern would benefit from a hash lookup of some kind.

Actions #4

Updated by Michael Kay almost 6 years ago

I have fixed the XPath performance issue (it now runs a little faster than XQuery) by the following two changes:

(a) correct the logic for SubscriptExpression.equals()

(b) introduce a lazily-evaluated hashcode on the Expression object and use this when comparing expressions for equality.

The hotspot (30% of execution time) is now AxisExpression.typeCheck(). I improved this slightly (c) by avoiding some of the checks if the contextItemType is a NodeTest. But I can gain pretty well 30% in bottom line execution time by (d) disabling the "checkPlausibility()" method, whose effect is almost entirely to check for "void" path expressions like child::attribute(). This sounds like a useful configuration option.

There are no other obvious hot-spots. The profiling identifies a few opportunities for getting a percentage point off here and there, for example by:

(e) In UnaryExpression and BinaryExpression, avoiding the general iteration over operands() when typechecking, and just typechecking the fixed child expressions explicitly

(f) avoiding scanning the expression tree to look for updating expressions if updating wasn't enabled in the query parser

These changes aren't appropriate in a maintenance release but I'll consider them for the next major release.

Actions #5

Updated by Michael Kay almost 6 years ago

  • Status changed from New to In Progress

I have committed all the above changes (a-f) on the development branch. I now need to get the 9.8 branch into shape.

Actions #6

Updated by Michael Kay almost 6 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.8, trunk added
  • Fix Committed on Branch 9.8, trunk added

On the 9.8 branch, applied changes (a) and (b) above.

Actions #7

Updated by Debbie Lockett over 5 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.8.0.14 added

Bug fix applied in the Saxon 9.8.0.14 maintenance release.

Please register to edit this issue

Also available in: Atom PDF