Project

Profile

Help

Bug #2577

closed

Performance regression in 9.6 caused by complex union patterns

Added by Thorsten Rohm over 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
Start date:
2016-01-08
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.6
Fix Committed on Branch:
9.6
Fixed in Maintenance Release:
Platforms:

Description

When I try to process the attached XML the excecution time raises

  • from 2,234ms (using Saxon PE 9.2.1.1)

  • to 5,774,967ms (using Saxon EE 9.6.0.4).


Files

Saxon_Performance.zip (1.13 MB) Saxon_Performance.zip Thorsten Rohm, 2016-01-08 15:01
Actions #1

Updated by Michael Kay over 8 years ago

Thanks for reporting it, will investigate.

Actions #2

Updated by Michael Kay over 8 years ago

The script appears to depend on catalog files which are not present, and which are used to resolve DTD references that are also not present.

Would it be possible to supply the test case in a rather simpler form without such dependencies? Also, is it really necessary to run all steps of the script to demonstrate the problem - can't it be isolated to one of the steps?

Actions #3

Updated by Michael Kay over 8 years ago

I found that I could get the transformation to run (without a catalog resolver) simply by commenting out the DOCTYPE reference. (It's always possible, of course, that this is relevant to the problem, but I thought I would try and make progress this way).

I haven't yet re-installed 9.2 to do the comparison there, but I found that the transformation time for both the DocAF and DocAM documents was about 4 seconds under Saxon 9.7, and 25 - 35 seconds under Saxon 9.6 (specifically 9.6.0.8).

I do have 9.1 installed, and that gives me a transformation time of around 4 seconds on the DocAF document. So this does suggest 9.6 has a problem here.

I haven't yet run the much larger AllChars document on 9.6, but it's clearly taking a long time. On 9.7 it takes 213708.265ms, which is a lot better than the figure you report for 9.6, but a lot worse than your 9.2 figure. My first guess would be that there is something about the stylesheet that is causing it to take quadratic time under 9.6, so the larger the source document, the more marked the performance difference. This would suggest that the difference is due to some change in optimization strategy. Whatever it is, it seems to be something that 9.7 has fixed. Unfortunately it's quite a large stylesheet so it may take a little while to pin down where the trouble is.

Actions #4

Updated by Michael Kay over 8 years ago

I've now run profiling (with 9.6 against the DokAF document) using -TP:profile.html and this shows that the transformation time is completely dominated by one template:

"*modules/t0.xsl" 7048 template td 1841 18.520 34095.091 17.677 32544.178

This template is obviously quadratic in the number of td siblings, but it's not obvious why that should cause a problem of this magnitude.

I have run with Java profiling enabled, and the results are strangely uninformative - almost no samples actually occurred in Saxon code. Perhaps I have an instrumentation problem, or perhaps this is actually a symptom of what's wrong.

Actions #5

Updated by Michael Kay over 8 years ago

OK, I was looking at the wrong Java trace. I've now got the right one. It shows what one might expect from the source of the "td" template - a lot of time spent navigating various axes. The main question is what optimization 9.1 (and 9.7) are finding that 9.6 fails to find. I think my next step will be to look at the optimization trace and execution plans. This can be tricky with a large stylesheet - I think there's a way to do it on a per-template level, at least in recent releases.

Actions #6

Updated by Michael Kay over 8 years ago

Attempted a bit of what I call "subtractive measurement" - take functionality out, and see what impact it has.

Removing the

<xsl:variable name="td-colspan" select="sum(preceding-sibling::td/@colspan) + 1"/>

at line 7050 had no impact at all: the number of td siblings is too small for the quadratic cost here to make a difference.

However, removing the apply-templates at line 7148 had a dramatic difference, reducing the cost from 45 seconds to 2.5 seconds. (This is all under 9.6.0.8).

The -TP profiling includes the cost of pattern matching by the apply-templates instruction as part of the cost of the template containing the apply-templates, so my current theory is that the time is spent in matching candidate patterns for the children of a td element. The challenge now is to find which patterns are being considered here. At first sight, the only elements appearing as children of td are td/a elements, and these match the template rule on line 7152. But perhaps there are other rules that have to be assessed before the rule on line 7152 is chosen?

Indeed there are: notably the complex union pattern on line 4900 of t0.xsl.

Actions #7

Updated by Michael Kay over 8 years ago

Looking at this in the debugger, it seems that the union pattern has been broken up into multiple rules as it should be, but the actual pattern stored with each rule is the original union pattern rather than the specific term. (And I have a vague recollection of fixing this during 9.7 development). So if there are 10 branches in the union pattern, we are actually doing 100 pattern tests.

Actions #8

Updated by Michael Kay over 8 years ago

Note: for the history, the following might be relevant, although it's not quite the same problem:

https://saxonica.plan.io/issues/2145

Actions #9

Updated by Michael Kay over 8 years ago

Yes, I've confirmed that there's a difference between the 9.6 and 9.7 paths here. On the 9.6 path, what's happening is:

  1. a Rule is created for each branch of the union pattern: this is essentially a (pattern, action) pair.

  2. after the pattern is optimized, all the associated rules are updated with the optimized version of the pattern. This puts the full union pattern into each of them, which is clearly wrong.

On the 9.7 branch, the code (in TemplateRule.setMatchPattern) that does this has simply been commented out. In its place, the code in XSLTemplate (line 825) that optimizes the pattern now iterates over the Rule objects, optimizing each of the sub-patterns in turn.

In 9.5 and previous releases, the problem simply did not arise because patterns were not optimized. The optimization step was introduced because for some predicates it can be very powerful, e.g. replacing match=para[last()] by match=para[not(following-sibling::para)] can make a dramatic difference.

So essentially, we need to try and retrofit the 9.7 approach onto 9.6.

Actions #10

Updated by Michael Kay over 8 years ago

  • Status changed from New to Resolved
  • Applies to branch 9.6 added
  • Fix Committed on Branch 9.6 added

I've made this change to 9.6 and it appears to solve the problem. Marking this resolved, though it will still need regression testing before we issue it in a maintenance release.

Actions #11

Updated by Michael Kay over 8 years ago

  • Subject changed from Performance comparison: Saxon PE 9.2.1.1 vs. EE 9.6.0.4 to Performance regression in 9.6 caused by complex union patterns
Actions #12

Updated by O'Neil Delpratt almost 8 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.6.0.9 added

Bug fix applied in the Saxon 9.6.0.9 maintenance release.

Actions #13

Updated by O'Neil Delpratt almost 8 years ago

  • Sprint/Milestone set to 9.6.0.9

Please register to edit this issue

Also available in: Atom PDF