Project

Profile

Help

Bug #4340

closed

Performance degraded after moving from Saxon 8.9 to 9.8.0.15 (use case Enterprise constraint)

Added by Stefan Fritz over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2019-10-16
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.8
Fix Committed on Branch:
10
Fixed in Maintenance Release:
Platforms:

Description

Customer upgraded to the latest version of our product which uses Saxon 9.8.0.15 EE. They came from an older version which used Saxon 8.9.0.14. They process thousands of documents per day with hundreds of XSLs. For two use cases (see other Saxon issue reported for the second use case) they see dramatic performance decrease which impacts their business.

Below you will find test results from our product(Sonic), from Saxon commandline and also from a Java test utility that simulates how our product uses Saxon. The ZIP contains sonicTest_EntConstraint.cmd and saxonTest_EntConstraint.cmd which allow you to reproduce the tests that I did. Further it contains the source code of the SonicXSLCmdLine.

You may notice that some of the results got even worse with 9.9. We have to stick to 9.8 for now as we still have Java 7 support. Therefore we would appreciate a patch release for 9.8 once the cause has been identified.

I sent you a download link for the ZIP with the reproducible to the support email address. The ZIP contains sensible customer data and therefore password protected. (please do not attach here!)

"Enterprise Constraint" use case summary:

With Saxon 8.9 as the baseline the processing time is 4x inside Sonic with Saxon 9.8.0.15. Saxon commandline processing time is 3x. SaxonXSLCmdLine is showing similar results to Sonic product.

RESULTS:

Sonic product Saxon 8.9 ca. 30s Saxon 9.8 ca. 120s Saxon 9.8 n/a (test stopped after 1 hour with optimization disabled)

SaxonXSLCmdLine RESULTS:  XSL: EntConstraint\enterpriseConstraint_xslt\Validation_Engine.xsl  JVM arguments: -Xss4m -Dxsl.useTracing=false  Saxon 9.8.0.15:    108120 ms  Saxon 9.9.1.2:    126700 ms

SaxonXSLCmdLine RESULTS:  XSL: EntConstraint\enterpriseConstraint_xslt\Validation_Engine.xsl  JVM arguments: -Xss4m -Dxsl.useTracing=false -Dsaxon.optimizationLevel=-c  Saxon 9.8.0.15:    234940 ms  Saxon 9.9.1.2:    230420 ms

Saxon command line:  XSL: EntConstraint\enterpriseConstraint_xslt\Validation_Engine.xsl  Saxon 8.9.0.4:        28360 ms  Saxon 9.1.0.9:     3782070 ms  Saxon 9.8.0.15:      94810 ms  Saxon 9.9.1.2:      128580 ms

Actions #1

Updated by Michael Kay over 4 years ago

Successfully run from the command line: Saxon-EE 9.9.1.5 with default options:

Stylesheet compilation time: 960.618461ms
Processing file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/CXML_SMALL_Request_for_EnterpriseConstraints.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/CXML_SMALL_Request_for_EnterpriseConstraints.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 397.97668ms
Tree size: 132741 nodes, 297569 characters, 190409 attributes
URIResolver.resolve href="Constraint_Config.xml" base="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/Validation_Engine.xsl"
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/Constraint_Config.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 5.199861ms
Tree size: 869 nodes, 0 characters, 3197 attributes
URIResolver.resolve href="InTrans_DataAccess.xsl" base="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/Validation_Engine.xsl"
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 3.001158ms
Tree size: 1362 nodes, 260 characters, 1157 attributes
Execution time: 1m 46.535708977s (106535.708977ms)
Memory used: 1,068,586,456

Saxon-SA 8.9.0.4 with default options:

Stylesheet compilation time: 414 milliseconds
Processing file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/CXML_SMALL_Request_for_EnterpriseConstraints.xml
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/CXML_SMALL_Request_for_EnterpriseConstraints.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 363 milliseconds
Tree size: 172174 nodes, 297569 characters, 190409 attributes
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/Constraint_Config.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 3 milliseconds
Tree size: 869 nodes, 0 characters, 3197 attributes
Building tree for file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 2 milliseconds
Tree size: 1374 nodes, 260 characters, 1157 attributes
Execution time: 23533 milliseconds
Memory used: 125796784
NamePool contents: 592 entries in 399 chains. 13 prefixes, 14 URIs

Actions #2

Updated by Michael Kay over 4 years ago

Profiling with hprof:cpu=samples

8.9.0.4 top ten

CPU SAMPLES BEGIN (total = 2155) Fri Oct 18 22:37:57 2019
rank   self  accum   count trace method
   1 10.35% 10.35%     223 300465 net.sf.saxon.pattern.LocalNameTest.matches
   2 10.21% 20.56%     220 300468 net.sf.saxon.pattern.LocalNameTest.matches
   3  5.01% 25.57%     108 300472 net.sf.saxon.tinytree.DescendantEnumeration.next
   4  3.06% 28.63%      66 300466 net.sf.saxon.tinytree.DescendantEnumeration.next
   5  2.78% 31.42%      60 300492 net.sf.saxon.value.StringValue.codepointEquals
   6  2.65% 34.06%      57 300425 net.sf.saxon.pattern.LocalNameTest.matches
   7  2.23% 36.29%      48 300445 net.sf.saxon.tinytree.AttributeEnumeration.moveNext
   8  2.09% 38.38%      45 300447 net.sf.saxon.tinytree.SiblingEnumeration.<init>
   9  2.00% 40.37%      43 300467 net.sf.saxon.expr.GeneralComparison.effectiveBooleanValue
  10  1.95% 42.32%      42 300572 net.sf.saxon.expr.GeneralComparison.compare

For 9.9.1.5 most of the time is spent in generated bytecode:

   1 48.74% 48.74%    4843 300877 EE_sysFuncCall_4711758750889.effectiveBooleanValue
   2 23.67% 72.41%    2352 300870 EE_sysFuncCall_4751067208990.effectiveBooleanValue
   3 12.79% 85.21%    1271 300880 EE_sysFuncCall_4711758750889.effectiveBooleanValue
   4  4.61% 89.81%     458 300873 EE_sysFuncCall_4751067208990.effectiveBooleanValue
   5  1.12% 90.93%     111 300882 net.sf.saxon.tree.iter.ListIterator.next
   6  0.97% 91.90%      96 300879 net.sf.saxon.tree.iter.ListIterator.next
   7  0.70% 92.60%      70 300796 net.sf.saxon.tree.tiny.AttributeIterator.moveToNext
   8  0.50% 93.11%      50 300891 net.sf.saxon.tree.tiny.SiblingIterator.next
   9  0.40% 93.51%      40 300875 net.sf.saxon.value.MemoClosure.iterate
  10  0.25% 93.76%      25 300887 net.sf.saxon.value.MemoClosure.iterate

Switching bytecode generation off increases the 9.9.1.5 execution time to 3m 31.13s so bytecode generation is working well; but the very high proportion of time spent in this one place means it must surely be the source of the problem.

Actions #3

Updated by Michael Kay over 4 years ago

-TL profiling fails because the stack depth is too deep: raised this as bug #4350.

Tried it in IntelliJ with that bug fixed, but it takes for ever so I killed it.

Meanwhile took a look at the JProfiler CPU sampling, which shows a 50% hotspot in tree.tiny,DescendantIteratorSansText.next(). Generated bytecode appears in the profile, but not at the significant level seen in the runhprof output. All the effort here is navigating the tinytree.

We can see that the iterator is being used with a LocalNameTest (.//*:local) which is bad news because it involves string comparisons (can't use integer fingerprints if the namespace is wildcarded). Worse than that, we have to go to the NamePool on every element to translate the fingerprint to a local name.

The iterator is being walked in the course of evaluating a GeneralComparison. The GeneralComparison itself has been compiled to bytecode, but we never generate bytecode for the (tree-model-specific) axis navigation. The GeneralComparison, as one might expect, is in a filter predicate; and this involves a sort into document order. But the JProfiler output doesn't tell us where we are in the stylesheet code, and again, with -TP enabled it's we can wait all day for it to finish.

Running with -T fares better, and points us to this line as the source of the problem:

<xsl:for-each select="//*:Ratings/*:Rating[@meta:OnCurrentPolicyTerm][@*:ID = //*:Relationships/*:Relationship/*:Participants[*:Participant/*:Coverage/@*:ID = $thisCoverage/@*:ID]/*:Participant/*:Rating/@*:ID]">

My suspicion is now that Saxon 8.9 managed to optimize this join expression to use an index, and later releases failed to do so. Unfortunately 8.9 didn't have the -explain option to reveal the optimization decisions made, so it's difficult to verify this.

The good news is that it can very easily be hand-optimized

Actions #4

Updated by Michael Kay over 4 years ago

I attempted to hand-optimize this by adding two keys

	<xsl:key name="Ratings" match="//*:Ratings/*:Rating[@meta:OnCurrentPolicyTerm]" use="@*:ID"/>
	<xsl:key name="Participants" match="//*:Relationships/*:Relationship/*:Participants" use="*:Participant/*:Coverage/@*:ID"/>

and replacing the for-each/@select with

<xsl:for-each select="key('Ratings', key('Participants', $thisCoverage/@*:ID)/*:Participant/*:Rating/@*:ID)">

and there was almost no effect on the bottom line: execution time (under 10.0) 1m 42

So I looked at the -explain output (which I should have done earlier) and Saxon is indeed optimizing this construct (it's lifting a large expression out of the loop, and doing a lookup using indexes).

This raises the possibility that 8.9 and 9.9 are both optimizing the construct, but doing it differently. If that turns out to be the case, then it could be a difficult one to fix, because there is no one strategy for optimization that will always give the best answer: we don't have any information on volumetrics, so the strategy involves a lot of guesswork, and a particular strategy might be good for one workload and bad for another.

Actions #5

Updated by Michael Kay over 4 years ago

In comment #3 I came to the conclusion that I had pinpointed the source of the problem on rather flimsy evidence (eyeballing the voluminous -T output to look for things that cropped up frequently). There are plenty of other candidates and I think we need more systematic analysis to see where the costs are going. I don't know why -TP is failing but it's a high-overhead approach to instrumentation so it's not that surprising. I might have to attempt some custom instrumentation by modifying product internals.

Actions #6

Updated by Michael Kay over 4 years ago

Adding some internal instrumentation to count execution of named templates and template rules gives the following rather surprising results:

COUNTERS
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#448 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#649 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#904 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#122 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#222 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1150 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#961 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#103 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#202 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#501 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#182 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1049 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#161 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#85 = 1
Q{}Relate = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1306 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#2073 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#538 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#815 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#67 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1363 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#377 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#575 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/Validation_Engine.xsl#67 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#698 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#752 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#832 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1018 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1436 = 1
Q{}createKeyAttribute = 30011
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#4 = 1
file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl#1899 = 1

Adding a counter for tail calls, we get 2582 calls on the interestingly-complex Relate template.

But the striking observation is that these numbers all seem very low, for a stylesheet that takes nearly 2 minutes to execute. I really don't yet have any kind of feel for what it is spending its time doing.

Actions #7

Updated by Michael Kay over 4 years ago

I realised that we could get finer-grained instrumentation from the bytecode monitoring report, -TB. This reveals that there are 5 byte code candidate code fragments with over 1 million executions:

<exp id="523" text="fn:not(...)" intCount="99" intTime="38" bcCount="1002844" bcTime="153306"
  compileTime="656" ratio="0.391" ebv="1002943" eval="0" iterate="0" process="0"
  module="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl"
  line="1285"/>

<exp id="525" text="fn:not(...)" intCount="99" intTime="144" bcCount="6676953" bcTime="64859947"
  compileTime="1308" ratio="6.649" ebv="6677052" eval="0" iterate="0" process="0"
  module="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl"
  line="1285"/>

<exp id="318" text="fn:exists(...)" intCount="99" intTime="32" bcCount="1828777" bcTime="940293"
  compileTime="1345" ratio="1.553" ebv="1828876" eval="0" iterate="0" process="0"
  module="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl"
  line="670"/>

<exp id="320" text="fn:exists(...)" intCount="99" intTime="1995" bcCount="1828777" bcTime="2153676"
  compileTime="2318" ratio="0.058" ebv="1828876" eval="0" iterate="0" process="0"
  module="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl"
  line="664"/>

<exp id="524" text="fn:not(...)" intCount="99" intTime="574" bcCount="3342300" bcTime="32854322"
  compileTime="848" ratio="1.695" ebv="3342399" eval="0" iterate="0" process="0"
  module="file:/Users/mike/bugs/2019/4340-fritz/EntConstraint/enterpriseConstraint_xslt/InTrans_DataAccess.xsl"
  line="1285"/>

and these are in two clusters around lines 665-670 and line 1285.

Lines 665 and 670 are complex joins similar to the one we looked at earlier:

<xsl:for-each select="//*:Relationships/*:Relationship/*:Participants[*:Participant[*:Category/(@*:SourceKey|@*:MDMKey) = 'PROTECTS']/*:Coverage[@*:ID = $thisCoverage/@*:ID]]/*:Participant/*:Risk">
<xsl:for-each select="//*:Groups/*:Group[@*:ID = //*:Relationships/*:Relationship/*:Participants[*:Participant/*:Coverage[@*:ID = $thisCoverage/@*:ID]]/*:Participant[@*:GenerateGroupMemberRelationships = '1' or @*:GenerateGroupMemberRelationships = 'true']/*:Group/@*:ID]//*:Member/*:Risk">

Line 1285 is within the recursive Relate template, and involves some complex deduplication logic:

select="remove($RelatedKeys,1),distinct-values($EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)][RelatedTo/*/@Key = $RelatedKeys[1]][not(@Key = $ProcessedKeys)][not(@Key = $RelatedKeys)]/@Key)"/>

Note that none of the variables here have a declared type, so Saxon has very little scope for optimizing these comparisons.

(The report also shows that the benefit of generating bytecode for these code fragments is quite uneven, some showing a very good improvement and some a small worsening. But that's in line with expectations.)

The bytecode profile shows the predicate at line 1285 as dominant, so this is what we should probably explore first, although the Java profiling done earlier points more at the join queries around line 665/670 as the hotspot.

Actions #8

Updated by Michael Kay over 4 years ago

Looking at the code at line 1285, in particular the expression

$EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)][RelatedTo/*/@Key = $RelatedKeys[1]][not(@Key = $ProcessedKeys)][not(@Key = $RelatedKeys)]/@Key

there is clearly some scope for better optimization. I would have expected the subexpression

$EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)]

to be promoted to a global variable, since it has no local dependencies, but this has not happened. Instead the subexpression appears to have been promoted to a local variable only,

I might also have expected the second predicate [RelatedTo/*/@Key = $RelatedKeys[1]] to have been promoted above the first and/or used to construct an index on the global variable, but this has not happened.

One possible line of inquiry now is to try and explore what 8,9 did with this -- or perhaps to look at some intermediate releases that might offer better -explain output than 8.9. In a sense, though, that's only of historic interest: it would probably be a more useful exercise to look at the optimisation of this expression from first principles and see why it isn't being done better. It might be worth experimenting with some hand optimisations first to see if they make a difference.

Actions #9

Updated by Michael Kay over 4 years ago

The effect of pulling out

$EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)]

into a global variable is dramatic: it reduces execution time to 6069ms. I think you should make this change, but meanwhile I will explore why Saxon fails to detect this opportunity. I'm not going to investigate further why 8.9 differs; optimization approaches change a little over time, and the changes will always benefit some workloads at the expense of others. Sometimes where multiple optimization opportunities exist it's very hard to decide which of them will give better results and the effects are essentially random.

Actions #10

Updated by Michael Kay over 4 years ago

We've got an expression of the form $E/*[P1]{P2][P3][P4] where $E is a global variable reference and P1 has no dependencies other than the context item and other global data, and we would like to see $E1/*[P1] promoted to a global variable. The reason this isn't happening is the bracketing. It arrives in the optimizer as

$E / ( ( ( ( child::* [P1] ) [P2] ) [P3] ) [P4] )

We attempt various re-arrangements of this expression, including promoting the filters to the top level, but none of these re-arrangements encounters a subexpression without local dependencies (the other predicates P2, P3, and P4 depend on local variables). In fact the strategy at this point is focused on trying to find something indexable (although it fails in that too) rather than in trying to find something that can be promoted to global. The search for subexpressions that can be promoted to global variables only comes later, after all the rearrangements have been done.

Walking this through the debugger, the process seems very undirected. We're attempting various rearrangements of the predicates in the hope that we find something indexable, rather than examining and analysing the expression to seek out predicates that are worth promoting, based on their properties. We're therefore doing a lot of work in the optimizer (for this example) that ends up achieving very little.

[And standing back from it, there's no bug here; just an opportunity to improve the design.]

Actions #11

Updated by Michael Kay over 4 years ago

By the time we get to the stage of global variable extraction, the expression is

if (exists($RelatedKeys)) 
then (
   let $qq:v3 := data(first($RelatedKeys)) 
   return 
     (let $qq:v2 := data(($RelationshipExclusions/child::element())/attribute::attribute(Q{}Key)) 
     return (
        let $qq:v1 := data($RelatedKeys) 
        return (
           let $qq:v0 := data($ProcessedKeys) 
           return (CallTemplate#Relate(
               ProcessedKeys=insert-before(first($RelatedKeys), 0, $ProcessedKeys), 
               RelatedKeys=Block(tail($RelatedKeys), 
                    distinct-values(data((docOrder((((($EntityRelationships/child::element())
                        [not($qq:v0 = data(@Key) ) ] )
                        [not($qq:v1 = data(@Key) ) ] )
                        [not($qq:v2 = data(@Key) ) ] )
                        [ (data( (child::element(Q{}RelatedTo)/child::element())/attribute::attribute(Q{}Key) ) ) = $qq:v3] ) )
                   /attribute::attribute(Q{}Key) ) ) ) ) ) ) ) ) )

 else ...

It's quite hard to work out the bracketing there (despite my reformatting), but to recognize the opportunity to move the "=" predicate into a global variable we would need (a) to do significant re-arrangement of the predicates, and (b) to first promote $qq:v3 to global and thus recognize that the predicate has no local dependencies.

which would be a candidate for promotion to global if we were able to recognize that $qq:v0 could be promoted to global.

Note also, some large subexpressions here are marked as ineligible for promotion to global because they are labelled as requiring lazy evaluation. This is typically because we have to be careful lifting things out of a conditional expression or a loop into a global variable because it would change the error semantics. (Note: we've fixed optimization bugs in this area over the years, which means the optimizer is now less aggressive but more correct than it was in the past; this could account for performance regressions, though that's only a hypothesis.)

Actions #12

Updated by Michael Kay over 4 years ago

I am now studying the optimization of the Relate template, and in particular the filter expression supplied as argument to the distinct-values() call, by isolating it into a free-standing stylesheet.

First observation: the four predicates are reordered on the basis of a crude cost metric which puts the "not" predicates first. The process of sorting the predicates is very clumsy because it considered all possible pairs of predicates (12 pairs) rather than considering the set of four and doing a sort. In addition, sorting by cost without taking account of selectivity is rather crude. The cost of evaluating the composite predicate can be calculated as C1 + S1*(C2 + S2*(C3 + S3*C4)) where C1...4 are the estimated costs, and S1...4 the estimated selectivities. Also, the cost for the "=" predicate, in particular for the path expression RelatedTo/*/@Key = $RelatedKeys[1] seems to be grossly over-estimated.

Second observation: this re-ordering takes no account of the suitability of the predicates for indexing, or of their potential for promotion to a global variable. (It is done before either of those optimizations).

If I suppress the reordering of predicates, I'm still getting various expressions loop-lifted to local variables, but nothing promoted to a global variable. I'm not sure yet why the global promotion isn't happening, but I think it's made more difficult by the fact that parts of the expression have already been loop-lifted to local variables. Logically, it would seem to make more sense to look for global promotion first.

Actions #13

Updated by Michael Kay over 4 years ago

If we force the predicates to be in the right order, and force promotion to global to happen before loop-lifting, then the first thing that prevents this expression being promited to global is that it is inside a branch of a conditional. This is probably due to concerns that as a global, it might be evaluated even if the condition is false. But globals are generally evaluated lazily, so I'm not sure this is a valid concern.

The second thing that prevents it appears to be a bug: the subexpression data(@Key) suppresses promotion not only for itself, but for all its containing expressions. This is because the AtrributeGetter expression doesn't have the property "no nodes newly created". This one is easily fixed.

So it seems that to make this optimization work, we need four changes:

(a) fix the bug in setting the properties of the AttributeGetter expression (easy)

(b) do extraction of global variables BEFORE loop-lifting. This is an easy code change (to ExpressionTool.optimizeComponent()) but it is difficult to predict its consequences.

(c) allow extraction to global variables of expressions that appear in a branch of a conditional. Again this is a fairly easy coding change (to GlobalExtractor.markIneligibleExpressions()), but the consequences might be far-reaching

(d) change the logic for reordering of predicates so that a predicate without dependencies on local variables comes higher in the pecking order

Note that apart from (a), these changes are far too risky to consider including in a maintenance release of 9.9.

Actions #14

Updated by Stefan Fritz over 4 years ago

The effect of pulling out $EntityRelationships/[not(@Key = $RelationshipExclusions//@Key)] into a global variable is dramatic: it reduces execution time to 6069ms. I think you should make this change,

Would you mind attaching your modified XSL to see how exactly you have done this?

Actions #15

Updated by Michael Kay over 4 years ago

In response to comment #14: in InTrans_DataAccess.xsl line 1285, I changed

select="remove($RelatedKeys,1),distinct-values($EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)][RelatedTo/*/@Key = $RelatedKeys[1]][not(@Key = $ProcessedKeys)][not(@Key = $RelatedKeys)]/@Key)"/>

to

select="remove($RelatedKeys,1),distinct-values($IncludedRelationships[RelatedTo/*/@Key = $RelatedKeys[1]][not(@Key = $ProcessedKeys)][not(@Key = $RelatedKeys)]/@Key)"/>

and introduced the global variable

<xsl:variable name="IncludedRelationships" select="$EntityRelationships/*[not(@Key = $RelationshipExclusions/*/@Key)]"/>
Actions #16

Updated by Michael Kay over 4 years ago

I have made changes (b) and (c) on the development branch and there is no obvious functional or performance regression in the basic XSLT3 and QT3 test suites.

Actions #17

Updated by Michael Kay over 4 years ago

  • Category set to Performance
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Priority changed from High to Normal
  • Fix Committed on Branch trunk added

I have now implemented the optimizer changes on the development branch to automate the promotion of this expression. I've checked that the code produced is correct; unfortunately I'm not able to get a performance measure right now because there are other changes in progress on the development branch that prevent this stylesheet successfully executing right now. I'll come back to it later.

Actions #18

Updated by O'Neil Delpratt about 4 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fix Committed on Branch 10 added
  • Fix Committed on Branch deleted (trunk)
  • Fixed in Maintenance Release 10.0 added

Bug fix applied in the Saxon 10.0 major release.

Please register to edit this issue

Also available in: Atom PDF