Performance degraded after moving from Saxon 8.9 to 9.8 (use case CXML)
Customer upgraded to the latest version of our product, which uses Saxon 9.8 EE. They came from an older version which used Saxon 8.9. They process thousands of documents per day with hundreds of XSLs. For two use cases (see the other Saxon issue reported for the second use case) they see a dramatic performance decrease which impacts their business.
Below you will find test results from our product (Sonic), from the Saxon command line, and from a Java test utility that simulates how our product uses Saxon. The ZIP contains sonicTest_CXML.cmd and saxonTest_CXML.cmd, which allow you to reproduce the tests that I did. It also contains the source code of SaxonXSLCmdLine.
We have to stick to 9.8 for now as we still have Java 7 support. Therefore we would appreciate a patch release for 9.8 once the cause has been identified.
I sent you a download link for the ZIP with the reproducer to the support email address. The ZIP contains sensitive customer data and is therefore password-protected. (Please do not attach it here!)
"CXML" use case summary:
With Saxon 8.9 as the baseline, the processing time inside Sonic is about 2x with Saxon 9.8; Saxon command-line processing time is about 3x. SaxonXSLCmdLine shows results similar to the Sonic product.
Inside the Sonic product:
- Saxon 8.9: 17 min = 1020 sec
- Saxon 9.8: 41 min = 2460 sec
- Saxon 9.9: 34 min = 2040 sec
CXML XSL results from the Saxon command line:
- Saxon 8.9: 763290 ms
- Saxon 9.1: 813310 ms
- Saxon 9.8: 2120170 ms
- Saxon 9.9: 2021810 ms
SaxonXSLCmdLine RESULTS:
XSL: EDW_Adapters\XSLs\CXMLTOEDW.xsl
JVM arguments: -Dxsl.useTracing=false -Dsaxon.optimizationLevel=-c
- Saxon 9.8: 2112930 ms
- Saxon 9.9: 2016450 ms
SaxonXSLCmdLine RESULTS:
XSL: EDW_Adapters\XSLs\CXMLTOEDW.xsl
JVM arguments: -Dxsl.useTracing=false
- Saxon 9.8: 2156000 ms
- Saxon 9.9: 1959450 ms
#2 Updated by Michael Kay 9 months ago
I have completed a profiling run (with -TP:profile.html) using the large source file. This took 37m. The template with match="COLUMN" in CXMLTOEDW.xsl (line 232) was completely dominant. This template makes a number of calls on saxon:evaluate(), which are likely to be expensive. However, JProfiler monitoring (only run for a short time) did not suggest a lot of activity in the XPath parser, instead the focus was on evaluation of *:localname tests.
An obvious thing to try would be to replace the calls on saxon:evaluate() with use of xsl:evaluate, which caches compiled XPath expressions.
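As an illustrative sketch of the substitution (the variable name $path is invented here, not taken from the customer stylesheet): saxon:evaluate() parses its string argument on every call, whereas xsl:evaluate keeps a cache of compiled expressions keyed on the expression string.

```xml
<!-- Before: saxon:evaluate() re-parses the XPath string on every call -->
<xsl:sequence select="saxon:evaluate($path)"/>

<!-- After: xsl:evaluate caches the compiled form of each distinct $path -->
<xsl:evaluate xpath="$path" context-item="."/>
```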
It's certainly plausible that XPath parsing might be slower than in earlier releases, as the XPath grammar has grown and as we attempt more optimization.
#3 Updated by Michael Kay 9 months ago
Tracing the actual expressions executed using saxon:evaluate, we see a pattern like this (7 is the current document number), indicating that these are mainly fairly simple relative path expressions making heavy use of local name tests, which may account for the high occurrence of local name tests in the JProfiler output:
7 - *:CoverageRelationships/*:Groups/*:Group[*:Category/@*:ReferenceListCode = 'Employee Group'][*:Category/@*:SourceKey= 'RATEABLE' or *:Category/@*:MDMKey= 'RATEABLE']/*:Domestic/*:Count
7 - *:ClaimsMade/*:ExtendedReportingPeriod/*:Supplemental/*:Reason
7 - *:Details/*:PIP/*:ExcessPIPMedicalExpense/@*:Indicator
7 - *:Details/*:PIP/*:ExcessPIPWorkLoss/@*:Indicator
7 - *:Details/*:KidnapAndRansomExcludedCountries
7 - *:Details/*:PIP/*:ExcludeLossOfIncomeBenefits/@*:Indicator
7 - *:CoverageRelationships/*:Groups/*:Group[*:Category/@*:ReferenceListCode = 'Location Group'][*:Category/@*:SourceKey= 'TOTAL' or *:Category/@*:MDMKey= 'TOTAL']/*:Foreign/*:Count
7 - *:CoverageRelationships/*:Groups/*:Group[*:Category/@*:ReferenceListCode = 'Employee Group'][*:Category/@*:SourceKey= 'RATEABLE' or *:Category/@*:MDMKey= 'RATEABLE']/*:Foreign/*:Count
7 - *:Details/*:HartfordSteamBoilerCoverageCode
7 - *:Details/*:ISOCoverageCode
7 - *:Details/*:ISODeductibleCode
7 - *:Details/*:ISOLiabilityLimitCode
7 - *:ClaimsMade/*:OriginalEntryDate
7 - *:PremiumLevel
7 - *:Details/*:ProductPricingPlan
7 - *:QuotaShareParticipation/*:Category
7 - *:Details/*:ISORatingCode
7 - *:ClaimsMade/*:ReinstateAggregateLimit/@*:Indicator
7 - *:Reporting/*:Frequency
7 - *:ClaimsMade/*:ExtendedReportingPeriod/*:Supplemental/*:Duration/*:EndDate
7 - *:ClaimsMade/*:ExtendedReportingPeriod/*:Supplemental/*:Duration/*:Count
7 - *:ClaimsMade/*:ExtendedReportingPeriod/*:Supplemental/*:Duration/*:Unit
7 - /*:Transaction/@*:ID
#4 Updated by Michael Kay 9 months ago
I put in some counters in saxon:evaluate. These show the following kind of pattern:
Calls 1000 exprs 102 nodes 23
Calls 2000 exprs 102 nodes 44
Calls 3000 exprs 102 nodes 65
Calls 4000 exprs 102 nodes 86
Calls 5000 exprs 102 nodes 108
Calls 6000 exprs 102 nodes 129
Calls 7000 exprs 102 nodes 150
Calls 8000 exprs 102 nodes 171
Calls 9000 exprs 102 nodes 193
Calls 10000 exprs 102 nodes 214
meaning that after 10,000 calls on saxon:evaluate(), we've seen 102 distinct XPath expressions, with 214 distinct nodes used as the context node. The same pattern continues: after 200,000 calls we've still only seen 102 distinct XPath expressions (applied to 4257 distinct context nodes). This certainly suggests that caching compiled expressions is going to be useful. (Though the absence of XPath parsing in the JProfiler output remains puzzling.)
#6 Updated by Michael Kay 9 months ago
I've changed the calls on saxon:evaluate to use xsl:evaluate, to take advantage of caching.
In the first phase of processing the calls come thick and fast and we get a high hit rate on the cache (here it's 100%):
calls 340000 exprs 270 cache 100 hits 339723
calls 341000 exprs 270 cache 100 hits 340723
calls 342000 exprs 270 cache 100 hits 341723
calls 343000 exprs 270 cache 100 hits 342723
calls 344000 exprs 270 cache 100 hits 343723
calls 345000 exprs 270 cache 100 hits 344723
But at around this point the pattern changes, and we start to see the number of distinct expressions steadily increasing, and the hit rate dropping to 25% or so:
calls 346000 exprs 270 cache 100 hits 345723
calls 347000 exprs 453 cache 100 hits 346540
calls 348000 exprs 1222 cache 100 hits 346771
calls 349000 exprs 1992 cache 100 hits 347001
calls 350000 exprs 2760 cache 100 hits 347233
calls 351000 exprs 3531 cache 100 hits 347462
At the same time the time interval between messages increases substantially, suggesting that calls on evaluate() are no longer quite as dominant.
Looking more closely, the XPath expressions now being evaluated are of the form:
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:BulkCoding/@*:Indicator
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:Commissions/*:Commission/*:Amount
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:Commissions/*:Commission/*:Rate
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:AdditionalAttributes/*:Attribute[*:Name[. = 'Coverage_Family']]/*:Value
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:CoverageDescription
//*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1']/*:CoverageText
That is, they are absolute path expressions. I thought that the low hit rate might simply be because the cache is too small, but looking more carefully, of every 1000 calls to evaluate, about 750 of the expressions are new. What's happening is that they cycle: the @*:ID value changes, and then the same subordinate paths are evaluated for the new ID.

There are a couple of ways we could tackle this. First, we could make the ID value a parameter, so the cached expression becomes //*:Coverage[@*:ID=$p]/*:CoverageText and we evaluate it with different values of $p. Secondly, we could evaluate //*:Coverage[@*:ID='941b5203-07a2-4d62-e053-1428330a829a.1'] outside the saxon:evaluate() call, and use the result as the context item for evaluating the relative path *:BulkCoding/@*:Indicator (etc). Evaluating the common subexpression once might end up giving as big a saving as the caching itself; in fact it's entirely possible that it's this repeated whole-document searching for Coverage elements by ID that accounts for most of the cost. Certainly that would be consistent with the JProfiler observations.
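Both approaches can be sketched in XSLT 3.0 terms; the variable names ($coverageID, $relativePath) are placeholders, not names from the actual stylesheet:

```xml
<!-- (1) Make the ID a parameter, so all calls share one cached expression -->
<xsl:evaluate xpath="'//*:Coverage[@*:ID=$p]/*:CoverageText'" context-item=".">
  <xsl:with-param name="p" select="$coverageID"/>
</xsl:evaluate>

<!-- (2) Hoist the common prefix: locate the Coverage element once, then
     evaluate only the short relative path with it as the context item -->
<xsl:variable name="cov" select="//*:Coverage[@*:ID = $coverageID]"/>
<xsl:evaluate xpath="$relativePath" context-item="$cov"/>
```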
#7 Updated by Michael Kay 9 months ago
By adding comments to the start of the dynamic XPath expressions, I have established that the //*:Coverage expressions originate from line 277 in the XSLT source:
<xsl:for-each select="fns:evaluate(.,'(:D:)' || concat('//*:',$currentMapping/SrcChild,'[@*:ID=',$getID,']/',$currentMapping/SrcDataPath))">
Note that the paths we are evaluating appear to come from the data files MappingFile.xml or MappingFile_Enterprise.xml. The fact that we evaluate a sequence of paths having the same values for $currentMapping/SrcChild and $getID appears to be an accident of the data file, not something the code can rely on.
I'm thinking that (A) we could replace the //*:Coverage[@*:ID=...] prefix of this generated expression by code that generates fns:findID(., 'Coverage', $getID)/ where fns:findID uses a key to locate the required element; and then (B) we could make the two variable parts here ("Coverage" and $getID) into xsl:evaluate parameters in order to improve the caching.
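A possible shape for such a function, assuming the usual xs namespace and an fns prefix bound to the stylesheet's function namespace (the key name and the composite-key scheme here are my own invention, not taken from the actual fix):

```xml
<!-- Index every element that carries an ID attribute (in any namespace),
     keyed on the element's local name plus the ID value -->
<xsl:key name="element-by-id" match="*[@*:ID]"
         use="concat(local-name(), '|', string((@*:ID)[1]))"/>

<xsl:function name="fns:findID" as="element()*">
  <xsl:param name="origin" as="node()"/>
  <xsl:param name="name" as="xs:string"/>
  <xsl:param name="id" as="xs:string"/>
  <!-- key() is a hashed lookup, avoiding a whole-document scan per call -->
  <xsl:sequence select="key('element-by-id', concat($name, '|', $id), root($origin))"/>
</xsl:function>
```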
#8 Updated by Michael Kay 9 months ago
- Category set to Performance
- Assignee set to Michael Kay
With these changes, the execution time (on the large source document) is reduced from 37m to 21s.
Specifically, we've made three changes.
(a) We've switched from saxon:evaluate() to xsl:evaluate, to take advantage of compiled expression caching.

(b) In one particular case, we've used a parameterized expression where the variable part is supplied using xsl:with-param, in order to reduce the number of distinct XPath expressions and thus benefit further from the caching of compiled expressions.

(c) Rather than generate an expression starting //*:XXX[ID='1234'], we've generated one that starts fns:findById(., 'XXX', '1234'), where fns:findById is a stylesheet function that uses key() to find the relevant element.
I haven't tried to quantify which of these changes contributes most to the improvement.
I haven't attempted to investigate the performance regression reported in the original bug report. I suspect it is caused by increased XPath compile time, resulting from the growth of the XPath grammar in XPath 3.1 and/or more vigorous optimization attempts in Saxon.
Oh, and I haven't actually verified that the revised code is generating the correct output.