Bug #5359
XSLT Compiler performance regression 10.x to 11.x
Status: Closed
Description
We're seeing a performance regression of up to 20% or so between 10.6 and 11.2 for XSLT compilation.
Compiling the docbook-fo stylesheets, from the command line with -nogo, we're seeing an increase from 2206ms to 2645ms. A breakdown of this cost, obtained by setting the static variable Compilation.TIMING to true, is attached.
Files
Updated by Michael Kay over 2 years ago
The above figures are for a single-shot compilation invoked from the command line. The benefit of this metric is that it reflects the true user experience. The drawback is that most of the cost is in VM initialisation, over which we have relatively little control, and this gives timings that are not highly reproducible.
If we compile 10 times, the last run produces much lower figures:
Built stylesheet documents 79.909531ms
Preparing package 0.1218ms
spliceIncludes 1.244106ms
importSchemata 0.154236ms
buildIndexes 2.341494ms
checkForSchemaAwareness 0.317005ms
processAllAttributes 70.535205ms
collectNamespaceAliases 0.040325ms
fixupReferences 5.247349ms
validate 36.130623ms
Register output formats 0.171948ms
Index character maps 0.039407ms
Fixup 0.014465ms
Combine attribute sets 3.086203ms
fixup Query functions 0.03954ms
register templates 6.96748ms
adjust exposed visibility 0.257539ms
compile top-level objects (2843) 122.113383ms
typeCheck functions (0) 0.048421ms
optimize top level 270.942903ms
optimize functions 0.048434ms
check decimal formats 0.032129ms
build template rule tables 1.16201ms
build runtime function tables 0.14143ms
allocate binding slots to named templates 0.104624ms
allocate binding slots to component references 18.478679ms
allocate binding slots to key definitions 0.19737ms
allocate binding slots to accumulators 0.022687ms
inject byte code candidates 0.027319ms
total compile time 619.957037ms
Completion 0.54519ms
Streaming fallback 0.046841ms
While the single-shot figures are more "true to life" (because people don't actually compile the same stylesheet 10 times in a row), the "best of 10" figure may be more useful for analysis.
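The difference between the "first time" and "last of 10" figures can be reproduced in miniature with a small harness. The following is only a sketch under stated assumptions: it uses the standard JAXP API rather than Saxon's command line, the class name CompileTimer and the tiny sample stylesheet are hypothetical stand-ins (not the docbook-fo stylesheets), and it will use whichever JAXP XSLT processor is found on the classpath, which is assumed to be Saxon when its JAR is present and the JDK's built-in processor otherwise. Factory creation is excluded from the timings, so the first figure understates the true single-shot cost.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

/** Hypothetical helper: times repeated compilations of one stylesheet. */
class CompileTimer {

    // Tiny stand-in stylesheet; a real measurement would load docbook-fo from disk.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out><xsl:value-of select='count(//*)'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet `runs` times and returns the elapsed nanoseconds of each run. */
    static long[] timeCompilations(String xslt, int runs) throws TransformerConfigurationException {
        // One factory for all runs: its creation cost is not part of any timing.
        TransformerFactory factory = TransformerFactory.newInstance();
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            Templates compiled = factory.newTemplates(new StreamSource(new StringReader(xslt)));
            elapsed[i] = System.nanoTime() - start;
            if (compiled == null) {
                throw new IllegalStateException("compilation returned null");
            }
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        long[] t = timeCompilations(SAMPLE_XSLT, 10);
        // The first run includes class loading and static initialisation; the last does not.
        System.out.printf("first run:  %.3fms%n", t[0] / 1e6);
        System.out.printf("last of %d: %.3fms%n", t.length, t[t.length - 1] / 1e6);
    }
}
```

Reporting both the first and the last run, rather than an average, keeps the one-off start-up cost visible instead of smearing it across the warm runs.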
Note that the VM initialisation is not completely unconnected with what Saxon is doing: a fair chunk of it is initialization of static data used by the compiler. For example, I believe that most of the "preparing package" cost (second line item) is initialization of the data representing the built-in function library, and the reason this has increased between 10.x and 11.x may be because this library has grown. This would explain why the cost of this phase plummets close to zero on the second and subsequent compilations.
Updated by Michael Kay over 2 years ago
- File compile10vs11.pdf compile10vs11.pdf added
Updated by Michael Kay over 2 years ago
So I've attached a revised comparison, this time showing both the "first time" compilation cost and the "best of 10" (actually "last of 10") figures, obtained by running with -repeat:10 on the command line. This data actually shows Saxon 11.x coming in faster than 10.x, though the initial tree-building phase for parsing the stylesheets is still slower. That might be expected, since the docbook-fo stylesheets contain non-ASCII data and in 11.x we are taking the hit on detecting and expanding surrogate pairs during initial parsing rather than during subsequent processing.
If this is telling us anything, it's that we should perhaps be looking carefully at some of the static initialization cost, for example the cost of building the function libraries, and seeing if we can't do some of this lazily.
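To illustrate what "doing some of this lazily" could look like in general, here is a minimal sketch using Java's initialization-on-demand holder idiom. It is not Saxon's actual code: the class FunctionLibraryDemo, the BUILD_COUNT counter and the three table entries are hypothetical, chosen only to show that the expensive table is not built until it is first needed.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a built-in function library; not Saxon's actual class. */
class FunctionLibraryDemo {

    // Counts how often the table is built (for demonstration only).
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // The JVM initialises Holder only when Holder.TABLE is first read, so the
    // build cost moves from class-load time of the outer class to first use.
    private static final class Holder {
        static final Map<String, Integer> TABLE = buildTable();
    }

    // Stands in for an expensive build step; maps a function name to its arity.
    private static Map<String, Integer> buildTable() {
        BUILD_COUNT.incrementAndGet();
        return Map.of("not", 1, "true", 0, "boolean", 1);
    }

    /** Returns the arity of the named function, building the table on the first call. */
    static Integer arityOf(String name) {
        return Holder.TABLE.get(name);
    }
}
```

With this shape, a compilation that never looks up a function pays nothing for the table, and the JVM's class-initialisation rules make the one-time build thread-safe without explicit locking.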
Updated by Michael Kay almost 2 years ago
- Status changed from New to Closed
In Saxon 12 we have in fact moved towards being more lazy in building the function libraries, though the effect is very minor (it was prompted by the need to avoid dynamic class instantiation in GraalVM).
This issue is now dormant so I'm closing it.