XSLT Compiler performance regression 10.x to 11.x
We're seeing a performance regression of up to 20% or so between 10.6 and 11.2 for XSLT compilation.
Compiling the docbook-fo stylesheets, from the command line with -nogo, we're seeing an increase from 2206ms to 2645ms. A breakdown of this cost, obtained by setting the static variable Compilation.TIMING to true, is attached.
Updated by Michael Kay 11 months ago
The above figures are for a single-shot compilation invoked from the command line. The benefit of this metric is that it reflects the true user experience. The drawback is that most of the cost is in VM initialisation, over which we have relatively little control, and this gives timings that are not highly reproducible.
If we compile 10 times, the last run produces much lower figures:
Built stylesheet documents 79.909531ms
Preparing package 0.1218ms
spliceIncludes 1.244106ms
importSchemata 0.154236ms
buildIndexes 2.341494ms
checkForSchemaAwareness 0.317005ms
processAllAttributes 70.535205ms
collectNamespaceAliases 0.040325ms
fixupReferences 5.247349ms
validate 36.130623ms
Register output formats 0.171948ms
Index character maps 0.039407ms
Fixup 0.014465ms
Combine attribute sets 3.086203ms
fixup Query functions 0.03954ms
register templates 6.96748ms
adjust exposed visibility 0.257539ms
compile top-level objects (2843) 122.113383ms
typeCheck functions (0) 0.048421ms
optimize top level 270.942903ms
optimize functions 0.048434ms
check decimal formats 0.032129ms
build template rule tables 1.16201ms
build runtime function tables 0.14143ms
allocate binding slots to named templates 0.104624ms
allocate binding slots to component references 18.478679ms
allocate binding slots to key definitions 0.19737ms
allocate binding slots to accumulators 0.022687ms
inject byte code candidates 0.027319ms
total compile time 619.957037ms
Completion 0.54519ms
Streaming fallback 0.046841ms
While the single-shot figures are more "true to life" (because people don't actually compile the same stylesheet 10 times in a row), the "best of 10" figure may be more useful for analysis.
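For anyone wanting to reproduce the "first run" versus "last of N" distinction outside the command line, the following is a minimal sketch of the measurement idea only. The class name RepeatTiming, the method timeRuns, and the stand-in workload are all hypothetical and are not part of Saxon; the Runnable would need to be replaced by the real compilation call.

```java
final class RepeatTiming {

    private RepeatTiming() {}

    /** Runs the task `repeat` times and returns each run's elapsed time in milliseconds. */
    static double[] timeRuns(Runnable task, int repeat) {
        double[] millis = new double[repeat];
        for (int i = 0; i < repeat; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload (assumption): replace with the
        // stylesheet compilation being measured.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("workload produced no output");
            }
        };

        double[] runs = timeRuns(task, 10);
        // The first run includes class loading and static initialisation;
        // the last run is closer to steady-state cost.
        System.out.printf("first run: %.3fms%n", runs[0]);
        System.out.printf("last of %d: %.3fms%n", runs.length, runs[runs.length - 1]);
    }
}
```

Reporting both numbers side by side keeps the user-visible cost (first run) separate from the cost attributable to the compiler phases themselves (last run).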
Note that the VM initialisation is not completely unconnected with what Saxon is doing: a fair chunk of it is initialization of static data used by the compiler. For example, I believe that most of the "preparing package" cost (second line item) is initialization of the data representing the built-in function library, and the reason this has increased between 10.x and 11.x may be because this library has grown. This would explain why the cost of this phase plummets close to zero on the second and subsequent compilations.
Updated by Michael Kay 11 months ago
So I've attached a revised comparison, this time showing both the "first time" compilation cost, and the "best of 10" (actually "last of 10") figures, obtained by running with -repeat:10 on the command line. This data actually shows Saxon 11.x coming in faster than 10.x - though the initial tree-building phase for parsing the stylesheets is still slower (which might be expected since the docbook-fo stylesheets contain non-ASCII data and in 11.x we are taking the hit on detecting and expanding surrogate pairs during initial parsing rather than during subsequent processing).
If this is telling us anything, it's that we should perhaps be looking carefully at some of the static initialization cost, for example the cost of building the function libraries, and seeing if we can't do some of this lazily.
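As an illustration of what "lazily" could mean here, one common Java technique is the initialization-on-demand holder idiom, where the expensive static data lives in a nested class that the JVM only initialises on first access. This is a generic sketch and not how Saxon builds its function libraries; the class LazyFunctionLibrary, its table contents, and the BUILD_COUNT counter are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFunctionLibrary {

    // Demonstration only: counts how many times the table has been built.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private LazyFunctionLibrary() {}

    // The JVM initialises Holder only when Holder.TABLE is first read,
    // so loading LazyFunctionLibrary itself does not pay the build cost.
    private static final class Holder {
        static final Map<String, Integer> TABLE = build();
    }

    // Stand-in for an expensive build step; the entries are placeholder data.
    private static Map<String, Integer> build() {
        BUILD_COUNT.incrementAndGet();
        return Map.of("count", 1, "not", 1, "contains", 2);
    }

    /** Returns the arity recorded for the given name, or -1 if the name is unknown. */
    static int arity(String name) {
        return Holder.TABLE.getOrDefault(name, -1);
    }

    public static void main(String[] args) {
        System.out.println("builds before first lookup: " + BUILD_COUNT.get());
        System.out.println("arity of contains: " + arity("contains"));
        System.out.println("builds after lookups: " + BUILD_COUNT.get());
    }
}
```

The idiom relies on the JVM's class initialisation guarantees, so it needs no explicit locking; the trade-off is that the cost moves to whichever call touches the table first rather than disappearing.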
Updated by Michael Kay about 2 months ago
- Status changed from New to Closed
In Saxon 12 we have in fact moved towards being more lazy in building the function libraries, though the effect is very minor (it was prompted by the need to avoid dynamic class instantiation in GraalVM).
This issue is now dormant so I'm closing it.