Bug #2679

XSLT transformation is taking 15% more on 9.7.0-3 vs 9.6.0-1

Added by Guillermo Fernandes over 4 years ago. Updated over 4 years ago.

We have upgraded from 9.6.0-1 to 9.7.0-3, and while running our performance test for a simple XSLT transformation we noticed that the same test is taking almost 15% more time to complete the transformations.

The transformation is very simple. We have the following input:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://services.samples/xsd">

This input file has 6007 lines; the idea of the test is to reverse each tag name and then set it back again.
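The "reverse" used in the test is a plain character reversal of each element name (so buyStocks becomes skcotSyub). A minimal sketch of that mapping, for illustration only:

```java
public class TagReverse {
    // Reverse a tag name the way the test stylesheets do statically,
    // e.g. buyStocks -> skcotSyub, symbol -> lobmys
    static String reverse(String name) {
        return new StringBuilder(name).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("buyStocks")); // skcotSyub
        System.out.println(reverse("symbol"));    // lobmys
        System.out.println(reverse("buyerID"));   // DIreyub
    }
}
```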

So we transform this input with the following XSLT:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:m0="http://services.samples/xsd" exclude-result-prefixes="m0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

<xsl:template match="m0:buyStocks">
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<m:skcotSyub xmlns:m="http://services.samples/xsd">
<xsl:for-each select="order">
<redro><lobmys><xsl:value-of select="symbol"/></lobmys><DIreyub><xsl:value-of select="buyerID"/></DIreyub><ecirp><xsl:value-of select="price"/></ecirp><emulov><xsl:value-of select="volume"/></emulov></redro>
</xsl:for-each>
</m:skcotSyub>
</soapenv:Envelope>
</xsl:template>

<xsl:template match="soapenv:Header"/>
</xsl:stylesheet>

And then we revert it with the following XSLT:

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:m0="http://services.samples/xsd" exclude-result-prefixes="m0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="m0:skcotSyub">

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<m:buyStocks xmlns:m="http://services.samples/xsd">
<xsl:for-each select="redro">
<order><symbol><xsl:value-of select="lobmys"/></symbol><buyerID><xsl:value-of select="DIreyub"/></buyerID><price><xsl:value-of select="ecirp"/></price><volume><xsl:value-of select="emulov"/></volume></order>
</xsl:for-each>
</m:buyStocks>
</soapenv:Envelope>
</xsl:template>
</xsl:stylesheet>

We ran it with 50 concurrent threads for 180 seconds and got a TPS of ~1050 with 9.6.0-1 or 9.6.0-8, and a TPS of ~855 with 9.7.0-3 and 9.7.0-4.

We ran the profiler in order to understand what changes were introduced between 9.6.0-1 and 9.7.0-3, but couldn't find anything in particular, other than that net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail() is now taking more time than in the previous version.

How come 9.7 is so much slower than 9.6 on that code?


testSoapEnvelope.xml (161 KB) — Guillermo Fernandes, 2016-03-17 21:07


#1 Updated by Guillermo Fernandes over 4 years ago

Adding input file used to test the transformation.

#2 Updated by Michael Kay over 4 years ago

  • Priority changed from High to Normal

I've created a test without any concurrency, running the pair of transformations 1000 times. It's taking around 4100ms ±200 on 9.6, and about 4600ms ±200 on 9.7.
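A single-threaded harness along these lines can reproduce this kind of measurement. The sketch below uses the standard JAXP API (which resolves to Saxon when it is on the classpath) with a toy inline stylesheet standing in for the attached ones; the class name, stylesheet, and input are illustrative, not the actual test:

```java
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class TransformBench {
    // Toy stand-in for the bug's stylesheets: renames <order>/<symbol> content.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='order'><redro><xsl:value-of select='symbol'/></redro></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(Templates t, String input) throws Exception {
        StringWriter out = new StringWriter();
        t.newTransformer().transform(
                new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Picks up Saxon's TransformerFactory if Saxon is on the classpath
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates compiled = factory.newTemplates(new StreamSource(new StringReader(XSLT)));
        String input = "<order><symbol>IBM</symbol></order>";

        // Warm up the JIT, then time 1000 runs, as in the single-threaded test
        for (int i = 0; i < 100; i++) transform(compiled, input);
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) transform(compiled, input);
        System.out.println("1000 runs: " + (System.nanoTime() - start) / 1_000_000 + "ms");
    }
}
```

Compiling the stylesheet once to a Templates object and creating a fresh Transformer per run keeps the measurement focused on execution cost rather than compilation.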

That's a fairly small difference as these things go, which means it may be difficult to pin down the cause, but we'll look into it.

It's worth pointing out that whenever we make a change in the optimization strategy there is a risk that some workloads will benefit and others will suffer. So the fact that one particular workload is marginally slower is not necessarily evidence that anything is wrong.

#3 Updated by Michael Kay over 4 years ago

  • Status changed from New to In Progress

I've taken Java profiles of the two runs and as might be expected, they are extremely similar (modulo the variation that you would expect from random sampling). The one possible clue to be investigated is that the 9.7 profile shows 18 hits on ExpressionTool.lazyEvaluate() which does not appear at all in the 9.6 profile.

The call on lazyEvaluate() is happening as part of a function call on string-join(), which is part of the execution plan generated for an xsl:value-of instruction. The way calls on system functions are organized has changed significantly in 9.7 because of the introduction of higher order functions; to avoid having separate implementations of each system function for static and dynamic calls, the code for static calls now uses essentially the same code path as a dynamic invocation. So this looks like an area to investigate.

#4 Updated by Guillermo Fernandes over 4 years ago

Hi Michael,

Thank you for your answers and taking a look into this.

I understand; you are right about the optimisation strategies, and it is clear that not every scenario will be improved.

I will check in my environment whether I get the same profiling results for ExpressionTool.lazyEvaluate() and let you know.

#5 Updated by Michael Kay over 4 years ago

Tracing the execution in the debugger for both releases, it does appear that the call overhead for string-join (in fact, for system function calls in general) has increased. The actual string-join logic is unchanged, but instead of passing the first argument as a simple atomizing iterator, we are creating a Closure object; moreover, we are going through a lot of decision-making on each call before deciding that passing a Closure is the best thing to do. In addition (but probably less significant) the changes to make the expression tree more generic are making the process of iterating over the function arguments less efficient.

#6 Updated by Michael Kay over 4 years ago

So far my measurements have been using the Saxon-HE code path. I thought it would be interesting to see what the equivalent EE path would be. Here are the results:

9.7: HE 5318ms, EE 2774ms

9.6: HE 4824ms, EE 2989ms

So although HE has got slower, EE has got faster! This no doubt explains why we failed to notice any regression in our testing, because our performance testing is typically focussed on Saxon-EE. The difference is probably due mainly to bytecode generation. When bytecode generation is in use, static calls to commonly-used functions such as string-join are expanded inline, leading to a zero call overhead.

#7 Updated by Michael Kay over 4 years ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 9.7 added

I've created a patch which overrides the method evaluateArguments() for a SystemFunctionCall. Instead of going into the general-purpose ExpressionTool.lazyEvaluate() method to compute each argument, it simply does

new LazySequence(arg[i].iterate(context))

A LazySequence is a simple wrapper around an iterator; it is much cheaper than a Closure because it can only be evaluated once (we rely on the implementation of system functions to only read each supplied argument value once). Closures are needed when computing the arguments for a user function because we don't know what the user function will do with the parameter, but for system functions an iterator suffices.
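The distinction can be illustrated with a minimal once-only wrapper in the spirit of LazySequence. This is an illustrative sketch, not Saxon's actual class (the name OnceOnlySequence and its shape are invented here):

```java
import java.util.Iterator;

// Sketch of the LazySequence idea: a cheap wrapper around an iterator that
// can be consumed exactly once. A Closure, by contrast, buffers items so the
// value can be re-read, which costs more per call.
public class OnceOnlySequence<T> {
    private Iterator<T> iterator; // nulled out after the first use

    public OnceOnlySequence(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    /** Hands out the underlying iterator; a second call fails fast. */
    public Iterator<T> iterate() {
        if (iterator == null) {
            throw new IllegalStateException("sequence already consumed");
        }
        Iterator<T> it = iterator;
        iterator = null;
        return it;
    }
}
```

The fail-fast second call mirrors why the patch is safe only for system functions: Saxon controls their implementations and knows each argument is read once, whereas a user-written function might read a parameter any number of times.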

This appears to bring the cost down to around the same as the 9.6 figure (sometimes better).

#8 Updated by Michael Kay over 4 years ago

The patch caused a crash in format-number() which evaluates one of its arguments twice (for no good reason). Fixed this with a further patch.

#9 Updated by Michael Kay over 4 years ago

The patch also crashed fn:random-number-generator, which was wrongly assuming that the first argument would always be passed as an instance of AtomicValue. Fixed.

#10 Updated by O'Neil Delpratt over 4 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release added

Bug fix applied in the Saxon maintenance release.

#11 Updated by O'Neil Delpratt over 4 years ago

  • Sprint/Milestone set to
