Bug #5914


Performance regression in SaxonJ 12

Added by Michael Kay 13 days ago. Updated 2 days ago.

Applies to branch: 12, trunk


Issue #5910 reported two issues: a performance regression and a problem with the timing profile output. So that we can track closure of the two issues separately, I'm moving the performance issue here.

Actions #1

Updated by Michael Kay 2 days ago

Following a hunch based on profiling of another workload that was showing poor performance, I investigated the method getCurrentMergeGroupIterator(), which is called on every template call. I found that it was searching back through the context property chain, and that the search could become extremely long (maxing out at around 45,000 iterations). This code is new in 12.x and it seems very likely that it is the cause of the problem.

We made a design change in this area after noticing that the cost of copying all properties of the dynamic context on each template/function call could become quite significant, in particular it involves a lot of object allocation which in turn increases costs of garbage collection. Most of the properties of the context remain unchanged on a template call, so the idea was that searching back through the chain of changed values would be faster than copying an unchanged value. The problem here seems to be that as a result of deep recursion, the property chain can become quite long, and searching for a property that has never been set (like the current merge group) will search the whole chain.
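The chained-properties design can be sketched as follows. This is a hypothetical simplification for illustration, not Saxon's actual XPathContextMajor code; the class and method names are invented. It shows why creating a child context is cheap, while reading a property that was never set forces a walk over the entire chain:

```java
import java.util.Objects;

// Sketch of a context whose properties are stored as a chain of
// (key, value) records shared with the parent context. Creating a child
// context is O(1); reading a property is O(chain length).
public class ChainContext {
    // One record per property *assignment*; unchanged properties are not copied.
    record Entry(String key, Object value, Entry next) {}

    private final Entry head;

    private ChainContext(Entry head) { this.head = head; }

    public static ChainContext root() { return new ChainContext(null); }

    // Setting a property prepends one record: no copying of other properties,
    // and no object allocation beyond the single Entry.
    public ChainContext with(String key, Object value) {
        return new ChainContext(new Entry(key, value, head));
    }

    // Reading walks back until the most recent assignment is found.
    // A property that was never set (such as the current merge group in a
    // stylesheet without xsl:merge) forces a walk over the *entire* chain.
    public Object get(String key) {
        for (Entry e = head; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Under deep recursion, each template call appends at least one record (typically the current template rule), so the chain length grows with recursion depth, which matches the 45,000-iteration searches observed above.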

Actions #2

Updated by Michael Kay 2 days ago

Even searches for values that turn out non-null can potentially become long. The current template rule is being set 1,238,385 times, while the current mode is only set 65,823 times, which means a search for the current mode might have to skip 100 or so current-template-rule property records.

A solution might be to condense the property chain if it becomes too long.
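A condensation step along these lines might look like the following sketch (hypothetical code, not Saxon's implementation; the threshold of 100 is the value experimented with below, and all names are invented). On each set, the chain is collapsed to one record per distinct key once it grows past the threshold:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: bound the property chain's length by periodically condensing it
// down to the most recent assignment of each distinct property.
public class CondensedChain {
    record Entry(String key, Object value, Entry next) {}

    static final int MAX_LENGTH = 100; // threshold experimented with below

    // Keep only the most recent assignment of each key, so the condensed
    // chain's length is bounded by the number of distinct properties.
    static Entry condense(Entry head) {
        Map<String, Object> latest = new LinkedHashMap<>();
        for (Entry e = head; e != null; e = e.next) {
            latest.putIfAbsent(e.key, e.value); // first hit = most recent
        }
        Entry condensed = null;
        for (Map.Entry<String, Object> m : latest.entrySet()) {
            condensed = new Entry(m.getKey(), m.getValue(), condensed);
        }
        return condensed;
    }

    static int length(Entry head) {
        int n = 0;
        for (Entry e = head; e != null; e = e.next) n++;
        return n;
    }

    static Object get(Entry head, String key) {
        for (Entry e = head; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Called on each set: prepend, then condense if the chain got too long.
    static Entry set(Entry head, String key, Object value) {
        Entry e = new Entry(key, value, head);
        return length(e) > MAX_LENGTH ? condense(e) : e;
    }
}
```

Measuring the length on every set is itself O(n); a real implementation would presumably carry the length in each record or in the context, so only the occasional condensation pays the O(n) cost.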

Actions #3

Updated by Michael Kay 2 days ago

Condensing the property chain when it reaches a length of 100 brings execution time down from 37s to 24s, roughly the size of the performance regression reported. Reducing the threshold to 50 gives no further improvement.

In fact, I'm seeing a timing of 35s for this job on SaxonJ 11.5.

Actions #4

Updated by Michael Kay 2 days ago

Further information about the problem.

The spec says that the current merge group is cleared on function calls and template invocation (§15.6: "All invocation constructs set the current merge group and current merge key to absent.") We therefore make quite a few calls to context.setCurrentMergeGroupIterator(null). The code is written to do nothing if the value is already null, but determining that it is null typically involves searching the property chain for a value, and in a stylesheet that doesn't use xsl:merge this search will always run to the end of the chain. If the chain contains many other properties, which will often be the case when there is deep recursion, this may be a lengthy search.
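The pattern can be illustrated with a small sketch (hypothetical, instrumented code; not Saxon's actual implementation). The "no-op" branch of the clearing operation still pays the full search cost of discovering that the value is absent:

```java
// Sketch: clearing the merge group is logically a no-op when it is already
// null, but discovering that it is null walks the whole chain when the
// property was never set.
public class MergeGroupClearing {
    record Entry(String key, Object value, Entry next) {}

    static int stepsSearched = 0; // instrumentation for the sketch only

    static Object get(Entry head, String key) {
        for (Entry e = head; e != null; e = e.next) {
            stepsSearched++;
            if (e.key.equals(key)) return e.value;
        }
        return null; // never set: we walked the entire chain to learn that
    }

    // "Do nothing if the value is already null" still pays the search cost.
    static Entry setCurrentMergeGroup(Entry head, Object value) {
        if (value == null && get(head, "currentMergeGroup") == null) {
            return head; // no-op, but the get() above walked the whole chain
        }
        return new Entry("currentMergeGroup", value, head);
    }
}
```

So in a stylesheet that never uses xsl:merge, every function call and template invocation under deep recursion performs a worst-case chain search just to confirm there is nothing to clear.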

We could question whether the tiny saving achieved by changing the design in 12.0 is actually worth it. In 11.x, setting the current merge group iterator to null was a simple field assignment, and copying it when a new context was created was hardly going to be expensive. But overall, the reduction in the size (number of fields) of the XPathContextMajor object, and therefore in the cost of copying it, did show up in measurements as a benefit.

A change we could consider is to revert to holding the current template rule the old way, as a dedicated field. This is the only property that is likely to be set very frequently, and in cases I've looked at, it completely dominates the property chain. If we saved this in a dedicated field, it's very unlikely that the property chain would grow to thousands of entries, and searches for all the other properties would then be much shorter.
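The hybrid approach can be sketched as follows (again hypothetical names, not Saxon's code): the frequently-set property lives in a dedicated field that costs one assignment to copy on context creation, while the chain is reserved for the rarely-set properties, so chain searches never have to skip past current-template-rule records:

```java
// Sketch: hold the frequently-set property (the current template rule) in a
// dedicated field, and keep the shared chain only for rarely-set properties.
public class HybridContext {
    record Entry(String key, Object value, Entry next) {}

    private Object currentTemplateRule; // dedicated field: set very often
    private Entry chain;                // rarely-set properties only

    public HybridContext(HybridContext parent) {
        if (parent != null) {
            this.currentTemplateRule = parent.currentTemplateRule; // one field copy
            this.chain = parent.chain;                             // shared, not copied
        }
    }

    // Setting the dominant property is a plain assignment, as in 11.x.
    public void setCurrentTemplateRule(Object rule) { currentTemplateRule = rule; }
    public Object getCurrentTemplateRule() { return currentTemplateRule; }

    // Other properties still use the chain, which now stays short because the
    // dominant property never appears in it.
    public void setProperty(String key, Object value) {
        chain = new Entry(key, value, chain);
    }

    public Object getProperty(String key) {
        for (Entry e = chain; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

With the figures above (the current template rule set roughly twenty times as often as the current mode), removing it from the chain would shrink the chain, and every search over it, by an order of magnitude.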

A more radical question is this: do we actually need to keep a full stack of context objects through tail-recursive template calls? When a template call is tail-recursive, we reuse the Java stack frame of the caller to avoid a Java stack overflow, but we don't attempt to reuse (or discard) the context object of the caller. It seems that we do reuse the context for function tail calls, but not for call-template or apply-templates tail calls. There's a TODO in the CallTemplate code pointing out that I tried to fix this in June 2022 and hit difficulties. Perhaps I need to try again.
