Project

Profile

Help

Bug #6090

closed

Performance regression SaxonCS 12

Added by Joel Kalvesmaki 10 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2023-06-23
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
12, trunk
Fix Committed on Branch:
12, trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

The attached input and XSLT stylesheets show evidence for significant performance regression in SaxonCS, especially 12.

Method Environment Saxon prod Saxon Version Time (ms)
Oxygen java PE 9.9 50000
CLI java HE 10.6 46564
CLI .net EE 10.9 158244
CLI .net EE 11.5 264665
CLI .net EE 12.2 596690

I have determined that something significant is happening in the process from $pass-1b to $pass-2a. When outputting only $pass-2a, and commenting out the templates in mode pass-2a-bills with priorities 99 and 100 (lines 108, 122 in the pass2 file), results are faster, but the penalties under C# 12.2 are more pronounced:

Method Environment Saxon prod Saxon Version Time (ms)
Oxygen java PE 9.9 14900
CLI java HE 10.6 17206
CLI .net EE 10.9 51643
CLI .net EE 11.5 96078
CLI .net EE 12.2 206967

I did what I could to ensure that the tests were identical in all ways except the processor and environment, as noted above.

The regression is noticeable only in files megabytes in size. Smaller files are processed quite rapidly in SaxonCS.

Not sure if this situation is comparable to #5914 but I cite it for the sake of quick cross-reference.


Files

performance.zip (631 KB) performance.zip Joel Kalvesmaki, 2023-06-23 02:12
6201.PCF.xml (43.6 KB) 6201.PCF.xml Joel Kalvesmaki, 2023-07-05 22:07
6211.PCF.xml (74.2 KB) 6211.PCF.xml Joel Kalvesmaki, 2023-07-05 22:07
6651.PCF.xml (24.9 KB) 6651.PCF.xml Joel Kalvesmaki, 2023-07-05 22:07
6652.PCF.xml (48.9 KB) 6652.PCF.xml Joel Kalvesmaki, 2023-07-05 22:07
slax.rng (30.5 KB) slax.rng Joel Kalvesmaki, 2023-07-06 00:05
input-reduced.xml (467 KB) input-reduced.xml Matt Patterson, 2023-07-06 12:21
Actions #1

Updated by Vladimir Nesterovsky 10 months ago

For the reference, per my older request Performance of SaxonHE 10.6 vs 11.2 the regression has started after 10.6 even in java.

Actions #2

Updated by Michael Kay 10 months ago

Thanks for reporting it. You're clearly doing something that our performance tests don't cover!

Actions #3

Updated by Michael Kay 10 months ago

I'm sorry, I've only just got around to investigating this (we decided to get 12.3 out first, and it took longer than it should have). And now I find that I can't actually run the stylesheet:

Error in expression in xsl:variable/@select on line 46 column 106 of slax2typesetting-pass1.xsl:
  FODC0002  I/O error reported by XML parser processing
  file:/Users/mike/bugs/2023/6090-Kalvesmaki/SlaxBuilders/formats/source%20as%20xml/6201.PCF.xml: /Users/mike/bugs/2023/6090-Kalvesmaki/SlaxBuilders/formats/source as xml/6201.PCF.xml (No such file or directory): /Users/mike/bugs/2023/6090-Kalvesmaki/SlaxBuilders/formats/source as xml/6201.PCF.xml (No such file or directory)

Could you supply the missing files please?

I'm probably going to start by investigating the performance on Java because we have a lot more tooling, instrumentation, and experience on that platform. I'll then move over to .NET to see if it's any different.

The fact that it's running 3 times slower on .NET than on Java is not something that greatly surprises us, we've seen the same effect ourselves, and it's true both of the old Saxon/.NET product built using IKVMC and the new SaxonCS product built by transpiling. We have no idea why it should be the case, despite much investigation.

In our own measurements the "elaboration" mechanism introduced in Saxon 12 gave a very small (0-5%) speed-up in Java, and a much larger (20%) speed-up in SaxonCS. Clearly you're not seeing that with this workload.

Actions #4

Updated by Joel Kalvesmaki 10 months ago

I'm not terribly surprised by the Java - C# difference, and I'm sure with some time the differential will start to diminish.

I think you'll find that for this input file, four of those PCF.xml files are requested, so I've attached them. I trust you'll reconfigure the directories as needed. Let me know if I can supply anything else helpful.

Actions #5

Updated by Michael Kay 10 months ago

Thanks. It's now looking for file:/Users/mike/bugs/2023/6090-Kalvesmaki/schemas/slax.rng

Actions #6

Updated by Joel Kalvesmaki 10 months ago

attached

Actions #7

Updated by Michael Kay 10 months ago

OK, thanks, now running the test successfully. First results:

SaxonJ-EE 12.3 - 156380ms
SaxonJ-EE 11.5 - 120141ms

Not as big a difference as you're reporting, but enough to be worth investigating.

Actions #8

Updated by Michael Kay 10 months ago

I've run both versions with -TP:profile.html and there's one difference in the profiles that immediately stands out: function slax:batch-replace is called 99 times under SaxonJ-11, but 101,858 times under SaxonJ-12. The function doesn't itself account for a significant amount of the processing time, but the difference in execution counts indicates an adverse change in the execution strategy adopted by the optimizer.

Actions #9

Updated by Matt Patterson 10 months ago

Reduced input file

Actions #10

Updated by Michael Kay 10 months ago

I added internal instrumentation and it appears that the anomaly in the counter for slax:batch-replace is spurious: there are actually 101K calls in both 11.x and 12.x, but the tracing for 11.x isn't counting "push-mode" function calls for some reason.

On a whim, we also tried running with the call on normalize-unicode() at line 834 of the pass-1 stylesheet removed; it made no significant difference.

Actions #11

Updated by Michael Kay 10 months ago

I have been comparing the SEF output for 11.x against 12.x

(To do this, I wrote a transformation that sorted the SEF into a canonical order and deleted attributes such as component ID that are allocated arbitrarily; then compared using the Oxygen XML diff tool).

I found very few significant differences. The only ones with apparent performance implications are:

(a) 11.x uses an indexed filter for the first predicate on line 634 of pass2.xsl (in template match="para"), 12.x doesn't.

(b) 12.x uses an indexed filter for the predicate on line 585 of pass2.xsl (in function slax:bill-leaderwork-key-item, which gets inlined into the calculation of the grouping key at line 592).

Judging from the -TP output, neither of these differences are on performance-critical paths.

The other structural differences in the SEF execution plans are mainly rearrangement of multiple predicates in a filter expression (or rewriting to a single predicate using an "and" operator) which seem unlikely to make a major difference.

Actions #12

Updated by Michael Kay 10 months ago

Running under Profiler, in 12.x I'm seeing a strong hotspot (33%) at ValueComparisonElaborator.elaborateForBoolean() and another (25%) at PullElaborator.elaborateForItem$1

After that we get NamedChildIterator.next() (11%), StringTool.getStringLength() (8%), and StringValue.equals() (7%)

The calls to ValueComparison are coming from FilterIterator.nonNumeric.matches() which it seems is failing to elaborate the predicate expression before evaluating it, causing repeated work (such as cardinality checking) each time the predicate is evaluated.

The calls to PullElaborator.elaborateForItem$1 happen in the course of ValueComparisonElaborator.elaborateForBoolean().

Actions #13

Updated by Michael Kay 10 months ago

So, why is ValueComparison not using the elaboration mechanism? Because someone is calling FilterExpression.iterate() without elaborating it first. This seems to originate with GeneralNodePattern.isSelected().

It's worth noting here that a GeneralNodePattern is bad news - it means that we've got a complex match pattern without a fast evaluation strategy. The pattern in question is para[@format eq '6651'][@loc eq '46'][string-length(.) gt 0][1] on line 306 of pass2.xsl. It's the final '[1]' that causes the problems. A better way of handling special cases like this would be to bind a global variable:

<xsl:variable name="para-6651-46-first" select="//para[@format eq '6651'][@loc eq '46'][string-length(.) gt 0][1]"/>

and then write the pattern as match="$para-6651-46-first". Saxon can't do this optimization because the pattern matches nodes in any document, not just the principal input document.

But that's a side note, our concern is with Saxon inefficiencies not with stylesheet inefficiencies.

The method GeneralNodePattern.isSelected() takes an expression as argument; the expression should be elaborated first. Unfortunately, though, the elaboration mechanism has never been fully implemented for patterns (as distinct from expressions). But we can do it on a case-by-case basis.

Actions #14

Updated by Michael Kay 10 months ago

I've changed FilterExpression.iterate() to elaborate the expression before evaluating it, and this is showing a substantial improvement: execution time down from 156380ms to 103073ms which is significantly better than the 11.x performance.

I think we can do even better by taking a look at GeneralNodePattern.

Actions #15

Updated by Joel Kalvesmaki 10 months ago

Thank you, these notes are very informative and helpful.

Actions #16

Updated by Michael Kay 10 months ago

Changing GeneralNodePattern to retain the pre-elaborated evaluator for the equivalent expression gives a further improvement to 89655ms.

Actions #17

Updated by Michael Kay 10 months ago

On SaxonCS 12 before applying the changes the timing (for me) is 1115.669s. With the changes, this comes down to 513.575s.

On SaxonCS 11 my timing for the same run is 613.894s

Actions #18

Updated by Michael Kay 10 months ago

I'm now looking further into the possibility of improving the match performance of a GeneralNodePattern.

This currently follows the algorithm in the spec: there's an equivalent expression EE, and to match a node N we search for an ancestor A such that evaluating EE with A as the context item returns a node-set that includes the target node N.

But if the pattern has the form para[@format='X'][last()] then we ought to be able to do better. We should be able to extract the predicate [@format='X'] as a precondition, and avoid the ancestor search if the precondition is false.

Moreover/Alternatively, if the equivalent expression is only looking at the child axis (as here), then the ancestor search can be truncated to only look at the immediate parent.

Actions #19

Updated by Michael Kay 10 months ago

In attempting to implement improvements, I found a bug: XSLTempate.register makes a copy M1 of the match pattern M (needed because it may be changed) and stores the copy M1 in the RuleManager. But it doesn't (at least not on all paths) save the copied version in the XSLTemplate.match field. Subsequently XSLTemplate.optimize() calls match.optimize() on the original pattern M to produce M2, but the RuleManager still holds M1 rather than M2. So any changes made during the optimize() call have no effect.

At any rate, this is what happens when deferred template rule compilation is set, which is the default in Saxon-EE.

I've changed XSLTemplate.optimize() to put the optimized pattern in the TemplateRuleEE object, but that's not enough: it also needs to go in the Rule object. Fortunately we can get from the TemplateRuleEE to the relevant Rule objects.

Actions #20

Updated by Michael Kay 10 months ago

I now get a weird error "Cannot validate a NOTATION without a namespace resolver". If I run it in the debugger, it changes to "Cannot validate a QName without a namespace resolver".

We seem to be evaluating the troublesome GeneralNodePattern here; I suspect the problem is that it hasn't been type-checked. No, that doesn't help.

It's failing in an internal function map:untypedContains() which attempts to see if an untypedAtomic value is present in a map, by converting to all the types that might be present in the map. This call results from optimization of a general comparison, specifically

(6211, 6212, 6213, 6214, 6411, 6412, 6413, 6414, 6311, 6313) = data(@format)

We tackle this by building a map and testing for contains(), but because data(@format) is untyped, we have to be prepared to convert to the target type. In fact all these values are strings, except the final one (6313) which is an integer. We are converting to every type that might be present in the map, in turn, and for some reason we think that this includes QNames and Notations.

When we first build the map, the type data (keyUType property) is correct (string|decimal) but when we come to do the lookup, it has been changed to a universal atomic type.

This is happening as a side-effect of a call on HashTrieMap.getItemType(), called by TypeChecker.staticTypeCheck().

There are a number of issues here. Firstly, we shouldn't really be calling getItemType() - it's intended for diagnostic use, and the result isn't particularly well defined because an item conforms to many different types. Secondly, we're computing it badly in the case of a map: if there's an integer and a string, we conclude that it's anyAtomicType, which is why we end up thinking the map might contain a QName.

I've changed HashTrieMap.getItemType() to form a UType union, which captures the type more precisely.

Actions #21

Updated by Michael Kay 10 months ago

With these changes, the test run is taking 9437ms, down from 156380ms.

Yes, I checked: the output is identical to the 11.x output.

Actions #22

Updated by Michael Kay 10 months ago

  • Category set to Performance
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch 12, trunk added
  • Fix Committed on Branch 12, trunk added
  • Platforms Java added
Actions #23

Updated by Joel Kalvesmaki 10 months ago

Amazing, to be able to reduce performance time by more than an order of magnitude. Many thanks!

Actions #24

Updated by O'Neil Delpratt 5 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.4 added

Bug fix applied in the Saxon 12.4 maintenance release

Please register to edit this issue

Also available in: Atom PDF