Bug #2678

closed

Decreased performance for xslt parser

Added by Olga Abramovich about 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Category: -
Sprint/Milestone: -
Start date: 2016-03-16
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 9.6
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

We are starting to replace the MSXML XSLT parser with the Saxon XSLT parser. We are seeing that Saxon is 6 times slower than the MSXML XSLT parser. Version: Saxon EE 9.6, .NET 4.5. Please let us know how we can improve the performance.


Files

DA17C2.XML (2.43 KB), Olga Abramovich, 2016-03-16 19:32
PL13P13X.xsl (38.4 KB), Olga Abramovich, 2016-03-16 19:32
Actions #1

Updated by Olga Abramovich about 8 years ago

Currently we use Antenna House and MSXSL 4.0 for FO transformation.

We have now switched from MSXSL 4.0 to Saxon for the FO transformation, and we see performance decrease by a factor of 6 with Saxon.

This is how we invoke Saxon from Antenna House

D:\Program Files\Saxonica\SaxonEE9.6N\bin\Transform.exe -s:%1 -xsl:%2 -o:%3

Actions #2

Updated by Michael Kay about 8 years ago

  • Assignee set to O'Neil Delpratt
  • Priority changed from High to Normal

We'll take a look at it, but here are some preliminary observations.

I note that you are running essentially from the command line. This gives a heavy start-up cost while the .NET and Java run-time environment initializes itself. For single-shot execution from the command line, it's very difficult for a .NET application to compete with native code like MSXML. The Saxon Java product is faster than the .NET product - unless you are using extension functions written in C#, there is no particular reason to use the .NET version in preference to Java.

Another thought is that the performance might be dominated by compile-time - Saxon-EE puts in a lot of work at compile time to optimize execution speed, and with a very simple stylesheet like this that can sometimes achieve little benefit. We do know from previous studies that while Saxon's run-time performance is very competitive, its compile time is sometimes slower than other products. Switching off optimization and bytecode generation is often worthwhile if you are only executing the stylesheet once. Also Saxon 9.7 introduces a new option to save the compiled stylesheet on disk so you only incur the compilation cost once, even when running from the command line.

Actions #3

Updated by O'Neil Delpratt about 8 years ago

Following from our investigations there are a number of points that might be helpful for you:

    1. As mentioned in comment #2, the bytecode generation feature is imposing a cost but giving no benefit, because the stylesheet consists of one very long template. This generates a very long method in bytecode.

On .NET with bytecode switched on, compilation takes 2.3 seconds and execution takes 422ms. With bytecode switched off, compilation takes 1.2 seconds and execution takes 172ms. It is therefore best to switch it off. You can do this with the command-line option: --generateByteCode:off

    2. We then compared Saxon on Java to Saxon on .NET. As mentioned in comment #2, Saxon on Java is much faster, sometimes by a factor of 2-3. With your stylesheet and source document, and with the bytecode feature off, here are the timings on the two platforms:

Java:

Compilation = 1017ms

Execution = 58ms

.NET:

Compilation = 1235ms

Execution = 172ms

We observe that Saxon on Java is clearly faster at execution, by almost a factor of 3 (58ms against 172ms), while the compilation times are much closer. If you are only running from the command line and it is possible to use Java, we recommend it.

    3. Compile time vs run time: On Saxon 9.7 you can try to reduce the compile time by using the export feature (i.e. on the command line add the option -export:filename.xsltp). This exports the compiled stylesheet in a form suitable for subsequent execution. Then use the compiled form in the -xsl option. We noticed that this gave little benefit, possibly because the compiled stylesheet file is significantly larger than the original.
    4. In your stylesheet we noticed the repeated use of xsl:variable with an xsl:value-of as a child. The effect is that a node tree has to be created for each such variable, which is expensive. It is more efficient to use xsl:variable with a select attribute. The Saxon optimizer tries to do the conversion internally but seems not to have succeeded in this case. We will investigate further as to why.
Actions #4

Updated by O'Neil Delpratt about 8 years ago

  • Status changed from New to AwaitingInfo
Actions #5

Updated by Michael Kay about 8 years ago

As regards the question of variable declarations: consider

<xsl:variable name="Pos1Image">
        <xsl:value-of select="POSITION_1"/>
</xsl:variable>

versus

<xsl:variable name="Pos1Image" select="POSITION_1"/>

Semantically, the second form is much simpler: it just binds the variable to an existing node, whereas the first form creates a result tree fragment, consisting of a document node with a child text node, whose content is a copy of the text in the original selected node. That's obviously much more expensive.

Because this is such a common mistake, Saxon tries to optimize the first form to the equivalent of the second, but it can only do this under certain conditions, because they aren't 100% equivalent. I tried changing the version number on xsl:stylesheet to 2.0, and this caused the optimization to kick in. I'm not entirely sure why this should make a difference; but the semantics of backwards compatibility mode (which is in force if the version is set to 1.0 or (incorrectly) to 1.1) are sufficiently complex that we probably don't attempt the analysis in this case.

I haven't tried to measure the effect this has: because the execution time for this stylesheet is dominated by compile time, it probably doesn't make much difference to the bottom line.
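To make the "not 100% equivalent" point concrete, here is a minimal sketch in Java using the standard JAXP API. It is an illustration only, not something taken from the submitted stylesheet: the class name VariableForms, the one-element source document and the out wrapper element are all made up for the example. It runs on whichever XSLT processor JAXP finds (the JDK's built-in XSLT 1.0 processor, or Saxon if its jar is on the classpath). Copying the variable with xsl:copy-of should show the difference: the first form yields only the copied text, the second yields the original element.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class VariableForms {
    // Hypothetical source document, invented for this illustration.
    static final String SOURCE = "<doc><POSITION_1>img.png</POSITION_1></doc>";

    // Wraps a variable declaration in a minimal stylesheet that copies $Pos1Image.
    static String stylesheet(String variableDecl) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
             + "<xsl:template match='/doc'>"
             + variableDecl
             + "<out><xsl:copy-of select='$Pos1Image'/></out>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Compiles the stylesheet, runs it against SOURCE, returns the serialized result.
    static String run(String variableDecl) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheet(variableDecl))));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Form 1: builds a temporary tree holding a copy of the text.
        System.out.println(run("<xsl:variable name='Pos1Image'>"
                + "<xsl:value-of select='POSITION_1'/></xsl:variable>"));
        // Form 2: binds the variable to the existing POSITION_1 element.
        System.out.println(run("<xsl:variable name='Pos1Image' select='POSITION_1'/>"));
    }
}
```

When the variable is only ever used as a string, as appears to be the case in this stylesheet, the two forms give the same result, which is why the rewrite to the select form is safe there.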

Actions #6

Updated by Michael Kay about 8 years ago

I was disappointed that loading from an exported stylesheet in 9.7 appears to make little difference. Analyzing the figures, with "conventional loading" I get a compilation time of 890ms, of which 121ms is spent in the XML parser; while with "exported package" loading I'm seeing a "compilation" time of 659ms, of which 176ms is in parsing the (exported) stylesheet file, and 379ms is in rebuilding the expression tree from the output of the XML parser. If I add -repeat:20 to the command line, to force repeated execution, I find that these times come down to something like:

Conventional compilation: Stylesheet compilation time: 167.134ms (of which Stylesheet parse time 5ms)

"Exported" compilation: Stylesheet compilation time: 102.36ms, of which:

Stylesheet parse time 24ms

Package load time 61ms

What this illustrates is that with single-shot execution from the command line, the dominant cost is actually Java warm-up time: in effect, performance is dominated by the cost of loading Saxon classes into the Java VM.

I made these measurements on the Java product, but the effect is going to be very similar on .NET. Essentially, if you're running an application that only takes 50ms to execute, then initializing an environment like the .NET or Java VM to run it is a big overhead, and there's nothing that we can do in Saxon, or that you can do in your XSLT code, to reduce this overhead.

There may be things you can do at application level to reduce this overhead, typically by batching up transformations in such a way that the start-up cost is amortized over many transformations.
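As a rough sketch of what such batching could look like, the following Java program compiles a stylesheet once and reuses the compiled form for several source documents inside a single VM, so the start-up and compilation cost is paid only once. It uses the standard JAXP API. The class name BatchTransform, the tiny stand-in stylesheet and the in-memory source documents are assumptions made for illustration; a real job would read the actual .xsl and XML files, and whether Antenna House can be driven this way is not something this thread establishes.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class BatchTransform {
    // Hypothetical stand-in stylesheet; a real job would load the .xsl file instead.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/item'><row><xsl:value-of select='.'/></row></xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet once, then transforms every source with it.
    static List<String> transformAll(List<String> sources) throws Exception {
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
        List<String> results = new ArrayList<>();
        for (String xml : sources) {
            long start = System.nanoTime();
            // A Transformer is cheap to create from the compiled Templates object.
            Transformer t = compiled.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            results.add(out.toString());
            // Per-document timing goes to stderr so results stay clean on stdout.
            System.err.printf("transform took %.2f ms%n", (System.nanoTime() - start) / 1e6);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<String> sources = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            sources.add("<item>" + i + "</item>");
        }
        transformAll(sources).forEach(System.out::println);
    }
}
```

The timings printed to stderr would typically show the first transformation costing noticeably more than the later ones, which is the same warm-up effect that -repeat:20 exposed above.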

Actions #7

Updated by Olga Abramovich about 8 years ago

Thank you for the great reply. I want to clarify a couple of points.

We are running Antenna House Formatter version 6.3. Antenna House formatter calls the MSXML or Saxon for FO transformation. This is how we invoke Saxon from Antenna House

D:\Program Files\Saxonica\SaxonEE9.6N\bin\Transform.exe -s:%1 -xsl:%2 -o:%3

We will check with the Antenna House too.

We are also going to change the variable declarations as suggested here. We will let you know the outcome.

Thank you.

Actions #8

Updated by O'Neil Delpratt almost 8 years ago

  • Status changed from AwaitingInfo to Closed

Hi,

If you get any more information, please don't hesitate to reopen this issue.
