Bug #5910: Double reporting of template execution in timing profile - Saxon - Saxonica Developer Community

Actions

Send by e-mail Copy link

Bug #5910

closed

Double reporting of template execution in timing profile

Added by Joel Kalvesmaki almost 2 years ago. Updated almost 2 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Michael Kay

Category:

Diagnostics

Sprint/Milestone:

Start date:

2023-03-08

Due date:

% Done:

100%

Estimated time:

Legacy ID:

Applies to branch:

11, 12, trunk

Fix Committed on Branch:

11, 12, trunk

Fixed in Maintenance Release:

12.1

Platforms:

Java

Description

In performance testing on a particular XML + XSLT combination, I noticed it was taking roughly twice as long on the command line as it was in Oxygen. I did a profile and notice that the initial template mode matching the document was being visited twice in the command line version, but only once in Oxygen (which is running Saxon 10.6).

I replicated the anomaly with the simplest of XSLT files, and, with profiling on, found that Saxon EE 9.9.1.8 and 10.9 on the command line visited the document template mode only once. But for Saxon EE 11.5 and 12.0 the document template mode was visited twice.

I've confirmed the above on two different XML + XSLT combinations on the command line across the four products.

History
Notes
Property changes

Actions

Copy link

Updated by Martin Honnen almost 2 years ago

Did you profile in oXygen with 10 EE as well? Or was that HE perhaps?

I have now profiled the stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="/">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

with 11.5 HE and EE and 10.9 HE and EE and indeed both EE versions show a count of 2 for matching / e.g.

module	line	instruction	count	average time (gross/ms)	total time (gross/ms)	average time (net/ms)	total time (net/ms)
sheet1.xsl	9	xsl:template /	2	4.218	8.436	2.136	4.272

while the both HE versions show a count of 1 for matching / e.g.

module	line	instruction	count	average time (gross/ms)	total time (gross/ms)	average time (net/ms)	total time (net/ms)
sheet1.xsl	9	xsl:template /	1	4.008	4.008	4.008	4.008

No idea why EE vs. HE means count 2 vs. count 1 for /, I guess Michael can explain that.

Actions

Copy link

Updated by Joel Kalvesmaki almost 2 years ago

In all cases I'm working with EE, because that's the product our shop has licensed and uses.

I tried to put together a simple example that I could paste right in the ticket, but the optimizations made that challenging to do. (I want to demonstrate that even with all optimizations on, something is not quite right.) I will send the original files under separate cover.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

There is something wrong with the injection of calls to the timing profiler, such that the body of the template rule, instead of being wrapped in a single trace call

trace(bodyExpresion)

is being doubly-wrapped

trace(trace(bodyExpression))

Which means there are two entries recorded to the template, and two exits, resulting in double counting of the execution time.

I'm pretty sure the problem is specific to the -TP timiing profiler, and there is no issue when running with profiling off. But I haven't completely worked out what's going wrong. In particular I can't see any obvious reason for it being EE-specific.

Actions

Copy link

Updated by Joel Kalvesmaki almost 2 years ago

OK, so I guess two issues, one being overcounting in the profiling, and the other is something yet TBD.

Because I can't release the background materials under the stipulated terms of the support board, I've sent them separately.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

The first call to inject tracing code happens in XSLTemplate.compileTemplateRule(). The second happens in TemplateRuleInitializer.init(). At first sight the second call seems unnecessary - but it's hard to be 100% sure. A safe fix would be to ensure that (for the TimingTraceListener at least) injecting a trace expression a second time has no effect.

This eliminates the double counting.

Note that the absolute timings when running from the command line are largely meaningless unless it's a task that takes 30 seconds or more, because it takes that long for JVP hotspot optimisation to settle down. It's mainly the counts that are useful, as well as relative timings.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

Moving on from the profiling problem to the actual workload performance, I see the following results under SaxonJ. This is running with Java 11 from the command line, in each case with -t -repeat:5

9.9.1.8 - 20.107s
10.9 - 20.150s
11.5 - 21.518s
11.5 with bytecode disabled - 18.098s
12.0 - 34.231s

So there is indeed a performance regression here, which we need to investigate. Moreover, it has nothing to do with the fact that bytecode generation was dropped in 12.0 (on the contrary, the 11.5 results confirm that, for this workload, this was the right thing to do.)

Actions

Copy link

Updated by Michael Kay almost 2 years ago

Now some figures for SaxonCS. This is running from the command line, but within Rider on MacOS, and on the current development branch rather than released product. But the same options (-t -repeat:5)

SaxonCS 11 - 55.389s
SaxonCS 12 - 48.734s

These figures are not a great surprise: we know that the work we did on expression elaboration in 12.x gave much more benefit in C# than it did in Java. The ratio of SaxonCS to SaxonJ for the same workload is also not untypical. (A caveat is that we are measuring timings using different instrumentation APIs so the figures might not be directly comparable).

Actions

Copy link

Updated by Michael Kay almost 2 years ago

I'm moving discussion of the performance regression to a new issue #5914 so that we can track closure of the timing profile problem separately in this issue.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

Subject changed from Double evaluation of document node on command line, Saxon 11+ to Double reporting of template execution in timing profile
Category set to Diagnostics
Status changed from New to In Progress
Assignee set to Michael Kay
Applies to branch 11, 12, trunk added
Platforms Java added

Actions

Copy link

#10

Updated by Michael Kay almost 2 years ago

Status changed from In Progress to Resolved
Fix Committed on Branch 11, 12, trunk added

Actions

Copy link

#11

Updated by O'Neil Delpratt almost 2 years ago

Status changed from Resolved to Closed
% Done changed from 0 to 100
Fixed in Maintenance Release 12.1 added

Bug fix applied in the Saxon 12.1 maintenance release.

Please register to edit this issue

Actions

Send by e-mail Copy link

Also available in: Atom PDF

Project

Profile

Help

Saxon

Bug #5910

Double reporting of template execution in timing profile

Updated by Martin Honnen almost 2 years ago

Updated by Joel Kalvesmaki almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Joel Kalvesmaki almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by O'Neil Delpratt almost 2 years ago