Project

Profile

Help

Bug #5910

closed

Double reporting of template execution in timing profile

Added by Joel Kalvesmaki about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Diagnostics
Sprint/Milestone:
-
Start date:
2023-03-08
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
11, 12, trunk
Fixed in Maintenance Release:
Platforms:
Java

Description

In performance testing on a particular XML + XSLT combination, I noticed it was taking roughly twice as long on the command line as it was in Oxygen. I did a profile and notice that the initial template mode matching the document was being visited twice in the command line version, but only once in Oxygen (which is running Saxon 10.6).

I replicated the anomaly with the simplest of XSLT files, and, with profiling on, found that Saxon EE 9.9.1.8 and 10.9 on the command line visited the document template mode only once. But for Saxon EE 11.5 and 12.0 the document template mode was visited twice.

I've confirmed the above on two different XML + XSLT combinations on the command line across the four products.

Actions #1

Updated by Martin Honnen about 1 year ago

Did you profile in oXygen with 10 EE as well? Or was that HE perhaps?

I have now profiled the stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="/">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

with 11.5 HE and EE and 10.9 HE and EE and indeed both EE versions show a count of 2 for matching / e.g.

module	line	instruction	count	average time (gross/ms)	total time (gross/ms)	average time (net/ms)	total time (net/ms)
sheet1.xsl	9	xsl:template /	2	4.218	8.436	2.136	4.272

while the both HE versions show a count of 1 for matching / e.g.

module	line	instruction	count	average time (gross/ms)	total time (gross/ms)	average time (net/ms)	total time (net/ms)
sheet1.xsl	9	xsl:template /	1	4.008	4.008	4.008	4.008

No idea why EE vs. HE means count 2 vs. count 1 for /, I guess Michael can explain that.

Actions #2

Updated by Joel Kalvesmaki about 1 year ago

In all cases I'm working with EE, because that's the product our shop has licensed and uses.

I tried to put together a simple example that I could paste right in the ticket, but the optimizations made that challenging to do. (I want to demonstrate that even with all optimizations on, something is not quite right.) I will send the original files under separate cover.

Actions #3

Updated by Michael Kay about 1 year ago

There is something wrong with the injection of calls to the timing profiler, such that the body of the template rule, instead of being wrapped in a single trace call

trace(bodyExpresion)

is being doubly-wrapped

trace(trace(bodyExpression))

Which means there are two entries recorded to the template, and two exits, resulting in double counting of the execution time.

I'm pretty sure the problem is specific to the -TP timiing profiler, and there is no issue when running with profiling off. But I haven't completely worked out what's going wrong. In particular I can't see any obvious reason for it being EE-specific.

Actions #4

Updated by Joel Kalvesmaki about 1 year ago

OK, so I guess two issues, one being overcounting in the profiling, and the other is something yet TBD.

Because I can't release the background materials under the stipulated terms of the support board, I've sent them separately.

Actions #5

Updated by Michael Kay about 1 year ago

The first call to inject tracing code happens in XSLTemplate.compileTemplateRule(). The second happens in TemplateRuleInitializer.init(). At first sight the second call seems unnecessary - but it's hard to be 100% sure. A safe fix would be to ensure that (for the TimingTraceListener at least) injecting a trace expression a second time has no effect.

This eliminates the double counting.

Note that the absolute timings when running from the command line are largely meaningless unless it's a task that takes 30 seconds or more, because it takes that long for JVP hotspot optimisation to settle down. It's mainly the counts that are useful, as well as relative timings.

Actions #6

Updated by Michael Kay about 1 year ago

Moving on from the profiling problem to the actual workload performance, I see the following results under SaxonJ. This is running with Java 11 from the command line, in each case with -t -repeat:5

9.9.1.8 - 20.107s
10.9 - 20.150s
11.5 - 21.518s
11.5 with bytecode disabled - 18.098s
12.0 - 34.231s

So there is indeed a performance regression here, which we need to investigate. Moreover, it has nothing to do with the fact that bytecode generation was dropped in 12.0 (on the contrary, the 11.5 results confirm that, for this workload, this was the right thing to do.)

Actions #7

Updated by Michael Kay about 1 year ago

Now some figures for SaxonCS. This is running from the command line, but within Rider on MacOS, and on the current development branch rather than released product. But the same options (-t -repeat:5)

SaxonCS 11 - 55.389s
SaxonCS 12 - 48.734s

These figures are not a great surprise: we know that the work we did on expression elaboration in 12.x gave much more benefit in C# than it did in Java. The ratio of SaxonCS to SaxonJ for the same workload is also not untypical. (A caveat is that we are measuring timings using different instrumentation APIs so the figures might not be directly comparable).

Actions #8

Updated by Michael Kay about 1 year ago

I'm moving discussion of the performance regression to a new issue #5914 so that we can track closure of the timing profile problem separately in this issue.

Actions #9

Updated by Michael Kay about 1 year ago

  • Subject changed from Double evaluation of document node on command line, Saxon 11+ to Double reporting of template execution in timing profile
  • Category set to Diagnostics
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Applies to branch 11, 12, trunk added
  • Platforms Java added
Actions #10

Updated by Michael Kay about 1 year ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 11, 12, trunk added
Actions #11

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.1 added

Bug fix applied in the Saxon 12.1 maintenance release.

Please register to edit this issue

Also available in: Atom PDF