Bug #5686

Heap size problems, and crash in TraceListener

Added by Michael Kay over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2022-09-13
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 11, trunk
Fix Committed on Branch: 11, trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

Problem reported by email.

To be exact, I have two issues tightly related to each other. First of all, I have a transformation (step) where, for some element nodes, an attribute "variant" must be produced with the help of a function (see the test set at "lib/EDK/Scripts/XSL/PublishingCore/mod.filter.xsl", line 123).

(1) One problem is that the call chain beneath the function being called (exf:get-variant#1) seems to have a bottleneck somewhere, as heap space consumption seems to grow exponentially with the input data size. When the heap space reaches its limits, the Java garbage collector takes a lot of CPU time, so the transformation cannot really finish with my configuration.

The XSLT code is lacking maintenance here and there, but it's not clear what exactly is causing the symptom described above. So the question is whether you could give me a hint where this bottleneck may be located in the scripts. It's not the first time I've had such an issue, but here I have no idea how to isolate the problem.

I already tested with Saxon PE 9.9, 10.8, and 11.4; Saxon PE 9.9 is currently used in production by our clients.

(2) The second issue is that I tried out the TimingTraceListener via the command-line option "-T:net.sf.saxon.trace.TimingTraceListener" and also "-TP:profile.html", without success. I always get a "NullPointerException" like this one:

java.lang.NullPointerException: Cannot invoke "net.sf.saxon.expr.Expression.process(net.sf.saxon.event.Outputter, net.sf.saxon.expr.XPathContext)" because "this.body" is null
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:394) 
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:551)
        at net.sf.saxon.trans.rules.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:72)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:524)
        at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:685)
        at net.sf.saxon.event.TransformerReceiver.close(TransformerReceiver.java:124)
        at net.sf.saxon.event.RegularSequenceChecker.close(RegularSequenceChecker.java:199)
        at net.sf.saxon.event.ComplexContentOutputter.close(ComplexContentOutputter.java:736)
        at net.sf.saxon.event.PushToReceiver$DocImpl.sendEndEvent(PushToReceiver.java:219)
        at net.sf.saxon.event.PushToReceiver$ContainerImpl.close(PushToReceiver.java:135)
        at net.sf.saxon.trace.TimingTraceListener.close(TimingTraceListener.java:150)
        at net.sf.saxon.trans.XsltController.callTemplate(XsltController.java:886)
        at net.sf.saxon.s9api.Xslt30Transformer.callTemplate(Xslt30Transformer.java:485)
        at net.sf.saxon.Transform.processFile(Transform.java:1382)
        at net.sf.saxon.Transform.doTransform(Transform.java:871)
        at net.sf.saxon.Transform.main(Transform.java:81)
java.lang.RuntimeException: Internal error evaluating template rule  at line 79 in module file:///C:/_/issues/2022-09-08-avm-hadria-iirds-performance/saxonica/profile.xsl
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:411)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:551)
        at net.sf.saxon.trans.rules.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:72)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:524)
        at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:685)
        at net.sf.saxon.event.TransformerReceiver.close(TransformerReceiver.java:124)
        at net.sf.saxon.event.RegularSequenceChecker.close(RegularSequenceChecker.java:199)
        at net.sf.saxon.event.ComplexContentOutputter.close(ComplexContentOutputter.java:736)
        at net.sf.saxon.event.PushToReceiver$DocImpl.sendEndEvent(PushToReceiver.java:219)
        at net.sf.saxon.event.PushToReceiver$ContainerImpl.close(PushToReceiver.java:135)
        at net.sf.saxon.trace.TimingTraceListener.close(TimingTraceListener.java:150)
        at net.sf.saxon.trans.XsltController.callTemplate(XsltController.java:886)
        at net.sf.saxon.s9api.Xslt30Transformer.callTemplate(Xslt30Transformer.java:485)
        at net.sf.saxon.Transform.processFile(Transform.java:1382)
        at net.sf.saxon.Transform.doTransform(Transform.java:871)
        at net.sf.saxon.Transform.main(Transform.java:81)
Caused by: java.lang.NullPointerException: Cannot invoke "net.sf.saxon.expr.Expression.process(net.sf.saxon.event.Outputter, net.sf.saxon.expr.XPathContext)" because "this.body" is null
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:394)
        ... 15 more
Fatal error during transformation: java.lang.RuntimeException: Internal error evaluating template rule  at line 79 in module file:///C:/_/issues/2022-09-08-avm-hadria-iirds-performance/saxonica/profile.xsl

The exception is also reproducible with Saxon 9.9, 10.8, and 11.4.

(3) I put together a test set with a PowerShell script "transform.ps1" that can simply be executed with ".\transform.ps1". If you are on Unix or Linux you should easily be able to read the script (maybe it runs on PowerShell Core) and adapt it for bash/sh.

Please treat the test-set as confidential.

Prerequisites: Adapt "$Global:JavaHome" and "$Global:SaxonClassPath". Change "$xmlIn" to transform a minimized or a complete test file (see below).

Input data:

  • "data/Export.debug/P_01_de-DE/Work/in.xml": complete input file that is suitable for reproducing the bottleneck issue (I realized the heap space problem when setting the Java max heap to 10G)
  • "data/Export.debug/P_01_de-DE/Work/in.min.xml": reduced input file that is suitable for reproducing the profiling error

Actions #1

Updated by Michael Kay over 1 year ago

I succeeded in running this, and I see the same growth in heap size. I've taken a heap dump which I'm examining using JProfiler.

The heap is dominated by the usual byte[], int[], String, and HashMap$Node objects. The most immediately striking feature is that there are a large number (330K) of TinyTree instances on the heap. Most of these are quite small (888 bytes retained size), and they appear to be referenced from FocusTrackingIterator instances.

It looks as if the trees might be retained in memory as a result of using memo functions.

Actions #2

Updated by Michael Kay over 1 year ago

Tracing entry to MemoFunction.java shows we are building a big memo function cache for a number of functions, notably:

<xsl:function name="filter:has-name" as="xs:boolean?" new-each-time="false">
    <xsl:param name="applic-item" as="element(applic)?"/>
    <xsl:param name="name" as="xs:string?"/>
    <xsl:sequence select="filter:has-some-name($applic-item, $name)"/>
</xsl:function>

With the "new-each-time" attribute removed, I haven't yet got the transformation to complete, but the heap usage seems to have stabilised.

This raises questions about the Saxon design in this area. The attribute new-each-time="false" means that the function should behave like doc(): if it's called twice with the same arguments, it must return the same result. To achieve this, Saxon remembers the results of all previous calls.

The example function above in fact doesn't pose too much of a problem, because the HashMap used to cache the data has a string key and a boolean result. In fact, the key is always a string (we use the generate-id() of a node in cases where an argument is a node). But the associated value in the HashMap is the actual function result, and in some cases this is an element node. This causes the entire tree containing this element node to be locked in memory for the rest of the run. I think this is necessary to respect the semantics of new-each-time="false".
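To make this concrete, here is a hypothetical memo function (not taken from the user's stylesheet; the function name, the lookup expression, and the $applic-doc variable are invented for illustration) whose result type is an element rather than a boolean. Because the cached value is a node, the whole tree containing that node stays reachable from the memo cache:

<!-- Hypothetical sketch only: a memoized function returning an element.
     Each distinct $name adds a cache entry, and since the cached value is
     a node, the entire TinyTree containing it stays pinned in memory. -->
<xsl:function name="filter:lookup-applic" as="element(applic)?" new-each-time="false">
    <xsl:param name="name" as="xs:string"/>
    <xsl:sequence select="($applic-doc//applic[@name = $name])[1]"/>
</xsl:function>

If the arguments come from many small trees (as the 330K small TinyTree instances in the heap dump suggest), each tree touched in this way is kept alive until the end of the run.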

Actions #3

Updated by Michael Kay over 1 year ago

Now investigating the crash in TimingTraceListener, which is caused by a template rule in the compiled profile.xsl stylesheet having a null body. This appears to be associated with the fact that it's using a ProfessionalConfiguration, as requested in the configuration file. (However, changing the config.xml file to request an EnterpriseConfiguration seems to result in a different NPE in a different place.)

If we run without a configuration file, the profile.html file is generated successfully.

Actions #4

Updated by Michael Kay over 1 year ago

The template rule has no body because XSLTemplate.compileDeclaration() decided to use deferred (JIT) compilation of the template body, but this isn't supposed to happen under Saxon-PE, because it's an EE feature.

compilation.isDeferredCompilation() returned true because compilation.isPreScan() is true, which in turn is because OptimizerOptions.JIT is set in the compiler. This is apparently caused by the fact that the configuration file specifies optimizationLevel="10". We should interpret that as meaning "set all optimization options supported in this Saxon edition" - requesting an optimization that isn't fully supported in the edition is likely to cause things to break, as has happened here.

Confirmed that removing this line from the configuration file fixes the problem.
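For reference, the offending setting has roughly this shape. This is a hedged reconstruction of the relevant part of a Saxon configuration file (with optimizationLevel as an attribute of the global element), not a copy of the user's actual config.xml:

<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="PE">
    <!-- Illustrative only: level 10 requests every optimization, including
         JIT compilation of template bodies, which is an EE-only feature. -->
    <global optimizationLevel="10"/>
</configuration>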

Actions #5

Updated by Michael Kay over 1 year ago

  • Status changed from New to In Progress

I've changed Configuration.setFeature(OPTIMIZER_OPTIONS) so it sets the intersection of the requested options and the options supported in the current Configuration.

Actions #6

Updated by Michael Kay over 1 year ago

The above change solves the case where the configuration file requests a PE configuration, but we still get a (different) crash when it requests an EE configuration.

This arises because one of the function library lists for the configuration has a null entry.

It turns out this crash has nothing to do with setting the -TP option; it's a crash while executing the main transformation. It is, however, associated with something in the configuration file.

Actions #7

Updated by Michael Kay over 1 year ago

The crash occurs when the configuration file requests edition="EE" but the license file only allows PE features. Fixed.

Actions #8

Updated by Michael Kay over 1 year ago

Testing this now on the development branch (12.x) I get a different failure:

~~Error on line 28 column 64 of lib.overrides.xsl: XTSE0165 I/O error reported by XML parser processing urn:ovidius:names:edk:lib:lib.core.xsl: unknown protocol: urn: unknown protocol: urn I/O error reported by XML parser processing urn:ovidius:names:edk:lib:lib.core.xsl: unknown protocol: urn ~~

This appears to be because the catalog (specified using -catalog on the command line) isn't being picked up.

False alarm. I miscopied the command line options and failed to include -catalog.

Actions #9

Updated by Michael Kay over 1 year ago

  • Status changed from In Progress to Resolved
  • Applies to branch 11, trunk added
  • Fix Committed on Branch 11, trunk added
  • Platforms .NET, Java added
Actions #10

Updated by Jan Pertermann over 1 year ago

I removed the optimization attribute from the Saxon config and was then able to time-trace the transformation as you did. Following the time profile and your hints, I realized the following:

(1) There were some functions that I could easily optimize myself: filter:match-applic-maps#2 and filter:normalize-applic-map#1.

(2) But a lot of time was still consumed in the function filter:filter-applic-maps#2. I tried to split the function into smaller parts, which would (in theory) allow me to isolate the problem in more detail. Surprisingly, the total time consumed by the extracted functions was very small compared to filter:filter-applic-maps#2, which was still expensive. Then I did it the other way around and replaced called functions where element nodes were passed as arguments with simpler expressions. That helped a lot and reduced the time for the function by a factor of ~3 (on another test set). Unfortunately it still takes too much time.

(3) Even though there was no breakthrough with (2), I thought I should test the transformation in non-debug mode and without profiling/time-tracing activated. The transformation was -- as before -- not coming to an end; heap space filled up and the CPU was fully occupied. I supposed that some optimizations may be partially disabled in profiling mode. Et voilà, setting -opt:0 made the transformation run as it does in profiling mode. I tried deactivating each optimization to see which one is causing the slowdown and ended up with -opt:-t, which disables the tail call optimization. How can that be?

(4) Removing new-each-time="false" did not seem to be a solution; the transformation became even slower. There is a function that creates an element-based map/list and there is a function that normalizes those maps. Both produce elements and are called often. In this case the node IDs play no role for the transformation; the elements are "just data". I'm not confident that this is a main cause, as we had these element-based maps before without problems. What would be an alternative?

Actions #11

Updated by Jan Pertermann over 1 year ago

For the record:

(5) I replaced the element-based maps/lists with map-based ones to see if that would bring performance advantages (element(applic-item)* becomes map(*)*), but it didn't make a significant difference.

(6) I reduced the calls to exf:variant#1 by making simple comparisons with the parent element to check whether a different result is to be expected. If not, the result is taken from a tunneled parameter. Even if this is just a workaround, it is effective and reduces the transformation time by a factor of ~5 on top of the reduction described in (2).
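A minimal sketch of the kind of workaround described in (6), assuming a template-driven pass over the input: the computed variant is pushed down as a tunnel parameter, and the expensive function is only called when a cheap comparison with the parent suggests the result could differ. The mode name, the @applic comparison, and the exf:get-variant call are assumptions for illustration, not the user's actual code:

<xsl:template match="*" mode="filter">
    <xsl:param name="parent-variant" as="xs:string?" tunnel="yes"/>
    <!-- Cheap check against the parent: only call the expensive function
         when the applicability data actually differs (the test shown is illustrative). -->
    <xsl:variable name="variant" as="xs:string?"
        select="if (deep-equal(data(@applic), data(../@applic)))
                then $parent-variant
                else exf:get-variant(.)"/>
    <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:if test="exists($variant)">
            <xsl:attribute name="variant" select="$variant"/>
        </xsl:if>
        <xsl:apply-templates mode="filter">
            <xsl:with-param name="parent-variant" select="$variant" tunnel="yes"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>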

Actions #12

Updated by Michael Kay over 1 year ago

The primary problem I investigated was the heap space problem; this was definitely caused by keeping about 300K XML trees in memory to support the caching of function results. Getting rid of this caching isn't going to make things run faster (on the contrary): caching is a space/time tradeoff, and removing the caching will solve the memory problem at the cost of increasing execution time.

I'm pleased you're making progress in eliminating some of the performance bottlenecks.

We're now in territory where there's no evidence of a Saxon bug or design problem; you're essentially asking for performance consultancy to tune your application. I'm afraid we're not resourced to provide that.

You asked about tail call optimization. The primary purpose of tail call optimization is to prevent stack overflow. I would expect it to be roughly neutral in terms of the effect on execution speed. You're right that certain optimizations are suppressed when tracing is in force - monitoring the entry and exit to functions isn't really possible if function calls aren't executed in the normal way.

Do you have data on how the elapsed time of the transformation increases with input data size?

Actions #13

Updated by Community Admin over 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.0 added

Bug fix applied in the Saxon 12.0 major release. Leaving this bug marked as Resolved until the fix is applied in a maintenance release.

Actions #14

Updated by O'Neil Delpratt about 1 year ago

  • Fixed in Maintenance Release 11.5 added

Bug fix applied in the Saxon 11.5 maintenance release.

Actions #15

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
