Project

Profile

Help

Bug #5607

closed

Saxon-HE: Memory leak after migrating from version 10.X to version 11.X

Added by Daniel Lopez almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2022-07-15
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
11, trunk
Fix Committed on Branch:
11, trunk
Fixed in Maintenance Release:
Platforms:

Description

Hi there, I recently upgraded an application from version 10.5 to version 11.3 of Saxon-HE and after the upgrade I'm seeing quite clearly a memory leak, so I'm wondering is there has been any change on the way templates should be handled with version 11.X, so I'm doing something wrong in my code now, or if if the bug is in Saxon-HE.

For context, I can reproduce the leak in my application 100% of the times, by simply swapping the Saxon-HE jar from one version to the other. No other change between tests. I have tested different versions from the 10.X branch and the 11.X and, as far as I have tested, no version 10.X suffers the issue and all versions 11.X do.

I have attached VisualVM snapshots (sorry, not sure if I can link them from the message) of some testing I have done with version 10.8 and version 11.3. The test requests some pages that are created by generating XML and then processing it with Saxon-HE. The loading and processing of the XSLTs is done by the Spring framework, so I'm not doing anything fancy with the templates.

Version 10.8 can run the whole test, about 15m , and if I then hit "force GC" memory goes back to normal levels. Version 11.3 explodes with an OOM before the 2m mark a Heap Dump before that showed me that there is a ton of memory being consumed by net.sf.saxon.expr.* classes, especially net.sf.saxon.expr.Operand.

Digging further, what I see is that I have a handful, 18, of compiled stylesheets that keep being reused and with version 11.3, it seems new Operands are being added and not cleaned. The GC path for Operand instances look like this one:

net.sf.saxon.expr.Operand#1 lhs in net.sf.saxon.expr.SlashExpression#1 body in net.sf.saxon.expr.instruct.GlobalVariable#1 [85] in java.lang.Object[]#330455 elementData in java.util.ArrayList#452544 globalVariables in net.sf.saxon.style.StylesheetPackage#61 topLevelPackage in net.sf.saxon.PreparedStylesheet#21 preparedStylesheet in net.sf.saxon.s9api.XsltExecutable#20 executable in net.sf.saxon.jaxp.TemplatesImpl#20 ... // Here it references one of the 18 Template instances that the application keeps

I have also done incremental heap dumps with some tests and the objects that grow from one dump to the next and never go down are mostly net.sf.saxon.expr.Operand instances.

So, any idea of some change that could be causing that? Any special handling that have to be performed now in the Templates objects to clean them?

Cheers! D.


Files

Version_10.8.png (74.1 KB) Version_10.8.png Daniel Lopez, 2022-07-15 13:01
Version_11.3.png (115 KB) Version_11.3.png Daniel Lopez, 2022-07-15 13:01
Version_11.3_HeapDump.png (121 KB) Version_11.3_HeapDump.png Daniel Lopez, 2022-07-15 13:01
LeakTest-202207181355-heapdump.hprof (876 MB) LeakTest-202207181355-heapdump.hprof Daniel Lopez, 2022-07-18 14:01
SaxonTest.zip (100 KB) SaxonTest.zip Daniel Lopez, 2022-07-18 18:04
SaxonTest_fixed.zip (99.3 KB) SaxonTest_fixed.zip Daniel Lopez, 2022-07-19 08:12
Actions #1

Updated by Daniel Lopez almost 2 years ago

Hi there, I recently upgraded an application from version 10.5 to version 11.3 of Saxon-HE and after the upgrade I'm seeing quite clearly a memory leak, so I'm wondering is there has been any change on the way templates should be handled with version 11.X, so I'm doing something wrong in my code now, or if if the bug is in Saxon-HE.

For context, I can reproduce the leak in my application 100% of the times, by simply swapping the Saxon-HE jar from one version to the other. No other change between tests. I have tested different versions from the 10.X branch and the 11.X and, as far as I have tested, no version 10.X suffers the issue and all versions 11.X do.

I have attached VisualVM snapshots (sorry, not sure if I can link them from the message) of some testing I have done with version 10.8 and version 11.3. The test requests some pages that are created by generating XML and then processing it with Saxon-HE. The loading and processing of the XSLTs is done by the Spring framework, so I'm not doing anything fancy with the templates.

Version 10.8 can run the whole test, about 15m , and if I then hit "force GC" memory goes back to normal levels. Version 11.3 explodes with an OOM before the 2m mark a Heap Dump before that showed me that there is a ton of memory being consumed by net.sf.saxon.expr.* classes, especially net.sf.saxon.expr.Operand.

Digging further, what I see is that I have a handful, 18, of compiled stylesheets that keep being reused and with version 11.3, it seems new Operands are being added and not cleaned. The GC path for Operand instances look like this one:

net.sf.saxon.expr.Operand#1
lhs in net.sf.saxon.expr.SlashExpression#1
body in net.sf.saxon.expr.instruct.GlobalVariable#1
[85] in java.lang.Object[]#330455
elementData in java.util.ArrayList#452544
globalVariables in net.sf.saxon.style.StylesheetPackage#61
topLevelPackage in net.sf.saxon.PreparedStylesheet#21
preparedStylesheet in net.sf.saxon.s9api.XsltExecutable#20
executable in net.sf.saxon.jaxp.TemplatesImpl#20
... // Here it references one of the 18 Template instances that the application keeps

I have also done incremental heap dumps with some tests and the objects that grow from one dump to the next and never go down are mostly net.sf.saxon.expr.Operand instances.

So, any idea of some change that could be causing that? Any special handling that have to be performed now in the Templates objects to clean them?

Cheers! D.

Actions #2

Updated by Daniel Lopez almost 2 years ago

Ok, I see modify does not do what I expected it to do :)

Actions #3

Updated by Michael Kay almost 2 years ago

If you can send us a repro (stylesheet and source document) that exhibit this behaviour we'll be happy to take a look at it.

We've been doing quite a bit of work recently studying object allocation behaviour, and we didn't see this problem, so it's probably specific to your XSLT code. We've also been developing some tooling that makes it easier to study such behaviour and this will make a good test case for the tools.

The sample GC path you provided shows nothing that we wouldn't expect to see. The compiled stylesheet (XsltExecutable) here has references to 85 global variable; in the one shown the body (the select expression or contained sequence constructor) is a path expression (SlashExpression) with two operands, lhs and rhs, each of which may have further operands depending on the complexity of the expression.

We wouldn't expect Operand objects ever to be created during stylesheet execution (except in the case where you're using JIT template compilation, in which case they will be constructed during the first execution of each component). Another exception would be use of xsl:evaluate or fn:transform.

Actions #4

Updated by Daniel Lopez almost 2 years ago

I'll see next week if I can narrow down the test to some specific stylesheet with a given source.

I don't remember using xsl:evaluate or fn:transform explicitly in my stylesheets, but I'll check as well to verify if I'm using them somewhere or not. I'll also dig in the internals of Spring to see how they are loading the template, to see if it can explain the exploding number of Operand objects.

Thanks.

Actions #5

Updated by Michael Kay almost 2 years ago

  • Status changed from New to AwaitingInfo
Actions #6

Updated by Daniel Lopez almost 2 years ago

Hi there, Making some progress, I'm reducing the test step by step because it's not a simple thing. I created a basic XML/XSLT processing class reusing the same inputs and doing what I thought was the same XSLT management and it displayed no leak. So I've tried the other way and started reducing the application that displays the leak to the minimum while checking that the leak is maintained.

So far, I've determined that it seems that it has to do with the XSLT compilation, rather than the actual processing because processing an almost empty XML node that produces no result still produces the leak.

It seems to be something related to the way Spring handles the XSLT Views , because so far, everything I've tried to outside of that exhibits no leak. Still trying...

Actions #7

Updated by Michael Kay almost 2 years ago

Thanks. I have seen other reports of problems specific to Spring. Unfortunately we have no Spring expertise in Saxonica, but we'll do what we can to assist. If you can get us a heap dump, that might help us narrow the problem down.

Actions #8

Updated by Daniel Lopez almost 2 years ago

Getting closer, this one is quite intriguing :D. I'm attaching a heapdump in case you can see something faster, but as far as I have been able to determine, it seemes to be related to the way the TransformerFactory instance is used. If I create a TransformerFactory instance myself using TransformerFactory.newInstance() -> no leak. If I use the TransformerFactory instance that the Spring class org.springframework.web.servlet.view.xslt.XsltView has stored, and supposedly has created the same way, then I see a memory leak. So there's something there I've still yet to find.

Tomorrow, some more. Cheers!

Actions #9

Updated by Michael Kay almost 2 years ago

Thanks. First observations:

There are 9 TransformerFactoryImpl instances and 9 Configuration instances in the heap, which is not itself a problem, but is slightly curious - normally I would only expect one.

But there is only 1 PreparedStylesheet instance (representing a compiled stylesheet). Normally I would expect an Operand object to be reachable only via the compiled stylesheet that it belongs to.

I also see 800 instances of ReceivingContentHandler. That's certainly an issue, because this class should be transient, used only while building a source document or stylesheet from a SAX parser. But there are 800 SAXParserImpl instances in the heap, so somehow the XML parser instances are accumulating.

We've also got 800 instances of LinkedTreeBuilder, and 1000 instances of DocumentImpl, corresponding to 800 instances of XSLStylesheet (and StylesheetModule) and 200 instances of XSLPackage. The numbers are all strangely regular: 200 instances of XSLOutput, 400 instances of XSLImport, 8600 instances of XSLTemplate.

The route from the GC root to all these objects (which keeps them from being garbage collected) seems to derive from org.springframework.web.servlet.view.xslt.XsltViewResolver, which is clearly Spring getting involved somehow in managing XSLT resources. It's possible that Spring may be using the JAXP TransformerFactory mechanism in a way that makes sense for Xalan, but doesn't make sense for Saxon. As mentioned, I've no particular knowledge of Spring, so I really don't know what's going on here.

I would speculate that (a) Spring's XsltViewResolver is caching compiled stylesheets (Templates objects), (b) the cache size is 1000, and (c) for some reason when a stylesheet is used more than once, a new copy is getting loaded because the existing copy isn't recognized. However, this is fairly wild guesswork.

Actions #10

Updated by Michael Kay almost 2 years ago

Been doing some further digging around the heap dump with the help of the Spring source code.

There's one Saxon net.sf.saxon.jaxp.Templates object referenced from a util.SummerXSLTView which is a Spring class; both of these only appear to have one instance. So the Templates object itself is being garbage collected; the problem some of the objects within the Templates must be linked from elsewhere. The Templates is a wrapper around a net.sf.saxon.s9api.XsltExecutable which also has only one instance. This refers to a net.sf.saxon.PreparedStylesheet which only has one instance. This refers to a net.sf.saxon.style.StylesheetPackage which has 200 instances - so this is where the rot sets in. (Presumably each stylesheet package contains 5 modules, hence 1000 stylesheet modules).

Looking at the incoming references to StylesheetPackage most of them are clearly "owner pointers" from internal objects.

Ah: a possible anomaly. Looking at some of the internal objects, some of them have a very long path back to the GC root.

Following part of this chain backwards,

XSLInclude 0x71789f attributeMap
SingletonAttributeMap 0x7178a2 location
RCH$LocalLocator 0x661abe
AbstractSAXParser$LocatorProxy 0x661abf
JAXPSAXParser 0x66213c fLexicalHandler
RCH 9x723f7c receiver
CommentStripper 0x723f81 nextReceiver
Stripper 0x723f82
UseWhenFilter 0x723f83 compilation
Compilation 0x723f89 principalStylesheetModule
PSM 0x723f91 rootElement
XSLPackage 0x723f92 _children
NodeImpl[] 0xb0ad21
XSLTemplate 0x723f92
SingletonAttributeMap 0x5ddc32
RCH$LocalLocator 0x726330
AbstractSAXParser$LocatorProxy 0x726331
JAXPSAXParser 0x6615f5
RCH 0x736a09

and so ad infinitum. Essentially, a PrincipalStylesheetModule, contains Location references via the XML parser's ContentLocator which reference the registered ContentHandler of the parser and this links to a Compilation of (presumably an earlier) PrincipalStylesheetModule.

My guess is that a SAX parser is being reused without being totally reinitialised.

It's a bit difficult to get further than this without seeing the whole thing running in the debugger. It's not clear without understanding Spring exactly what Saxon interfaces are being used to compile the stylesheet, instantiate a SAX parser etc. We might be able to make progress by resetting callbacks more carefully before/after each parse but we won't be able to test that any change works without a repro.

Actions #11

Updated by Michael Kay almost 2 years ago

I think the issue is that when we create an AttributeMap, we aren't making a copy of the parsers Location information.

For elements, TinyBuilder and LinkedTreeBuilder extract the module, line, and column information from the Locator supplied by the SAX parser. In creating the AttributeMap, however, we store a reference to the parser's Locator object directly. This is bad form anyway because the object is mutable; but it also somehow seems to create a reference chain to a previous stylesheet compilation. We don't really need to understand exactly how this reference chain gets formed (which involves understanding Spring and Xerces internals), we just need to stop doing it.

NOTE: none of this explains why the stylesheet is being recompiled repeatedly rather than being retrieved from Spring's cache; but that's a separate problem.

Actions #12

Updated by Daniel Lopez almost 2 years ago

To add more details, I have reduced the basic test operation to something like this:

public class SummerXSLTView extends XsltView {

  private MediaType mediaType;
  private TransformerFactory memberTransformerFactory = TransformerFactory.newInstance();

  @Override
  protected void renderMergedOutputModel(Map<String, Object> model,
      HttpServletRequest request,
      HttpServletResponse response) throws Exception {
    StreamSource stylesheetSource = new StreamSource(TestAPI.class.getResourceAsStream("/xslt/transparencia/pla/index2.xslt"));

    TransformerFactory transformerFactory = getTransformerFactory(); // LEAK
    // NO LEAK TransformerFactory transformerFactory = memberTransformerFactory;
    Transformer transformer = transformerFactory.newTemplates(stylesheetSource).newTransformer();

    Source source = null;
    try {
      source = new StreamSource(TestAPI.class.getResourceAsStream("/grau/GAD2.xml"));
      transformer.transform(source, createResult(response));
    } finally {
      if (source instanceof StreamSource streamSource) {
        if (streamSource.getReader() != null) {
          try {
            streamSource.getReader()
                .close();
          } catch (IOException ex) {
            log.error("Error closing stream source", ex);
          }
        }
        if (streamSource.getInputStream() != null) {
          try {
            streamSource.getInputStream().close();
          } catch (IOException ex) {
            log.error("Error closing input stream", ex);
          }
        }
      }
    }
  }
}

Changing the comments in the line obtaining the TransformerFactory make the leak appear/disappear. Changing the library version without changing the code can also make it appear/disappear.

The stylesheet in this case is being recompiled repeatedly because I was testing the "development" mode where the xslt could be modified from one request to the next, so it is recompiled each time. That's why the leak did not surface in production and stayed under the radar for some time for us, auch :).

Actions #13

Updated by Daniel Lopez almost 2 years ago

A bit more digging and I have found another factor. Spring internal class XsltView has a Templates object (cachedTemplates) that is instantiated when the View is initialised (just once). But I'm overriding its usage, as seen in the code above, and creating a Templates object each time.

Here comes the "interesting" bit, if I skip the cachedTemplates initialisation, no leak happens. But the "funny" thing is that I'm not using it, so I'm puzzled how that makes it happen.

Actions #14

Updated by Daniel Lopez almost 2 years ago

I was just able to reproduce it in a simple Java app, cleaning and packaging. Having a Templates object stored somewhere and then recompiling it for each processing seems to be the issue.

Actions #15

Updated by Daniel Lopez almost 2 years ago

Attached a sample Maven project that when run, exhibits that behaviour. The issue seems to be having the Templates objects stored and also recompiled each time.

I added a 10s pause when run until it starts really processing so one has time to attach VisualVM, for example, to it. Let me know if you have trouble running it.

I have to go, so I'll probably see the answer much later. Cheers!

Actions #16

Updated by Michael Kay almost 2 years ago

Despite not being able to test the fix, I'm reasonably confident in it, but I'll now try and run your repro.

The fix is in ReceivingContentHandler.makeAttributeMap(), we call location.saveLocation() to create a copy of the location information and thereby drop the link to the parser's internal data.

Actions #17

Updated by Michael Kay almost 2 years ago

Using your code, I haven't been able to reproduce the problem: I've only got one StylesheetPackage object on the heap.

However, I'm fairly confident that the patch should fix it.

Actions #18

Updated by Michael Kay almost 2 years ago

  • Category set to Performance
  • Status changed from AwaitingInfo to Resolved
  • Assignee set to Michael Kay
  • Applies to branch trunk added
  • Fix Committed on Branch 11, trunk added
Actions #19

Updated by Daniel Lopez almost 2 years ago

Yeah, for some reason Eclipse let me run the class but to run it independently one had to add the xercesImpl dependency, else one would get an error:

javax.xml.parsers.FactoryConfigurationError: Provider org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found

I've added, just for completeness, a fixed version that should run fine with (Maven 3.8.6, Java 17)

mvn compile exec:java -Dexec.mainClass="es.uib.owu.test.TestClass"

In any case, happy to hear you nailed it down. Thanks!

Actions #20

Updated by Debbie Lockett over 1 year ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 11.4 added

Bug fix applied in the Saxon 11.4 maintenance release.

Please register to edit this issue

Also available in: Atom PDF