Support #5575

closed

XPathContextMajor using Saxon-HE

Added by Russell Haley almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2022-06-21
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

We are seeing a large number of XPathContextMajor instances in jvisualvm; they eventually consume as much as 5GB of the heap.

We are transforming a lot of documents, and for every transformation we create a new XsltTransformer by calling XsltExecutable.load() on an executable that is built with a new XsltCompiler.
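For comparison, a stylesheet can be compiled once and reused, with only the per-run transformer created for each document. The sketch below is a minimal illustration that uses the JDK's built-in JAXP API as a stand-in for Saxon's s9api (where the corresponding pattern would be to keep one XsltExecutable and call load() per transformation). The class name, the stylesheet, and the sample input are hypothetical, and nothing in this thread establishes that recompiling was the cause of the memory growth.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compiles a stylesheet once and reuses it for every document. */
final class CompiledStylesheetCache {
    // Illustrative stylesheet: writes the string value of the document as text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled a single time; a JAXP Templates object may be shared between threads.
    private static final Templates COMPILED = compile(XSL);

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs one transformation with a fresh Transformer created from the shared Templates. */
    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            COMPILED.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<Para>Some text</Para>"));
        System.out.println(transform("<Para>Other text</Para>"));
    }
}
```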

Actions #1

Updated by Michael Kay almost 2 years ago

Yes, as it happens we're doing some detailed monitoring of object allocation behaviour, and we're also seeing a lot of XPathContextMajor objects allocated. However, I believe most of them should be short-lived, and be quickly garbage collected. It's possible they may be longer-lived if you rely heavily on tail recursion. If you've got a heap dump that shows a very large number of XPathContextMajor instances existing at the same time, it would be interesting to see that, and also to see what the reference paths are that prevent them being garbage collected.

These objects are allocated whenever there is a function call or template call, and in other situations that change the XPath evaluation context. They hold stack frames of parameters and local variables.

In fact, I would expect most of the working data of a transformation to appear within XPathContextMajor objects. Most of this is likely to be values of local variables.

There is also an XPathContextMinor object which is allocated more frequently - whenever the XPath focus changes - but they are smaller and even more short-lived.

Actions #2

Updated by Russell Haley almost 2 years ago

So I am attempting to open our Spring Boot app's hprof dump. It is a very slow 11GB file. When it does open, what reference paths should we look out for? What should we see?

Actions #3

Updated by Russell Haley almost 2 years ago

So I got a smaller dump. I see 3,625,200 instances each of the following:

XPathContextMajor, XPathContextMinor$LastValue, FocusTrackingIterator, SingletonIterator

Actions #4

Updated by Michael Kay almost 2 years ago

That's certainly excessive (unless you're doing a tail-recursive call with that number of iterations, perhaps). How feasible is it to get us a repro? Can you scale down the source document so the investigation becomes more manageable?

Actions #5

Updated by Russell Haley almost 2 years ago

So the 11GB dump file has opened. We had 36,361,433 instances of net.sf.saxon.expr.XPathContextMajor, occupying 6GB.

Actions #6

Updated by Michael Kay almost 2 years ago

Have you tried examining the path to the GC root:

Instances View

The Instance view displays object instances for a selected class. When you select an instance from the Instance pane, Java VisualVM displays the fields of that class and references to that class in the respective panes. In the References pane, you can right-click an item and choose Show Nearest GC Root to display the nearest garbage collection root object.

Actions #7

Updated by Russell Haley almost 2 years ago

Yeah, it appears to be some sort of loop. We had two templates that seem to have been calling each other. The references to each XPathContextMajor led to another XPathContextMajor, then another XPathContextMajor, and so on.

We removed the template and all is better. Is there a way to limit recursion?

Actions #8

Updated by Michael Kay almost 2 years ago

Saxon does "tail call optimization" which means that when the last instruction in a template is an xsl:apply-templates or xsl:call-template instruction, we close the current stack frame before opening a new one, so you can do deep recursion without running out of Java stack space. However, there is also an XSLT-level stack - a chained list of XPathContextMajor objects - which is held in the Java heap, and tail call optimization doesn't attempt to reduce this (it could, potentially, but that's not the current design). So you can recurse very deep without running out of Java stack, but you will use more and more Java heap.
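The difference between the Java stack and the heap-held chain can be illustrated with a toy model. This is only a sketch: the Frame class and the method names are hypothetical and do not reflect Saxon's internal classes. Each simulated tail call runs inside a loop, so the Java stack stays flat, yet every new frame keeps a reference to its caller, so the whole chain remains reachable and cannot be garbage collected.

```java
/** Toy model only; these classes are hypothetical and are not Saxon's implementation. */
final class ContextChainDemo {

    /** Stands in for a context object that keeps a reference to its caller. */
    static final class Frame {
        final Frame caller;

        Frame(Frame caller) {
            this.caller = caller;
        }
    }

    /** Simulates `calls` tail calls as a loop: constant Java stack, growing heap chain. */
    static Frame runTailCalls(int calls) {
        Frame current = null;
        for (int i = 0; i < calls; i++) {
            // The previous frame stays reachable through the new frame's `caller` field.
            current = new Frame(current);
        }
        return current;
    }

    /** Counts how many frames are still reachable from the innermost one. */
    static int reachableFrames(Frame innermost) {
        int count = 0;
        for (Frame f = innermost; f != null; f = f.caller) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Frame innermost = runTailCalls(1_000_000);
        System.out.println("frames still reachable: " + reachableFrames(innermost));
    }
}
```

In this model the number of live frames equals the recursion depth, which matches the observation that heap use keeps growing even though the Java stack does not.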

There are two possibilities here: your recursion could be non-terminating, with memory exhaustion as the symptom; or the recursion could just be very deep, but eventually terminating. In the first case your code is incorrect and needs fixing; in the second case your design is extravagant in resources and needs improving. Sometimes the answer is to adopt a divide-and-conquer strategy rather than a head-tail recursion strategy, but of course that's only possible for some problems. To advise further we need to understand what you are trying to achieve and how your code is written.
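As a rough illustration of why divide-and-conquer helps, the following sketch sums an array in both styles and records the deepest call level reached. It is written in plain Java rather than XSLT, and all names are hypothetical. The point is only the shape of the recursion: head-tail depth grows linearly with the number of items, while splitting the range in half grows logarithmically.

```java
/** Hypothetical comparison of recursion depth for two strategies. */
final class RecursionDepthDemo {
    static int maxDepth; // deepest call level reached by the most recent run

    /** Head-tail: handle the first item, then recurse on the rest. */
    static long sumHeadTail(int[] items, int from, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (from == items.length) {
            return 0;
        }
        return items[from] + sumHeadTail(items, from + 1, depth + 1);
    }

    /** Divide and conquer: split the range [from, to) in half and recurse on each half. */
    static long sumDivide(int[] items, int from, int to, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (to - from == 0) {
            return 0;
        }
        if (to - from == 1) {
            return items[from];
        }
        int mid = from + (to - from) / 2;
        return sumDivide(items, from, mid, depth + 1)
             + sumDivide(items, mid, to, depth + 1);
    }

    static int depthHeadTail(int n) {
        maxDepth = 0;
        sumHeadTail(new int[n], 0, 1);
        return maxDepth;
    }

    static int depthDivide(int n) {
        maxDepth = 0;
        sumDivide(new int[n], 0, n, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("head-tail depth for 1024 items: " + depthHeadTail(1024));
        System.out.println("divide-and-conquer depth for 1024 items: " + depthDivide(1024));
    }
}
```

For 1024 items the head-tail version reaches depth 1025, while the divide-and-conquer version reaches depth 11.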

Actions #9

Updated by Russell Haley almost 2 years ago

Thanks for the quick interaction.

Our system is on a closed network, but here's a brief introduction to why we do what we do. We extract the XML from DOCX and transform it to an internal XML format used by our system. This is done outside the process that was leaking memory. Later in the pipeline this XML is used as the source document from which all other formats are created; the Spring Boot service that had the memory leak has this responsibility. A lot of the XML is in the JAR of the service, but we allow an overide.xsl file to be used for both HTML and PDF transformations.

What we noticed last night was that the HTML transformation that used this overide.xsl never finished, and it would eventually lock up all other transformations by exhausting resources.

The XML I mentioned above would sometimes contain:

<Para>Some text<EmphasizedText style="b">some bolded text</EmphasizedText>rest of the text.</Para>

Notice that the space is missing between "text" and "some".

The original compiled stylesheets have:

<xsl:template name="restore-lost-space">
    <xsl:choose>
        <xsl:when test="count(following-sibling::node()) &gt; 0 and matches(following-sibling::node()[1], '[.,!?].*')"><!--do nothing--></xsl:when>
        <xsl:when test="count(following-sibling::node()) &gt; 0">
            <xsl:value-of select="' '"/>
        </xsl:when>
    </xsl:choose>
</xsl:template>

<xsl:template match="text()" mode="#all">
    <xsl:choose>
        <xsl:when test=".=''"><!--do nothing--></xsl:when>
        <xsl:when test="string-length(.)=0"><!--do nothing--></xsl:when>
        <xsl:when test="name(following-sibling::node()[1])='Superscript'">
            <xsl:value-of select="."/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="."/>
            <xsl:call-template name="restore-lost-space"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

These exact templates were also placed in the overide.xsl.

When we took these out of the overide.xsl, the system no longer ran away into la-la land.

Actions #10

Updated by Michael Kay almost 2 years ago

Well, that's a little bit strange, but difficult to diagnose in isolation.

In your match="text()" template, the first two conditions are equivalent and pointless. A text node (unless it is parentless, which is a weird edge case) is never zero-length, so .='' and string-length(.)=0 are both always false.

But apart from that, I have trouble seeing how the presence of these two templates could cause runaway recursion.
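The point about zero-length text nodes can be checked with a small sketch. It assumes the JDK's built-in DOM parser and XPath 1.0 engine rather than Saxon, and the class name and sample document are hypothetical (the sample is modelled on the Para example earlier in the thread, with an added empty element). An empty element simply has no text node child, so neither condition ever matches a text node in a parsed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical check: counts nodes selected by an XPath expression in a parsed document. */
final class EmptyTextNodeCheck {
    static final String SAMPLE =
        "<Para>Some text<EmphasizedText style='b'>some bolded text</EmphasizedText>"
      + "rest of the text.<Empty></Empty></Para>";

    /** Evaluates a numeric XPath expression such as count(...) against the given XML. */
    static double count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, "count(//text())"));
        System.out.println(count(SAMPLE, "count(//text()[. = ''])"));
        System.out.println(count(SAMPLE, "count(//text()[string-length(.) = 0])"));
    }
}
```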

Actions #11

Updated by Vladimir Nesterovsky almost 2 years ago

Is there a chance that

<xsl:when test="count(following-sibling::node()) &gt; 0">

is badly rewritten by the engine?

XPathContextMinor$LastValue hints at this.

Can you try:

<xsl:when test="following-sibling::node()">
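The two forms of the test are logically equivalent, which the following sketch checks on a small document. It assumes the JDK's built-in XPath 1.0 engine, not Saxon, so it says nothing about how Saxon rewrites either expression; the class name and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical check that both forms of the sibling test give the same answer. */
final class SiblingTestEquivalence {
    static final String SAMPLE =
        "<Para>Some text<EmphasizedText>bold</EmphasizedText>rest</Para>";
    static final String COUNT_FORM = "count(following-sibling::node()) > 0";
    static final String EXISTS_FORM = "following-sibling::node()";

    /** Evaluates `condition` as a boolean with the node selected by `contextPath` as context. */
    static boolean test(String xml, String contextPath, String condition) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            return (Boolean) xpath.evaluate(condition, context, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First text node has following siblings; the last one has none.
        for (String context : new String[] {"/Para/text()[1]", "/Para/text()[2]"}) {
            System.out.println(context + ": "
                    + test(SAMPLE, context, COUNT_FORM) + " / "
                    + test(SAMPLE, context, EXISTS_FORM));
        }
    }
}
```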

Actions #12

Updated by Michael Kay almost 2 years ago

Seems unlikely, Vladimir, but can't rule it out.

I think my next step for debugging this would be to run it with the -T option (perhaps combined with -Tlevel:low). It will generate a vast amount of output, but will probably give a clue as to what's going on.

Actions #13

Updated by Michael Kay almost 2 years ago

  • Tracker changed from Bug to Support
  • Status changed from New to Closed
  • Assignee set to Michael Kay

I'm going to close this because the product is working as designed.

However, we have discovered improvements we can make to the design to reduce the memory footprint of XPathContextMajor objects, and these will appear in the next major release.
