Increased memory use in Saxon 9.6 compared to Saxon 9.3
We have recently upgraded our tool from Saxon 9.3 to Saxon 9.6. Although functionally it is behaving the same we have noticed some increased memory requirements, of the order of 5-10%.
We haven't yet been able to pin down the source precisely but initial profiling does suggest that there are a number of cases where previously we only had a single instance of a class and now have many thousand. In particular, this seems to have affected extension functions and other classes in the package com.saxonica.functions.extfn. We have also observed large increases in the number of instances of DynamicFunctionCallDefinition, DateTimeValue, IsWholeNumber and LocalVariableReference (in each case from 1 or a small handful to many thousand).
Do any of these symptoms indicate any particular cause and is there some way in which we might be using the Saxon API that could trigger it? We do create our own extension of the function library which delegates to the existing one (in order to implement some specific function-handling behaviour), and set it on the static context using XPathCompiler.getUnderlyingStaticContext().setFunctionLibrary(). This API is not guaranteed to be stable between releases so perhaps we should be using a different approach.
We are currently preparing for a release of our product so any initial observations as soon as possible would be appreciated. We're still investigating the underlying problem and will update this issue with any more detailed information.
#2 Updated by Stuart Barker over 4 years ago
We have looked further into the increase in memory used by the schema grammar model and it seems to be connected to the move away from the use of name pools for QNames; in particular the fact that QNames are handled as char arrays, as opposed to integer pointers. In our case, we load the schema grammars into a DOM and deduplicate the strings (using String.intern() or similar) hence concatenating the parts of the QName into a char array actually increases memory usage.
Although the impact on an individual schema grammar is likely to be relatively small, if there are many such models in memory at once then the cumulative effect may be significant. For example, in our case we have a web service that is pre-configured with about 20 such models; each one is a few 10s of megabytes bigger but the cumulative effect is to use an additional 0.5Gb.
#4 Updated by Michael Kay over 4 years ago
Thanks for reporting it. There seem to be two separate issues here, one with the size of compiled stylesheets and one with the size of compiled schemas. My experience with such things is that the devil is usually in the detail so it would be useful to have specimen programs that illustrate the problem that we can investigate - though I appreciate this can be difficult to achieve for example with a stylesheet that has extension function calls into a complex library. Memory usage isn't something we measure routinely in regression tests so it's entirely possible that some slippage has crept in between releases. Reducing locking contention on the NamePool has been a significant theme of recent releases and it's not at all impossible that the effort to reduce the use of shared memory has caused some increase in use of unshared memory; but it's also very likely that we can fix the problem if we can pin it down.
#5 Updated by Michael Kay over 4 years ago
Without seeing your code, my guess would be that the increase in the number of instances of LocalVariableReference might be due to function inlining. There's no obvious change between 9.3 and 9.6 that would cause function inlining to happen more often than in 9.3, but as I say, the devil is in the detail and only looking at how the optimizer tackles your specific code is likely to show it up. Function inlining of course involves a space/time trade-off and we tend to assume that the primary objective is to reduce time rather than space.
I think the replication of classes associated with extension functions could well relate to the way you are manipulating the function library in the XPath static context: again, we would need to monitor execution of your code to get an understanding of this.
#6 Updated by Michael Kay over 4 years ago
- Status changed from New to In Progress
I have tried compiling a fairly sizeable schema (gmd.xsd) under 9.3 and 9.7 and the results are:
9.3 1070ms 18.79Mb
9.7 809ms 36.97Mb
So this is in line with what you are observing. I'll see what the heap dump tells me.
Looking at the 9.3 and 9.7 heap dumps, I think the difference is not the space occupied by the StructuredQName objects per se, but rather the fact that cross-references between components in the schema model now generally use a HashMap (with StructuredQName as the key) rather than an IntHashMap. Changing that isn't simple.
As an experiment I tried changing the allocation of HashMap objects in PreparedSchema to use default initial size. This made no difference at all to the total space usage.
Another experiment: a significant number of the HashMaps appeared to be in the two properties of UserComplexType: contextDeterminedTypesForElements and contextDeterminedTypesForAttributes. I tried replacing these two HashMaps with a trivial map implementation using a list of keys and a list of values. The effect on the bottom line wasn't dramatic: a reduction from 17.4Mb to 17.2Mb; but it confirms that looking at the way these relationships between schema components are implemented is worth examining.
#7 Updated by Michael Kay over 4 years ago
Continuing the same exploration, I tried changing two HashSets in ElementDecl (substitutionGroupMembers and complexTypesUsingThisElement) to simple lists. A slightly better improvement here: down to 16.7Mb. We've now cut the number of HashMaps down from 5400 to 2070; but the number of nodes in these hashmaps is only down from 51K to 39K; we've basically been knocking out the small hashmaps but it's the nodes in the larger hashmaps that are taking the space.
A significant number of the remaining hashMaps are in the finiteStateMachine.alphabet property, and this is unchanged from 9.3. (Most of these maps are likely to be very small, though...)
#8 Updated by Michael Kay over 4 years ago
Off at a tangent: I can't find any code that uses the ElementDecl.substitutionGroupMembers() as a set, all uses seem to simply iterate over it so it might as well be a list, except possibly for eliminating duplicates during construction. However, this hasn't changed from 9.3, so it doesn't contribute to any regression since 9.3.
#9 Updated by Michael Kay over 4 years ago
Why has the number of java.lang.String instances increased from 10,223 to 22,801? (And with it, the number of char instances from 10,030 to 29,270?)
A large number of the original 10,223 String instances are the values of attributes held in String arrays within AttributeCollectionImpl objects. The number of AttributeCollectionImpl objects is unchanged. Many of the extra strings appear to be held in NoNamespaceName objects which are themselves held in the NodeName array of an AttributeCollectionImpl, representing the names of attributes in the schema document. However, both the NoNamespaceName objects and the underlying strings are shared by all like-named attributes within a single document, so it seems surprising that this should account for so many Strings. The number of NoNamespaceName objects is only 657 so this isn't enough to account for all the extra Strings.
Olson-related timezone names seem to feature quite a lot, e.g "Azerbaijan Summer Time".
The generatedId properties of ElementDecl objects feature strongly. The targetNamespace property also features surprisingly often; one would expect the same String object to be reused here.
Beyond that, we seem to have an awful lot of Strings referring to internal stuff, e.g. class names within the Saxon code or within the JDK.
We haven't accounted for the difference. Perhaps I need to learn how to do OQL queries in JVisualVM.
#10 Updated by Michael Kay over 4 years ago
I'm exploring another angle. In principle, the LinkedTree instances representing the parsed schema documents should be eligible for garbage collection once the schema component model has been built. Ensuring that this is actually the case should give a significant boost in memory usage. I'm analyzing the paths from the heap root to the XSDSchema object representing the document element of a schema document using jhat, which seems better at this kind of analysis than JVisualVM. (Took the heap dump using jmap; couldn't work out how to save a heap dump taken using JVisualVM for analysis by jhat).
It seems that the ElementDecl schema component has a link back to the LinkedTree XSDElement node via its namespaceResolver property. The AttributeDecl and AttributeUse components also hold namespaceResolvers with the same problem
We can solve this easily by copying the namespace context when necessary: we have the mechanism to do this with the SavedNamespaceContext class, designed for the analogous situation with XSLT processing. The Facet class is already doing this.
After making these changes, the size of a heap dump comes down substantially, from 17.5Mb to 10.1Mb, which is less than the Saxon 9.3 figure. Although this doesn't address the changes that caused the regression, it effectively solves the problem by finding compensating improvements elsewhere.
As a beneficial side-effect, I found there was code in the path for reloading a schema component model from an SCM file that appears to handle the namespace context incorrectly, and the change should fix this.
Since the namespace context is very often the same for all elements in a schema document, it would be useful to share the copies of the namespace context in the same way as we do for compiled stylesheets. I will look into that.
#11 Updated by Michael Kay over 4 years ago
- Tracker changed from Support to Bug
- Status changed from In Progress to AwaitingInfo
- Applies to branch 9.7, 9.8 added
- Applies to branch deleted (
- Fix Committed on Branch 9.7, 9.8 added
I have committed a patch on the 9.7 and 9.8 branches whose effect is:
(a) Create NamespaceResolver objects in the schema component model that do not cause the SCM to contain references to the LinkedTree representation of the source schema documents, thus enabling the latter to be garbage-collected; and reuse these NamespaceResolver objects where possible.
(b) Replace a couple of HashSets used by ElementDecl (for example, for substitution group membership) by ArrayLists, growing to HashSets only if the list becomes large.
I'm not going to retrofit these to the 9.6 branch, in the interests of stability.
I shall change the status to AwaitingInfo as we need more information to address the XSLT side of the issue.
#12 Updated by Stuart Barker over 4 years ago
Thank you for the updates.
Note that we don't use any XSLT in our processor. It looks like you have found a similar pattern of memory use in the schema models as we did so, as far as we are concerned, no further action is required. I suspect some of the observations in my original report are worth us looking at more closely but don't seem to have been the cause of the increased memory use.
We will need to upgrade in order to incorporate your changes and wish to release our processor quite soon. This raises a few issues for us:
9.6 is currently highlighted as the most stable and reliable release. Do you have a timetable for when 9.7 will be regarded as stable? What are the likely implications of using a non-stable release?
When is there likely to be a maintenance release on the 9.7 branch incorporating these changes?
Would you consider making these changes on the 9.6 branch? This would allow us to obtain the memory optimizations without having to adapt our processor to the changes in the API that occurred between 9.6 and 9.7. Note also that bug 2736 (reported today), which appears to be a regression between 9.6 and 9.7, will block our ability to upgrade.
#13 Updated by Michael Kay over 4 years ago
- Status changed from AwaitingInfo to Resolved
Part of the process of managing maintenance releases is that once we decide we are doing maintenance with stability as the primary objective, we assess each patch to compare the severity of the problem with the risk of the code change needed to fix it. Patches whose sole purpose is performance improvement will very rarely make the cut, unless the code changes are very localized and the problem is a real stopper. So I don't see this one making it into the 9.6 branch. We're getting close to the point where we apply similar standards to 9.7 patches, but we're not quite there yet; in fact there's a fairly substantial performance patch I've developed for bug 2707 and we're still weighing up the risks/benefits of shipping that on the 9.7 branch.
As a general rule, my advice would be to use the latest release if you need the features (or performance improvements etc) where it brings advantages; use the most stable release if you don't. That's still 9.6 for now.
I will close this now as resolved.
Please register to edit this issue