Project

Profile

Help

Bug #4216

XPathEvaluator cross thread usage causes memory increase/leak

Added by Stefan Fritz 4 months ago. Updated 3 months ago.

Status:
Closed
Priority:
High
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2019-05-09
Due date:
% Done:

0%

Legacy ID:
Applies to branch:
9.8, 9.9
Fix Committed on Branch:
Fixed in Maintenance Release:

Description

We use Saxon EE 9.8.0.15 but I can reproduce the same with 9.9.1.2.

We share an XPathEvaluator across multiple threads but each thread creates its own XPathExpression for local usage. One of our customers ran into an OutOfMemory error. The heap dump analysis shows that the majority of heap is allocated inside net.sf.saxon.trans.KeyManager. (screenshot attached)

I am able to reproduce this using the attached sample.

Can you please check if there is a memory leak or if we misuse the APIs?

2019-05-09 18_49_43-Eclipse Memory Analyzer.png (23.3 KB) Heap OQL Stefan Fritz, 2019-05-09 18:50 2019-05-09 18_49_43-Eclipse Memory Analyzer.png
XPathTest.java (4.95 KB) Stefan Fritz, 2019-05-09 18:54 XPathTest.java
XPathTest.java (5.18 KB) really starts multiple threads now Stefan Fritz, 2019-05-10 08:53 XPathTest.java

History

#1 Updated by Michael Kay 4 months ago

  • Category set to Performance
  • Assignee set to Michael Kay

The optimizer is deciding that an index would be useful for evaluating the expression. (Whether that's actually true or not is irrelevant; it doesn't have much information to go on.)

An index will be shared between multiple evaluations that use equivalent expressions on the same document. In your case each thread is using a different document, and is executing the expression only once, so there is no opportunity for sharing.

The implementation of these indexes uses weak references to ensure that an index associated with a particular (document, expression) pair is eligible for garbage collection if either the document or the expression is garbage collected. This is why you see the memory released if you force a garbage collection. If you don't force a garbage collection, the memory hangs around until the system decides to reclaim it.

(This seems to conflict with your observation of an OutOfMemory error. Perhaps the OOM error is an unrelated problem, or the operational scenario is different in some way?)

The code you have shown is obviously for demonstration purposes, and the best way forward depends on what the real code is doing. For example, perhaps the real code is sharing a document instance between multiple threads? Perhaps the real code is executing more than one expression each time it builds a document tree in memory?

If you really are building documents in memory in order to execute a single expression then your best bet is probably to suppress the building of indexes entirely. You can do that by setting Feature.OPTIMIZATION_LEVEL on the Configuration object to the value "-k".

You could also consider using streaming - but that might work for this demo and not for the real application.

Note also that the expression

/message/headers/header [@name=\"COUNTRY_CODE\"]=\"GB\"
 or /message/headers/header [@name=\"COUNTRY_CODE\"]=\"IE\" 
 or /message/headers/header [@name=\"COUNTRY_CODE\"]=\"IN\" 
 or /message/headers/header [@name=\"COUNTRY_CODE\"]=\"BE\" 
 or /message/headers/header [@name=\"COUNTRY_CODE\"]=\"JP\";

can be usefully simplified to

/message/headers/header [@name=\"COUNTRY_CODE\"] = ("GB", "IE", "IN", "BE", "JP")

(The Saxon-EE optimizer in fact does this rewrite automatically)

#2 Updated by Vladimir Nesterovsky 4 months ago

A guess.

I don't see EE code, but by analogy with HE, if a document is put in a pool then a firm reference will be kept, which leads to a leak.

#3 Updated by Michael Kay 4 months ago

Vladimir, thanks for the suggestion, but I've confirmed there is no document pool involved here. Documents created using Configuration.buildDocumentTree() are not cached anywhere in the system.

In fact, when I run this test, I see very steady memory usage. (The figure reported increases slowly from 502 to 880, then drops back to 472.) I guess a lot depends on the garbage collector settings in the JVM. I'm seeing a very similar pattern on 9.9 and on the development branch.

Setting

xpathEvaluator_.getConfiguration().setConfigurationProperty(Feature.OPTIMIZATION_LEVEL, "-k");

reduces the memory usage a little, but it's not really significant.

Although there is no accumulation of documents,I have established that there an accumulation of key definitions. That's because the JAXP XPathEvaluator contains a single StaticContext object shared by all the expressions, and the StaticContext owns the KeyManager. This is by design, because it means that multiple expressions, if they run against the same document, can use the same indexes. What is unusual about this workload is the one-to-one relationship of XPath expressions to documents, and in that scenario there is no benefit in sharing the XPathEvaluator, or indeed the Configuration itself.

However, I did a test where i moved the creation of the XPathEvaluator into the loop, and the reported memory consumption was not significantly different.

#4 Updated by Michael Kay 4 months ago

I've now done a bit more instrumentation and I'm seeing that the number of KeyDefinitions held by the KeyManager remains constant (at 2). That's because each time an expression is compiled and optimized, a new KeyDefinition is added, but it's recognized as being identical to the one already there so they are combined.

The number of document indexes held by the KeyManager shows a pattern where it rises gradually to about 500, then drops to zero and starts rising again. This confirms that the design is working as intended: we're holding weak references to the indexes allowing them to be reused if needed, but the garbage collector is discarding them when it needs the memory.

#5 Updated by Stefan Fritz 4 months ago

Thanks for the swift and detailed answer. Yes with -k it works.

The real app is quite similar. It is a messaging system that in this area starts n threads. All share the same XPathEvaluator and on each new message arriving the expression is compiled and applied. This caused OOM at the customer.

The one thing though I do not understand is why is the test app running out of memory without -k. Based on your statement above the weak references should be released at some point by the GC.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.resize(HashMap.java:704) at java.util.HashMap.putVal(HashMap.java:629) at java.util.HashMap.put(HashMap.java:612) at net.sf.saxon.trans.KeyIndex.addEntry(KeyIndex.java:201) at net.sf.saxon.trans.KeyIndex.processNode(KeyIndex.java:190) at net.sf.saxon.trans.KeyIndex.constructIndex(KeyIndex.java:131) at net.sf.saxon.trans.KeyIndex.buildIndex(KeyIndex.java:91) at net.sf.saxon.trans.KeyManager.buildIndex(KeyManager.java:290) at net.sf.saxon.trans.KeyManager.obtainSharedIndex(KeyManager.java:417) at net.sf.saxon.trans.KeyManager.obtainIndex(KeyManager.java:387) at net.sf.saxon.trans.KeyManager.selectByKey(KeyManager.java:334) at net.sf.saxon.functions.KeyFn.search(KeyFn.java:268) at net.sf.saxon.functions.KeyFn.call(KeyFn.java:193) at net.sf.saxon.expr.FunctionCall.iterate(FunctionCall.java:547) at net.sf.saxon.expr.Atomizer.iterate(Atomizer.java:321) at com.saxonica.ee.optim.GeneralComparisonEE.effectiveBooleanValue(GeneralComparisonEE.java:100) at net.sf.saxon.expr.OrExpression.effectiveBooleanValue(OrExpression.java:134) at net.sf.saxon.expr.OrExpression.effectiveBooleanValue(OrExpression.java:134) at net.sf.saxon.expr.OrExpression.effectiveBooleanValue(OrExpression.java:134) at net.sf.saxon.expr.OrExpression.effectiveBooleanValue(OrExpression.java:134) at net.sf.saxon.xpath.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:202) at XPathTest$1.run(XPathTest.java:42) at XPathTest.main(XPathTest.java:56)

btw I just realized that I called run() instead of start... Once I change that the app even crashes.

First I get:

; SystemID: ; Line#: 1; Column#: 2 net.sf.saxon.trans.XPathException: Key 'Q{http://saxon.sf.net/}kk292' has not been defined at net.sf.saxon.functions.KeyFn.getKeyDefinitionSet(KeyFn.java:240) at net.sf.saxon.functions.KeyFn.fixArguments(KeyFn.java:119) at net.sf.saxon.functions.SystemFunction.makeOptimizedFunctionCall(SystemFunction.java:130) at net.sf.saxon.expr.SystemFunctionCall.optimize(SystemFunctionCall.java:241) at net.sf.saxon.expr.FilterExpression.optimize(FilterExpression.java:388) at net.sf.saxon.expr.SlashExpression.optimize(SlashExpression.java:373) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.UnaryExpression.optimize(UnaryExpression.java:111) at net.sf.saxon.expr.Atomizer.optimize(Atomizer.java:207) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.GeneralComparison.optimize(GeneralComparison.java:422) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.xpath.XPathEvaluator.compile(XPathEvaluator.java:197) at XPathTest$1.run(XPathTest.java:42)

Then also this:

java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) at java.util.ArrayList$Itr.next(ArrayList.java:859) at net.sf.saxon.type.TypeHierarchy.toSet(TypeHierarchy.java:676) at net.sf.saxon.type.TypeHierarchy.computeRelationship(TypeHierarchy.java:420) at net.sf.saxon.type.TypeHierarchy.relationship(TypeHierarchy.java:326) at net.sf.saxon.type.TypeHierarchy.computeRelationship(TypeHierarchy.java:445) at net.sf.saxon.type.TypeHierarchy.relationship(TypeHierarchy.java:326) at net.sf.saxon.type.TypeHierarchy.isSubType(TypeHierarchy.java:285) at net.sf.saxon.expr.FilterExpression.tryToRewritePositionalFilter(FilterExpression.java:613) at net.sf.saxon.expr.FilterExpression.optimize(FilterExpression.java:438) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.SlashExpression.optimize(SlashExpression.java:350) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.UnaryExpression.optimize(UnaryExpression.java:111) at net.sf.saxon.expr.Atomizer.optimize(Atomizer.java:207) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.GeneralComparison.optimize(GeneralComparison.java:422) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.xpath.XPathEvaluator.compile(XPathEvaluator.java:197) at XPathTest$1.run(XPathTest.java:42) java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) at java.util.ArrayList$Itr.next(ArrayList.java:859) at net.sf.saxon.type.TypeHierarchy.toSet(TypeHierarchy.java:676) at net.sf.saxon.type.TypeHierarchy.computeRelationship(TypeHierarchy.java:420) at net.sf.saxon.type.TypeHierarchy.relationship(TypeHierarchy.java:326) at net.sf.saxon.type.TypeHierarchy.computeRelationship(TypeHierarchy.java:445) at net.sf.saxon.type.TypeHierarchy.relationship(TypeHierarchy.java:326) at net.sf.saxon.type.TypeHierarchy.isSubType(TypeHierarchy.java:285) at net.sf.saxon.expr.FilterExpression.tryToRewritePositionalFilter(FilterExpression.java:613) at net.sf.saxon.expr.FilterExpression.optimize(FilterExpression.java:438) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.SlashExpression.optimize(SlashExpression.java:350) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.UnaryExpression.optimize(UnaryExpression.java:111) at net.sf.saxon.expr.Atomizer.optimize(Atomizer.java:207) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.GeneralComparison.optimize(GeneralComparison.java:422) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.expr.Operand.optimize(Operand.java:200) at net.sf.saxon.expr.Expression.optimizeChildren(Expression.java:635) at net.sf.saxon.expr.BooleanExpression.optimize(BooleanExpression.java:105) at net.sf.saxon.expr.OrExpression.optimize(OrExpression.java:65) at net.sf.saxon.xpath.XPathEvaluator.compile(XPathEvaluator.java:197) at XPathTest$1.run(XPathTest.java:42)

Then a bit later it causes a StackOverflowError.

Exception in thread "Thread-6" Exception in thread "Thread-11" java.lang.StackOverflowError at java.util.HashMap$TreeNode.find(HashMap.java:1864) at java.util.HashMap$TreeNode.find(HashMap.java:1874) at java.util.HashMap$TreeNode.find(HashMap.java:1874) at java.util.HashMap$TreeNode.find(HashMap.java:1874) at java.util.HashMap$TreeNode.find(HashMap.java:1874) at java.util.HashMap$TreeNode.find(HashMap.java:1874) at java.util.HashMap$TreeNode.find(HashMap.java:1874) ...

I thought the way this is coded should support multiple parallel threads as the XPathExpression is unique per thread and only XPathEvaluator is shared across threads.

Attached is the modified code to reproduce this.

#6 Updated by Michael Kay 4 months ago

I think it's best if we try to keep separate issues separate so I've raised the multithreading crash as a separate issue at bug #4218

#7 Updated by Michael Kay 4 months ago

Sorry to be going backwards and forwards between multiple issues -- it's confusing me too! The initial ConcurrentModificationException was easily fixed, but closer study reveals that the KeyManager is definitely not thread-safe at compile-time, and this means that the XPathEvaluator (and the s9api XPathCompiler) are not thread-safe either; specifically, it isn't safe to run multiple XPath compilations simultaneously.

I don't know whether this is the cause of the memory problems. It could be. We're writing to a hash table without synchronization, which can corrupt the internal data structure, after which the effects are essentially completely unpredictable.

Until we can fix this I think you should try using a separate XPathEvaluator in each thread -- either that, or compile all the XPath expressions in one thread.

#8 Updated by Michael Kay 4 months ago

Note, we had essentially the same problem a few years ago with XQuery, and solved it by ensuring that the StaticQueryContext held by the XQueryCompiler was read-only during compilation, and that each compilation copied the context information it needed to a QueryModule object, which is mutable during compilation.

#9 Updated by Stefan Fritz 4 months ago

Putting the .compile into a synchronized block works too?

synchronized(xpathEvaluator_) {
    xpathExpression = xpathEvaluator_.compile(expression);
}

Just to be clear with -k on optimization I can leave the code as is. Correct?

#10 Updated by Michael Kay 4 months ago

Yes, running the compile under synchronization should be an adequate workaround.

Having fixed that, it would be interesting to see whether the memory problem still occurs without the "-k", but it certainly won't do any harm.

Using "-kc" to switch bytecode generation off as well would do no harm. In earlier Saxon releases bytecode generation sometimes caused memory problems when the amount of XPath code being compiled was essentially unbounded. I think we fixed that in 9.9 by putting an upper limit on the amount of bytecode, but this kind of workload gets no benefit from bytecode generation so you might as well switch it off.

#11 Updated by Stefan Fritz 4 months ago

Ok then I go simply without synchronize but with -kc

#12 Updated by Michael Kay 3 months ago

  • Status changed from New to Closed

Closing this as resolved. There are no code changes relating directly to this bug, but some tangential problems were identified and fixed under different issue numbers.

Please register to edit this issue

Also available in: Atom PDF