Bug #3483
closedXQueryCompiler is not thread-safe
100%
Description
The message shown in the subject line was thrown with an IllegalStateException, together with some information being written to System.out (or possibly System.err). This happened during XQuery processing, but unfortunately there is no stacktrace available.
It is not reproducible at will, I have seen it in one server session while running load tests. In many other executions of the same code, nothing similar has been observed.
The attached file contains the information that was written to the log.
It was tested with Saxon-EE 9.8.0.2.
Files
Updated by Michael Kay about 7 years ago
This one is going to be tough, I'm afraid. The message indicates that something went wrong earlier, generally during the optimization phase, and subsequently we've found an invalid expression tree. Often it's caused by some expression not cloning itself correctly during an operation such as function inlining or loop lifting.
The specific condition is detected while allocating slot numbers to local variables. We do a recursive walk down the tree, which should always get to a local variable declaration (and allocate it a slot number) before it gets to a local variable reference to that variable: the error is reported when this tree walk finds a local variable reference pointing to a variable declaration that has no slot allocated.
I haven't seen this occur as a threading or load-dependent issue before. The XQuery compile process is single-threaded so there should be no need to protect data structures from concurrent update. Even with a stack trace, we would be struggling to see what's going on, because the error is simply detecting the after-effects of something that went wrong much earlier.
Even if we can't reproduce the failure, it might be useful to get access to the query, so we can see what the optimizer is doing with it, and judge whether anything unusual is going on.
Updated by Gunther Rademacher about 7 years ago
Thank you for looking into this. Eventually I found a way to reproduce it, and it was my fault.
This happens when compile runs concurrently to compileLibrary on the same (shared) XQueryCompiler instance, where the query that is processed by compile imports the module that is being processed by compileLibrary. In my repro, I am launching compileLibrary and compile with a random delay between 0 and 32 msec, and I am seeing various failures of compile, including the one that I had described.
Actually this was a bug in the setup of the shared compiler, which should not have been made available to other threads before having compiled the module.
Thanks again for looking into this. From my point of view, this issue can be closed.
Updated by Gunther Rademacher about 7 years ago
- File Saxon3483.log Saxon3483.log added
- File Saxon3483.java Saxon3483.java added
I have to revert my previous statement - the problem is reproducible even with (as I think) proper synchronization. Also reproduction is possible with 9.8.0.2 and 9.8.0.5, but not with 9.7.0.9.
The attached Saxon3483.java uses two threads, both started within 8 msec, to compile the same query concurrently. Each thread tries to initialize the common XQueryCompiler and compile a module beforehand, then compile a query that imports the module.
For 9.8, this typically starts running into problems after about 300 test cycles, and the "Internal Saxon error" may be observed soon thereafter.
I am also attaching the log of a sample run.
Updated by Michael Kay about 7 years ago
- Subject changed from Internal Saxon error: local variable encountered whose binding has been deleted to XQueryCompiler is not thread-safe
Marking the compileQuery and compileLibrary methods of StaticQueryContext as synchronized
solves the problem; but it would be nice to find a less draconian solution.
Updated by Michael Kay about 7 years ago
Exporing further, it appears to be sufficient to synchronize the line
return qp.makeXQueryExpression(query, mainModule, config);
within StaticQueryContext.compileQuery.
There's a curious line (XQueryParser#194: config.addExtensionBinders(lib) -- I can't see why the XQUeryParser should be updating the Configuration - but removing this line appears to have no impact (the problem still occurs).
Synchronizing the line (XQueryParser#168) exec.fixupQueryModules(mainModule); (on the Configuration) fixes the problem; synchronizing it on the Executable doesn't. (which isn't surprising because each query should be using a different Executable).
Synchronizing main.optimizeGlobalFunctions() (Executable#466) is also sufficient.
Synchronizing this method (on the XQueryFunctionLibrary instance is NOT sufficient.
Synchronizing the call on ExpressionTool.allocateSlots() (XQueryFunction#479) is NOT sufficient.
Synchronizing the entire conditional expression at XQueryFunction#437 (if (opt.getOptimizerOptions().isSet(OptimizerOptions.MISCELLANEOUS)) {...) appears to be sufficient, but I have not found any individual part of this conditional that is sufficient. This might suggest interference between two separate instructions within this block.
In fact it appears to be necessary to synchronize two instructions within this block:
synchronized(config) {
body = body.optimize(visitor, ContextItemStaticInfo.ABSENT);
//body.verifyParentPointers();
}
synchronized(config) {
body = LoopLifter.process(body, visitor, ContextItemStaticInfo.ABSENT);
}
and it's apparently OK to synchronize them separately.
Synchronizing on the ExpressionVisitor or the StaticContext is not sufficient.
Standing back from all this, it looks likely that there are functions in the library module that are being re-optimized during multiple concurrent compilations of queries that import the module library. It's acceptable to re-optimize a copy of such a function (for example, after function inlining) but this should never affect the original.
Updated by Michael Kay about 7 years ago
- Category set to Multithreading
- Status changed from New to Resolved
- Assignee set to Michael Kay
- Applies to branch trunk added
- Fix Committed on Branch 9.8, trunk added
As suspected, functions in the library module are being re-optimized every time a query using the library is compiled, and by preventing this, the problem goes away.
Changed XQueryFunctionLibrary.optimizeGlobalFunctions() to only optimize functions in the same unit of compilation.
Updated by Michael Kay about 7 years ago
I made a further change: the Map of compiled query libraries held by StaticQueryContextEE is now a sychronized map.
Updated by O'Neil Delpratt about 7 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.8.0.6 added
Bug fix applied in the Saxon 9.8.0.6 maintenance release.
Updated by Gunther Rademacher about 7 years ago
Thank you for the fix - this now works fine in 9.8.0.6.
Please register to edit this issue