Bug #2644
closed9.7 regression: Deadlock for concurrent instantiation of Configuration and EnterpriseConfiguration
100%
Description
The chance of dealock, that was reported in Bug #2327, and fixed for 9.6.0.5, has returned with Saxon 9.7.
The test program attached to Bug #2327 can be used to reproduce it. The workaround that is proposed there still applies.
I am also attaching a TestNG test for validation of both the workaround and the fix.
Tested on
-
Saxon-EE 9.7.0.3J
-
Java version 1.8.0_51
Files
Updated by Gunther Rademacher over 8 years ago
Now attaching the unit test that I had already mentioned.
It looks like I was wrong with my statement that the same workaround still applies. While originally class.forName on Configuration was sufficient, working around now requires instantiating Configuration, before EnterpriseConfiguration.
Updated by Michael Kay over 8 years ago
I find it hard to see why the code in 9.6 (apparently) works and that in 9.7 doesn't.
Investigating it, though, I have found a rather better explanation of what's happening here: https://www.farside.org.uk/201510/deadlocks_in_java_class_initialisation
It seems a wonder that it doesn't happen more often, since the examples in that post seem to be very common coding patterns.
Updated by Michael Kay over 8 years ago
- Category set to Build and release
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Priority changed from Low to Normal
- Applies to branch 9.7 added
- Fix Committed on Branch 9.7 added
Investigating this has taken me up some strange avenues.
For example I tried tracing static class initialization by adding blocks such as
static {
System.out.println("\nKILROY Initializing Configuration in thread " + Thread.currentThread());
}
to key classes. This led to the failure:
No enum constant nativetests.ConfigDeadlockTest.Result
indicating access to an uninitialized static field within the test class.
This error disappears if the "System.out.print" calls are replaced by "System.err.println". No tracing output occurs for the deadlock tests; it does appear, however, for a simpler test that simply does
public static void testInitialization() {
new Configuration();
}
As far as I can determine, creating a Configuration instance in 9.7 does not cause either instantiation or static initialization of EnterpriseConfiguration, though it does cause EnterpriseConfiguration to be loaded because of the static reference in class Version.
All this tells us is that the circular dependency is probably somewhere else: there is probably some pair of classes A, B such that creating a new Configuration causes these to be initialized in the order (A, B), and creating an EnterpriseConfiguration causes them to be initialized in the order (B, A). However, there seems to be no systematic way of identifying the pair (A, B). The Oracle JVM seems to have no option to trace static initialization of classes (unlike the IBM JVM, which offers -Xtrace).
Perhaps we should try running under the IBM JDK with this option?
A possible candidate for (A, B) might be (JavaPlatform, JavaPlatformPE). But whether we call new Configuration() or new EnterpriseConfiguration(), these two classes are initialized in the same order.
Empirically, it seems to be possible to get rid of the deadlock by getting rid of the static variable Version.configurationClass, and replacing this by a local variable defined within Configuration.newConfiguration(). However, there seems to be no way of proving that this really eliminates the problem, especially as we haven't really discovered the cause of the problem.
No, on further investigation, the test case still fails after this change.
Updated by Michael Kay over 8 years ago
I've realised that some of my attempts at diagnostics were defeated by the way the test case is constructed: it fires off a separate JVM Process using exec(), and captures the StandardOutput of that Process. So diagnostic messages easily disappear into thin air.
Updated by Gunther Rademacher over 8 years ago
The separate JVM was needed for isolating the class loading from other tests, and for getting multiple chances of reproducing the problem in a single test execution. For analysing the cause, it might be preferable to use the test that is attached to Bug #2327. When I run it from a Windows command line, and hit CTRL-Break for getting a stack trace while it reports the active threads, this is shown:
"Thread-1" #11 prio=5 os_prio=0 tid=0x00000000162f8000 nid=0x21a8 in Object.wait() [0x0000000016a5d000]
java.lang.Thread.State: RUNNABLE
at net.sf.saxon.type.AnyFunctionType.<clinit>(AnyFunctionType.java:30)
at net.sf.saxon.ma.map.MapNew.<clinit>(MapNew.java:35)
at net.sf.saxon.Configuration.init(Configuration.java:436)
at net.sf.saxon.Configuration.<init>(Configuration.java:256)
at InstantiationProblem$2.run(InstantiationProblem.java:24)
"Thread-0" #10 prio=5 os_prio=0 tid=0x00000000162f5000 nid=0x1e44 in Object.wait() [0x000000001695d000]
java.lang.Thread.State: RUNNABLE
at net.sf.saxon.value.SequenceType.<clinit>(SequenceType.java:449)
at net.sf.saxon.functions.StandardFunction$Entry.arg(StandardFunction.java:1039)
at net.sf.saxon.functions.StandardFunction.<clinit>(StandardFunction.java:173)
at com.saxonica.functions.xslt3.StandardFunctionsPE.init(StandardFunctionsPE.java:37)
at com.saxonica.config.ProfessionalConfiguration.<clinit>(ProfessionalConfiguration.java:122)
at InstantiationProblem$1.run(InstantiationProblem.java:18)
That points to the static initializers of AnyFunctionType and SequenceType. Haven't looked up their source code yet, but I guess that they could be mutually dependent on each other?
Updated by Michael Kay over 8 years ago
We've both been following in the same path. Here's what I prepared before I saw your post:
Sending output to a log file rather than standard output, here is the stack trace of the two deadlocked processes:
THREAD 1
net.sf.saxon.value.SequenceType.<clinit>(SequenceType.java:450)
net.sf.saxon.functions.StandardFunction$Entry.arg(StandardFunction.java:1042)
net.sf.saxon.functions.StandardFunction.<clinit>(StandardFunction.java:177)
com.saxonica.functions.xslt3.StandardFunctionsPE.init(StandardFunctionsPE.java:37)
com.saxonica.config.ProfessionalConfiguration.<clinit>(ProfessionalConfiguration.java:122)
nativetests.ConfigDeadlockTest$UseFixForSaxonBug2327$1.run(ConfigDeadlockTest.java:76)
THREAD 2
net.sf.saxon.type.AnyFunctionType.<clinit>(AnyFunctionType.java:31)
net.sf.saxon.ma.map.MapNew.<clinit>(MapNew.java:35)
net.sf.saxon.Configuration.init(Configuration.java:442)
net.sf.saxon.Configuration.<init>(Configuration.java:256)
nativetests.ConfigDeadlockTest$UseFixForSaxonBug2327$2.run(ConfigDeadlockTest.java:81)
So the two classes (A,B) are (SequenceType, AnyFunctionType).
Moving the static variable SINGLE_FUNCTION_TYPE from AnyFunctionType to SequenceType appears to solve the problem.
Patch committed on the 9.7 branch.
Updated by Michael Kay over 8 years ago
I now get a conflict between SequenceType and MapType - similar, but slightly less obvious how to resolve it.
Updated by Michael Kay over 8 years ago
- Status changed from In Progress to Resolved
After another change to fix this, marking as resolved.
It's all terribly ad-hoc. There ought to be some kind of coding policy like not referring to your subclasses in a static initializer, but that's unworkable because you often want to do just that in order to keep your subclasses private...
Updated by O'Neil Delpratt over 8 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.7.0.4 added
Bug fix applied in the Saxon 9.7.0.4 maintenance release.
Updated by Ignacio Hernandez-Ros over 8 years ago
Michael Kay wrote:
So the two classes (A,B) are (SequenceType, AnyFunctionType).
Hi Mike et al,
My findings are that the two classes that cannot be loaded (initialized) in different threads are
net.sf.saxon.functions.StandardFunction and class net.sf.saxon.expr.Expression because one depends on the other.
I've added this method to a class and I'm sure it is called even before creating a Container
/**
* This method makes sure the class net.sf.saxon.functions.StandardFunction
* and the class net.sf.saxon.expr.Expression has been
* loaded by the same thread.
* By doing this we are preventing a potential deadlock that happens to occurs
* when the ZipCascadeURIResolver class is loaded asynchronously with the creation
* of a DTSContainer.
*/
public synchronized static void preventDeadLocks() {
try {
// makes sure this class has been loaded and it is initialized prior to create
// any processor instance
Class.forName("net.sf.saxon.functions.StandardFunction");
} catch (ClassNotFoundException e) {
}
}
Updated by Michael Kay over 8 years ago
- Status changed from Closed to In Progress
Updated by O'Neil Delpratt over 8 years ago
- Assignee changed from Michael Kay to O'Neil Delpratt
Updated by O'Neil Delpratt over 5 years ago
- Description updated (diff)
- Assignee changed from O'Neil Delpratt to Michael Kay
Updated by Michael Kay over 5 years ago
- Status changed from In Progress to Closed
I'm closing this because there doesn't seem any useful reason to keep it open.
I don't think that we have eliminated all possible deadlocks during class loading, and I don't think that we ever will. It's not something we see happening very often (if ever), and if we get reports of it causing a problem in a particular environment, then we can open a new bug.
Please register to edit this issue