Project

Profile

Help

Bug #2644

closed

9.7 regression: Deadlock for concurrent instantiation of Configuration and EnterpriseConfiguration

Added by Gunther Rademacher about 8 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Build and release
Sprint/Milestone:
-
Start date:
2016-02-24
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.7
Fix Committed on Branch:
9.7
Fixed in Maintenance Release:
Platforms:

Description

The chance of dealock, that was reported in Bug #2327, and fixed for 9.6.0.5, has returned with Saxon 9.7.

The test program attached to Bug #2327 can be used to reproduce it. The workaround that is proposed there still applies.

I am also attaching a TestNG test for validation of both the workaround and the fix.

Tested on

  • Saxon-EE 9.7.0.3J

  • Java version 1.8.0_51


Files

TestSaxonConfigurationInstantiation.java (4.11 KB) TestSaxonConfigurationInstantiation.java TestNG test for validating fix and workaround of deadlock issue Gunther Rademacher, 2016-02-24 18:05
Actions #1

Updated by Gunther Rademacher about 8 years ago

Now attaching the unit test that I had already mentioned.

It looks like I was wrong with my statement that the same workaround still applies. While originally class.forName on Configuration was sufficient, working around now requires instantiating Configuration, before EnterpriseConfiguration.

Actions #2

Updated by Michael Kay about 8 years ago

I find it hard to see why the code in 9.6 (apparently) works and that in 9.7 doesn't.

Investigating it, though, I have found a rather better explanation of what's happening here: https://www.farside.org.uk/201510/deadlocks_in_java_class_initialisation

It seems a wonder that it doesn't happen more often, since the examples in that post seem to be very common coding patterns.

Actions #3

Updated by Michael Kay about 8 years ago

  • Category set to Build and release
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal
  • Applies to branch 9.7 added
  • Fix Committed on Branch 9.7 added

Investigating this has taken me up some strange avenues.

For example I tried tracing static class initialization by adding blocks such as

static {

    System.out.println("\nKILROY Initializing Configuration in thread " + Thread.currentThread());

}

to key classes. This led to the failure:

No enum constant nativetests.ConfigDeadlockTest.Result

indicating access to an uninitialized static field within the test class.

This error disappears if the "System.out.print" calls are replaced by "System.err.println". No tracing output occurs for the deadlock tests; it does appear, however, for a simpler test that simply does

public static void testInitialization() {

    new Configuration();

}

As far as I can determine, creating a Configuration instance in 9.7 does not cause either instantiation or static initialization of EnterpriseConfiguration, though it does cause EnterpriseConfiguration to be loaded because of the static reference in class Version.

All this tells us is that the circular dependency is probably somewhere else: there is probably some pair of classes A, B such that creating a new Configuration causes these to be initialized in the order (A, B), and creating an EnterpriseConfiguration causes them to be initialized in the order (B, A). However, there seems to be no systematic way of identifying the pair (A, B). The Oracle JVM seems to have no option to trace static initialization of classes (unlike the IBM JVM, which offers -Xtrace).

Perhaps we should try running under the IBM JDK with this option?

A possible candidate for (A, B) might be (JavaPlatform, JavaPlatformPE). But whether we call new Configuration() or new EnterpriseConfiguration(), these two classes are initialized in the same order.

Empirically, it seems to be possible to get rid of the deadlock by getting rid of the static variable Version.configurationClass, and replacing this by a local variable defined within Configuration.newConfiguration(). However, there seems to be no way of proving that this really eliminates the problem, especially as we haven't really discovered the cause of the problem.

No, on further investigation, the test case still fails after this change.

Actions #4

Updated by Michael Kay about 8 years ago

I've realised that some of my attempts at diagnostics were defeated by the way the test case is constructed: it fires off a separate JVM Process using exec(), and captures the StandardOutput of that Process. So diagnostic messages easily disappear into thin air.

Actions #5

Updated by Gunther Rademacher about 8 years ago

The separate JVM was needed for isolating the class loading from other tests, and for getting multiple chances of reproducing the problem in a single test execution. For analysing the cause, it might be preferable to use the test that is attached to Bug #2327. When I run it from a Windows command line, and hit CTRL-Break for getting a stack trace while it reports the active threads, this is shown:

"Thread-1" #11 prio=5 os_prio=0 tid=0x00000000162f8000 nid=0x21a8 in Object.wait() [0x0000000016a5d000]
   java.lang.Thread.State: RUNNABLE
        at net.sf.saxon.type.AnyFunctionType.<clinit>(AnyFunctionType.java:30)
        at net.sf.saxon.ma.map.MapNew.<clinit>(MapNew.java:35)
        at net.sf.saxon.Configuration.init(Configuration.java:436)
        at net.sf.saxon.Configuration.<init>(Configuration.java:256)
        at InstantiationProblem$2.run(InstantiationProblem.java:24)

"Thread-0" #10 prio=5 os_prio=0 tid=0x00000000162f5000 nid=0x1e44 in Object.wait() [0x000000001695d000]
   java.lang.Thread.State: RUNNABLE
        at net.sf.saxon.value.SequenceType.<clinit>(SequenceType.java:449)
        at net.sf.saxon.functions.StandardFunction$Entry.arg(StandardFunction.java:1039)
        at net.sf.saxon.functions.StandardFunction.<clinit>(StandardFunction.java:173)
        at com.saxonica.functions.xslt3.StandardFunctionsPE.init(StandardFunctionsPE.java:37)
        at com.saxonica.config.ProfessionalConfiguration.<clinit>(ProfessionalConfiguration.java:122)
        at InstantiationProblem$1.run(InstantiationProblem.java:18)

That points to the static initializers of AnyFunctionType and SequenceType. Haven't looked up their source code yet, but I guess that they could be mutually dependent on each other?

Actions #6

Updated by Michael Kay about 8 years ago

We've both been following in the same path. Here's what I prepared before I saw your post:

Sending output to a log file rather than standard output, here is the stack trace of the two deadlocked processes:

THREAD 1

net.sf.saxon.value.SequenceType.<clinit>(SequenceType.java:450)

net.sf.saxon.functions.StandardFunction$Entry.arg(StandardFunction.java:1042)

net.sf.saxon.functions.StandardFunction.<clinit>(StandardFunction.java:177)

com.saxonica.functions.xslt3.StandardFunctionsPE.init(StandardFunctionsPE.java:37)

com.saxonica.config.ProfessionalConfiguration.<clinit>(ProfessionalConfiguration.java:122)

nativetests.ConfigDeadlockTest$UseFixForSaxonBug2327$1.run(ConfigDeadlockTest.java:76)

THREAD 2

net.sf.saxon.type.AnyFunctionType.<clinit>(AnyFunctionType.java:31)

net.sf.saxon.ma.map.MapNew.<clinit>(MapNew.java:35)

net.sf.saxon.Configuration.init(Configuration.java:442)

net.sf.saxon.Configuration.<init>(Configuration.java:256)

nativetests.ConfigDeadlockTest$UseFixForSaxonBug2327$2.run(ConfigDeadlockTest.java:81)

So the two classes (A,B) are (SequenceType, AnyFunctionType).

Moving the static variable SINGLE_FUNCTION_TYPE from AnyFunctionType to SequenceType appears to solve the problem.

Patch committed on the 9.7 branch.

Actions #7

Updated by Gunther Rademacher about 8 years ago

Great - thanks!

Actions #8

Updated by Michael Kay about 8 years ago

I now get a conflict between SequenceType and MapType - similar, but slightly less obvious how to resolve it.

Actions #9

Updated by Michael Kay about 8 years ago

  • Status changed from In Progress to Resolved

After another change to fix this, marking as resolved.

It's all terribly ad-hoc. There ought to be some kind of coding policy like not referring to your subclasses in a static initializer, but that's unworkable because you often want to do just that in order to keep your subclasses private...

Actions #10

Updated by O'Neil Delpratt about 8 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.4 added

Bug fix applied in the Saxon 9.7.0.4 maintenance release.

Actions #11

Updated by Ignacio Hernandez-Ros about 8 years ago

Michael Kay wrote:

So the two classes (A,B) are (SequenceType, AnyFunctionType).

Hi Mike et al,

My findings are that the two classes that cannot be loaded (initialized) in different threads are

net.sf.saxon.functions.StandardFunction and class net.sf.saxon.expr.Expression because one depends on the other.

I've added this method to a class and I'm sure it is called even before creating a Container

	/**
	 * This method makes sure the class net.sf.saxon.functions.StandardFunction
	 * and the class net.sf.saxon.expr.Expression has been
	 * loaded by the same thread. 
	 * By doing this we are preventing a potential deadlock that happens to occurs 
	 * when the ZipCascadeURIResolver class is loaded asynchronously with the creation
	 * of a DTSContainer.
	 */
	public synchronized static void preventDeadLocks() {
		try {
			// makes sure this class has been loaded and it is initialized prior to create
			// any processor instance
			Class.forName("net.sf.saxon.functions.StandardFunction");
		} catch (ClassNotFoundException e) {
		}
	}

Actions #12

Updated by Michael Kay about 8 years ago

  • Status changed from Closed to In Progress
Actions #13

Updated by O'Neil Delpratt about 8 years ago

  • Assignee changed from Michael Kay to O'Neil Delpratt
Actions #14

Updated by O'Neil Delpratt about 5 years ago

  • Description updated (diff)
  • Assignee changed from O'Neil Delpratt to Michael Kay
Actions #15

Updated by Michael Kay about 5 years ago

  • Status changed from In Progress to Closed

I'm closing this because there doesn't seem any useful reason to keep it open.

I don't think that we have eliminated all possible deadlocks during class loading, and I don't think that we ever will. It's not something we see happening very often (if ever), and if we get reports of it causing a problem in a particular environment, then we can open a new bug.

Please register to edit this issue

Also available in: Atom PDF