Bug #5148

Multithreading issue during compilation

Added by Michael Kay over 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Multithreading
Sprint/Milestone: -
Start date: 2021-10-28
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10, 11, 9.9
Fix Committed on Branch: 10, 9.9, trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description


Related issues

Has duplicate Saxon - Bug #5158: Saxon EE 9.9 and above NullReference Exception (Duplicate, Michael Kay, 2021-11-10)

Actions #1

Updated by Michael Kay over 2 years ago

I've confirmed that I can compile the supplied stylesheet. It's large (80 xsl:includes in the top-level module, compile time around 2 seconds), so studying the source code isn't going to be a good way forward. The driver app to run concurrent compilations is in C#, but I think I'm going to try to reproduce it in Java first because I'm more familiar with debugging multithreaded applications on that platform; I'll switch to C# if I can't reproduce it in Java.

Actions #2

Updated by Michael Kay over 2 years ago

Progress so far: I've run this compilation from a multithreaded Java app (100 parallel compilations, in either 5 or 20 threads, the whole run repeated 20 times) with no failures. This was under SaxonJ 10.6. Bytecode generation was enabled but wouldn't kick in, as the transformations were not actually executed.
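A stress test of the kind described above could be sketched roughly as follows. This is not the actual test program: the class name ConcurrentCompileHarness and its run method are hypothetical, and the compilation is passed in as a Callable so that the sketch stays self-contained. In a real run the task would wrap a stylesheet compilation, for example through the s9api XsltCompiler.compile method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness; the names are illustrative and not taken from the real test app.
final class ConcurrentCompileHarness {

    /**
     * Submits tasksPerRound copies of compileTask to a fixed pool of the given
     * size, repeats that for the given number of rounds, and returns how many
     * task executions ended with an exception.
     */
    static <T> int run(Callable<T> compileTask, int threads, int tasksPerRound, int rounds) {
        int failures = 0;
        for (int round = 0; round < rounds; round++) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<T>> futures = new ArrayList<>();
                for (int i = 0; i < tasksPerRound; i++) {
                    futures.add(pool.submit(compileTask));
                }
                for (Future<T> future : futures) {
                    try {
                        future.get(); // rethrows any task failure as ExecutionException
                    } catch (ExecutionException e) {
                        failures++;
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting", e);
                    }
                }
            } finally {
                pool.shutdown();
            }
        }
        return failures;
    }
}
```

With this shape, run(task, 5, 100, 20) and run(task, 20, 100, 20) would correspond to the two configurations mentioned above.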

Next step is to dig out the Windows machine and run the code as supplied.

Actions #3

Updated by Michael Kay over 2 years ago

I'm looking again at the stack traces in the original report on StackOverflow - especially the second one, which has line numbers.

DocGen Result Exception. Response is invalid: Request is failed. Response body: {"message":"Error generating document (7cf581ac-2287-4396-bca4-0d5c5af9c308). Error: null (java.lang.NullPointerException)\n Stack Trace: java.lang.NullPointerException
    at net.sf.saxon.functions.FunctionLibraryList.bind(FunctionLibraryList.java:124)
    at net.sf.saxon.functions.FunctionLibraryList.bind(FunctionLibraryList.java:124)
    at net.sf.saxon.expr.parser.XPathParser.parseFunctionCall(XPathParser.java:3356)
    at net.sf.saxon.expr.parser.XPathParser.parseBasicStep(XPathParser.java:2206)

If we assume this is Saxon EE 9.9.1.8, then line #124 of FunctionLibraryList after EE pre-processing corresponds to line #126 in the original source, which is the recursive call:

Expression func = lib.bind(functionName, staticArgs, env, reasons);

which is consistent with the fact that line #124 appears twice in the stack trace.

The only way this can throw an NPE is if lib is null, which means that the FunctionLibraryList contains a null entry - specifically, the "inner" FunctionLibraryList, to account for two levels of recursive call.
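To illustrate the reasoning, here is a minimal sketch of a composite lookup of this general shape. The types Library and LibraryList are simplified, hypothetical stand-ins and not the Saxon classes; the only point is that a null entry in an inner list produces an NPE with the same bind frame appearing twice in the trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a function library.
interface Library {
    String bind(String functionName);
}

// Hypothetical stand-in for a list of libraries that is itself a library.
final class LibraryList implements Library {
    final List<Library> libraries = new ArrayList<>();

    @Override
    public String bind(String functionName) {
        for (Library lib : libraries) {
            // If lib is null, this call throws NullPointerException.
            // When lib is another LibraryList, this is the recursive call.
            String found = lib.bind(functionName);
            if (found != null) {
                return found;
            }
        }
        return null; // no library in the list recognised the name
    }
}
```

If an outer LibraryList holds an inner LibraryList whose only entry is null, calling bind on the outer list fails inside the inner call, so the stack trace shows bind at the same source line twice.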

The FunctionLibraryList was originally created by StylesheetPackage.createFunctionLibrary(), locally to a compilation. This method constructs an inner FunctionLibraryList by calling Configuration.getBuiltInExtensionLibraryList(). This method reads:

public FunctionLibraryList getBuiltInExtensionLibraryList() {
    if (builtInExtensionLibraryList == null) {
        builtInExtensionLibraryList = new FunctionLibraryList();
        builtInExtensionLibraryList.addFunctionLibrary(VendorFunctionSetHE.getInstance());
        builtInExtensionLibraryList.addFunctionLibrary(MathFunctionSet.getInstance());
        builtInExtensionLibraryList.addFunctionLibrary(MapFunctionSet.getInstance());
        builtInExtensionLibraryList.addFunctionLibrary(ArrayFunctionSet.getInstance());
        builtInExtensionLibraryList.addFunctionLibrary(ExsltCommonFunctionSet.getInstance());
    }
    return builtInExtensionLibraryList;
}

The method is not synchronised, so two concurrent invocations could both find the initial condition builtInExtensionLibraryList == null to be true, and could then both make concurrent modifications to the FunctionLibraryList, which could leave it in an inconsistent state.

So I think the fact that this method is not synchronised is consistent with the observed failure.
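For illustration, one common way to make this kind of lazy initialisation thread-safe is sketched below. The patch itself is not shown in this issue, so this is an assumption about its general shape and not the actual change. LazyLibraryHolder is a hypothetical stand-in for the Configuration class, and strings stand in for the function libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class that owns the lazily built list.
final class LazyLibraryHolder {

    private List<String> builtInLibraries; // guarded by "this"

    // synchronized: only one thread at a time can test the field and build the list.
    synchronized List<String> getBuiltInLibraries() {
        if (builtInLibraries == null) {
            // Populate a local list first and publish it only once it is complete.
            List<String> list = new ArrayList<>();
            list.add("vendor");
            list.add("math");
            list.add("map");
            list.add("array");
            list.add("exslt-common");
            builtInLibraries = list;
        }
        return builtInLibraries;
    }
}
```

Building the list in a local variable before assigning the field is a defensive choice in this sketch; with the method synchronised, no caller of the getter can observe the list while it is still being filled.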

Actions #4

Updated by Michael Kay over 2 years ago

  • Status changed from New to Resolved
  • Applies to branch 10, 11, 9.9 added
  • Fix Committed on Branch 10, 9.9, trunk added

I'm sufficiently confident in this that although I haven't reproduced the bug, I'm going to apply this patch and mark it resolved.

I've applied the patch on the 9.9, 10, and 11 branches, although I think there's very little chance we'll do another 9.9 maintenance release.

Actions #5

Updated by Michael Kay over 2 years ago

As further evidence that this diagnosis is correct, the relevant code was not present in 9.7, which is consistent with the observation that the problem first appeared in 9.9.

Actions #6

Updated by Michael Kay over 2 years ago

  • Has duplicate Bug #5158: Saxon EE 9.9 and above NullReference Exception added

Actions #7

Updated by O'Neil Delpratt about 2 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 11.1 added
  • Platforms .NET, Java added

Bug fix applied in the Saxon 11.1 release.

Actions #8

Updated by O'Neil Delpratt about 2 years ago

Leaving the bug as resolved until the fix is applied to the Saxon 10 maintenance release.

Actions #9

Updated by Debbie Lockett about 2 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 10.7 added
  • Fixed in Maintenance Release deleted (11.1)

Bug fix applied in the Saxon 10.7 maintenance release.

Actions #10

Updated by Debbie Lockett about 2 years ago

  • Fixed in Maintenance Release 11.1 added
