Project

Profile

Help

Bug #3984

closed

Thread safety of index information held by KeyManager

Added by Michael Kay about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Multithreading
Sprint/Milestone:
-
Start date:
2018-10-30
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.8, 9.9
Fix Committed on Branch:
9.8, 9.9
Fixed in Maintenance Release:
Platforms:

Description

The KeyManager stores information about indexes used to support the key() function in the Controller's UserData store, but this store is not thread-safe.

(From code inspection; no symptoms actually observed)

Actions #2

Updated by Michael Kay about 6 years ago

The issue raised by Vladimir in the help forum is rather more serious and difficult to solve than the original topic of this issue, but we may as well treat them together.

Basically, we are synchronising the construction of an index using the document node as the synchronisation object. Firstly, this is a long-running activity, so the lock may be held for some time, though it will only block other threads building an index on the same document. More significantly, when user code is executed under a lock, we have no way of knowing what other locks will be acquired. For example, we can (with some ingenuity!) construct a scenario where a document has two indexes, and one thread starts building index B while it is already building index A, while the other thread does the reverse: leading inevitably to deadlock.

I suspect that the answer to this is to allow two threads both to proceed with building the index, synchronising only at the point where the index is written to the data structure maintaining a collection of indexes. That is, don't hold any locks while executing user code.

Actions #3

Updated by Michael Kay about 6 years ago

I engineered a deadlock with this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
   <xsl:variable name="data-1">
      <data1>
         <xsl:for-each select="1 to 100000">
            <entry id="{.}" value="london-{.+.}"/>
         </xsl:for-each>
      </data1>     
   </xsl:variable>
   
   <xsl:variable name="data-2">
      <data2>
         <xsl:for-each select="1 to 100000">
            <entry id="{.}" value="paris-{.+.}"/>
         </xsl:for-each>
      </data2>     
   </xsl:variable>
   
   <xsl:key name="one" match="entry" use="if (/data1) then key('two', @id, $data-2)/@id else @id"/>
   <xsl:key name="two" match="entry" use="if (/data2) then key('one', @id, $data-1)/@id else @id"/>
   
   <xsl:template name="main">
      <xsl:for-each select="1 to 20">
         <xsl:result-document href="out{.}.xml">
            <out>
               <xsl:value-of select="if (. mod 2 = 0) then key('one', '839', $data-1) else key('two', '839', $data-2)"/>
            </out>
         </xsl:result-document>
      </xsl:for-each>
   </xsl:template>
   
</xsl:stylesheet>

Note that these keys are considered "local" rather that "shared" because the key definition references a global variable. This means each transformation will have its own copy of the indexes. But that would happen anyway for indexes on a temporary tree, because each transformation has its own copy of the global variables.

Actions #4

Updated by Michael Kay about 6 years ago

  • Priority changed from Low to Normal

Studying the execution of this test case reveals another possible problem: the global variables are being evaluated more than once, in different threads. Potentially this could result in ($x is $x) returning false, since each evaluation produces a distinct temporary tree. Also of course this affects the construction of indexes.

There is a flag in the Bindery to mark global variables as "executing" (i.e. being evaluated) that is supposed to prevent this. As with the new logic for building indexes, the idea is to let both threads evaluate the variable in parallel, but ensure that the first value that is produced by any thread is used by all threads. This doesn't appear to be working correctly.

Actions #5

Updated by Michael Kay about 6 years ago

It seems that more than one Bindery is being created. The method Controller.getBindery() creates the Bindery for a package on demand, and this is called when a global variable is evaluated. If the first attempts to evaluate global variables are within a concurrent thread, this method is not synchronized, so two threads can each create a separate Bindery.

Fixed this by sychronizing Controller.getBindery(). We now only have one copy of each global variable.

But the KeyManager is making multiple attempts to create the index for a document in the same thread, so we're not home and dry yet.

Actions #6

Updated by Michael Kay about 6 years ago

The test is now running, and reports the an error because the index is circular. Two things we now need to do:

(a) see if we can find a similar test that doesn't fail due to circularity

(b) test the path where the indexes are shared rather than local (which happens if they don't refer to global variables).

Actions #7

Updated by Michael Kay about 6 years ago

I tested the shared path by a temporary patch to force these indexes to be treated as reusable.

Actions #8

Updated by Michael Kay about 6 years ago

  • Status changed from New to Resolved
  • Applies to branch 9.8, 9.9 added
  • Fix Committed on Branch 9.8, 9.9 added

For 9.8 the only change I have made is to synchronize Controller.getBindery(). Given that the problem has not been encountered by any user, I think it's better to avoid changing the 9.8 code if we can.

For 9.9 the changes are:

  • synchronize Controller.getBindery()

  • use a custom master index in the Controller for local indexes rather than relying on the general getUserData() mechanism; thereby solving the thread safety issues;

  • avoid holding a synchronization lock while building an index; instead allow two threads to build the index concurrently, using the index from whichever thread finishes first, and discarding the others.

I have not managed to construct a test case in which deadlock occurs under the old code but execution succeeds with the new code; in all the deadlock cases I have found, the correct result is to fail with a circularity error. However, I believe such cases are theoretically possible.

Actions #9

Updated by O'Neil Delpratt about 6 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.8.0.15 added

Bug fix applied in the Saxon 9.8.0.15 maintenance release. Leave open to the Saxon 9.9 maintenance release.

Actions #10

Updated by O'Neil Delpratt about 6 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.9.0.2 added

Bug fix applied in the Saxon 9.9.0.2 maintenance release.

Please register to edit this issue

Also available in: Atom PDF