Threads left unclosed if collection() is not read to completion
In Saxon-EE, by default, the collection() function is multithreaded: a thread pool is allocated for reading the documents in the collection, so they are processed in parallel.
If the result of the collection() function is bound to a variable, and the contents of the variable are not fully processed, then the thread pool will not be properly closed down, and threads may be left in suspended animation.
Test case fn-collection-10 demonstrates the problem (though it appears to succeed in some environments and to fail in others). The code is of the form
let $c := collection($uri) for $i in (3, 6, 8) return $c[$i]
Internally, the collection() function produces a MultithreadedItemMappingIterator; a MemoClosure is created with this as its InputIterator, and when selected items are read by indexing into the variable, as many items are read from the InputIterator as are needed to satisfy the request. But the variable is never read to completion, so the InputIterator of the MemoClosure is never closed, and it is the close() call on a MultithreadedItemMappingIterator that causes the thread pool to be shut down
#1 Updated by Michael Kay 3 days ago
This is a tough one to solve.
We don't want to lose the
MemoClosure mechanism whereby an expression
$m can be evaluated without fully materialising the value of
$m. Yet at the same time there are iterators (not just the multithreaded iterator used for collections, but also for example an
UnparsedTextLines iterator) that do need to be closed to release resources if they are not read to completion.
The only solution I can see at the moment is to distinguish iterators that need to be closed because they hold resources, and to avoid using these in a MemoClosure. This isn't easy of course because it's a transitive property: an iterator that's derived from one that holds resources itself needs to be closed so it can close the base iterator.
It's a little unfortunate that on the 11 branch we've discarded the general mechanism for SequenceIterator properties. Perhaps we should reinstate it.
#2 Updated by Michael Kay 3 days ago
I've added a method
SequenceIterator.mustClose() with a default implementation that returns false. On
MultithreadedItemMappingIterator the method returns true; on
ItemMappingIterator it returns
inputIterator.mustClose() is true, then the
inputIterator is fully evaluated and
sequence is set to a
This solves the problem for this test case (running in the debugger demonstrates that all threads are closed down properly).
It remains to ensure that the new
mustClose() method is properly implemented for all other implementations of
I've been doing this on the 11 branch; it also needs retro-fitting to 10.
#3 Updated by Michael Kay 3 days ago
This solution doesn't work in cases where we create a
MemoClosure for the value of a variable, and at the time we create it, we don't know whether it will hold resources or not. For example:
<xsl:variable name="temp" select="my:f(), my:g(), my:h()"/>
my:h() makes a call on
$temp is represented by a
MemoClosure whose inputIterator is a
BlockIterator - effectively a concatenation of lazily allocated iterators - and at the time we create it, we have no idea whether it will contain an iterator that must be closed to release resources. We could assume the worst, that is, have
mustClose() return true; but this would prevent lazy evaluation in many cases where it is perfectly safe.
#4 Updated by Michael Kay 3 days ago
Another idea: instead of the iterator telling the caller that it must be closed after use, have the caller tell the iterator that it can't guarantee to call close(). An iterator that needs to release resources on close will then have to ensure that all the resources needed are acquired, consumed, and released eagerly.
So we add a method
SequenceIterator.getSafeIterator() which returns an iterator over the same items, such that it's OK if the new iterator is abandoned without a call on
close(). For most iterators,
getSafeIterator() will return
this, but (for example):
MultithreadedItemMappingIteratorit will return
SequenceTool.makeGroundedValue(this).iterate()(forcing eager evaluation)
ItemMappingIteratorit will set
baseIterator = baseIterator.getSafeIterator();
BlockIterator, it will call
SequenceIterator.getSafeIterator()on each of the component iterators as it is constructed.
#6 Updated by Michael Kay 3 days ago
I have implemented (currently on the 11 branch only) a solution based on comment #4. The SequenceIterator interface has a new method discharge() which is called (by MemoClosure only, at present) to indicate that the SequenceIterator should relinquish all resources while remaining prepared to deliver the remaining items in the sequence. The default implementation does nothing; delegating iterators call discharge() on their base iterators; and iterators that actually hold resources evaluate themselves eagerly and then relinquish the resources, substituting an iterator over the grounded results. The BlockIterator remembers that discharge() was called, and if so, it calls discharge() on each component iterator as soon as it is created.
#8 Updated by Michael Kay 2 days ago
The tests now seem to be working fine in SaxonJ-EE 11. But I've also applied them to SaxonCS 11, and I'm getting an occasional (non-repeatable) failure in the QT3 fn:collection() tests. I can't be sure what's happening here - I suspect it's unrelated, and is due to the collection() function in SaxonCS delivering collection results in unpredictable order (which is the subject of another open bug).
Please register to edit this issue