PushToPullIterator does not implement the `discharge()` method
Bug #5177 introduced the method SequenceIterator.discharge() to solve the dilemma that some iterators hold resources, and therefore need to be explicitly closed if not read to completion, but some consumers of iterators aren't able to issue a close() because they don't know whether the user-written code will read more data or not. The idea is that a consumer that can't guarantee to call close() can instead call discharge() to provide notification of the fact that close() might not be called.
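To illustrate the close()/discharge() contract described above, here is a minimal sketch. It is not Saxon's actual code: `ResourceIterator` and `FileBackedIterator` are hypothetical names, the "resource" is just an in-memory list standing in for something like an open file, and the read-ahead behaviour of `discharge()` is an assumption about how a resource-holding iterator might honour the notification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified iterator contract (not Saxon's SequenceIterator).
interface ResourceIterator<T> {
    T next();                     // returns null at the end of the sequence
    void close();                 // release resources; nothing more will be read
    default void discharge() { }  // "close() might never be called"; no-op if nothing is held
}

// Hypothetical iterator that holds a resource until it is exhausted, closed or discharged.
final class FileBackedIterator implements ResourceIterator<String> {
    private Iterator<String> source;                        // stands in for an open resource
    private final Deque<String> buffer = new ArrayDeque<>();
    boolean resourceOpen = true;

    FileBackedIterator(List<String> lines) {
        this.source = lines.iterator();
    }

    @Override
    public String next() {
        if (!buffer.isEmpty()) {
            return buffer.poll();                           // items saved by discharge()
        }
        if (source != null && source.hasNext()) {
            return source.next();
        }
        close();                                            // read to completion: release
        return null;
    }

    @Override
    public void close() {
        buffer.clear();
        source = null;
        resourceOpen = false;
    }

    @Override
    public void discharge() {
        // Assumed strategy: read the remainder into memory so the resource can be
        // released now, while the caller can still read the remaining items later.
        while (source != null && source.hasNext()) {
            buffer.add(source.next());
        }
        source = null;
        resourceOpen = false;
    }
}
```

With this strategy the consumer can keep reading after discharge(), but the whole remainder of the input is pulled into memory at that point.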
PushToPullIterator has a discharge() method that throws a NotImplementedException. This hasn't caused any failure on the 11.x branch, but it is causing a couple of streaming tests to fail on the 12.x branch, and it's entirely possible that similar conditions could cause an 11.x failure. Specifically, tests stream-107 and stream-109 are failing.
Updated by Michael Kay 4 months ago
What basically triggers the error is that on the Saxon 12.x path (work in progress) we're creating an unnecessary MemoClosure to hold intermediate results, and the MemoClosure is the classic case where we simply don't know whether it's going to be read to completion. In this particular case the input to the MemoClosure comes from a PushToPullIterator: we're reading streamed XML input, which is being pushed to a buffer by the XML parser in one thread, while the iterator is reading the data from that buffer in another thread. If the iterator isn't properly closed then the parsing thread is going to hang.
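The hang can be reproduced in miniature with the sketch below. It is an illustration only, not the Saxon implementation: `PushToPullSketch` is a hypothetical class, the "parser" is a thread pushing integers, and the bounded `ArrayBlockingQueue` and the use of `Thread.interrupt()` in close() are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical push-to-pull bridge: a producer thread pushes items into a
// bounded buffer, and the consumer pulls them one at a time with next().
final class PushToPullSketch implements AutoCloseable {
    private static final Object END = new Object();          // end-of-input marker
    private final BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(2);
    private boolean finished = false;
    final Thread producer;

    PushToPullSketch(int itemCount) {
        producer = new Thread(() -> {
            try {
                for (int i = 0; i < itemCount; i++) {
                    buffer.put(i);                           // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                // close() was called: stop producing and let the thread end
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    // Returns the next item, or null at the end of the input or after close().
    Integer next() {
        if (finished) {
            return null;
        }
        try {
            Object item = buffer.take();
            if (item == END) {
                finished = true;
                return null;
            }
            return (Integer) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        }
    }

    // Without this call, a producer blocked in put() never terminates.
    @Override
    public void close() {
        finished = true;
        producer.interrupt();
    }
}
```

If the consumer reads one item from a long input and then simply abandons the iterator, the producer stays blocked in put() indefinitely; only close() releases it.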
I can get rid of the problem fairly easily in this instance by not creating the MemoClosure, but that doesn't solve the underlying problem.
Implementing discharge() in the PushToPullIterator would mean reading ahead to the end of the file, "just in case", which would destroy streaming and eliminate the possibility of an early exit when we've read all the data we need. So there's an issue here that needs to be addressed.
Updated by Michael Kay 3 months ago
I'm wondering about a different design (replacing the discharge() mechanism) where any iterator that holds resources registers itself at the level of the Controller, and the Controller tidies up any unclosed resources when it shuts down (at the end of a transformation or query).
I don't think it's too much of a problem to ensure that all affected iterators have access to the context and hence to the Controller.
I'm not quite sure how this will work for free-standing XPath expressions where each evaluation gets a new Controller.
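A rough sketch of the registration idea might look like the following. Everything here is hypothetical: `ControllerSketch` and `TrackedIterator` are illustrative names and not Saxon classes, and the register/unregister/shutDown methods are assumptions about what such a mechanism could look like rather than a description of an actual design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Controller that tracks unclosed resources.
final class ControllerSketch {
    private final Set<AutoCloseable> open =
            Collections.newSetFromMap(new IdentityHashMap<>());

    synchronized void register(AutoCloseable resource) {
        open.add(resource);
    }

    synchronized void unregister(AutoCloseable resource) {
        open.remove(resource);
    }

    // Called at the end of a transformation or query.
    // Returns the number of resources that were still open and have now been closed.
    int shutDown() {
        List<AutoCloseable> pending;
        synchronized (this) {
            pending = new ArrayList<>(open);
            open.clear();
        }
        int closed = 0;
        for (AutoCloseable resource : pending) {
            try {
                resource.close();
                closed++;
            } catch (Exception e) {
                // keep going: one failing resource should not block the others
            }
        }
        return closed;
    }
}

// Hypothetical resource-holding iterator that registers itself on creation.
final class TrackedIterator implements AutoCloseable {
    private final ControllerSketch controller;
    boolean closed = false;

    TrackedIterator(ControllerSketch controller) {
        this.controller = controller;
        controller.register(this);
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;                 // release the underlying resource here
            controller.unregister(this);
        }
    }
}
```

An iterator that is read to completion or closed explicitly removes itself from the registry, so shutDown() only touches the ones that were abandoned. The sketch does not address the open question of free-standing XPath evaluations, where each evaluation has its own Controller.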