Bug #5896: xslt30extra test threads-026 - wrong results (intermittent failure) - Saxon - Saxonica Developer Community

Actions

Send by e-mail Copy link

Bug #5896

closed

xslt30extra test threads-026 - wrong results (intermittent failure)

Added by Michael Kay almost 2 years ago. Updated about 1 year ago.

Status:

Closed

Priority:

Low

Assignee:

Michael Kay

Category:

Sprint/Milestone:

Start date:

2023-03-02

Due date:

% Done:

Estimated time:

Legacy ID:

Applies to branch:

11, 12, trunk

Fix Committed on Branch:

11, 12, trunk

Fixed in Maintenance Release:

12.4

Platforms:

.NET, Java

Description

Test case threads-026 is producing incorrect results. The failure is intermittent.

The test is calling xsl:result-document repeatedly while iterating over an input sequence, and within xsl:result-document there are references to position() and last(). The output suggests that both variables are set to zero.

The test usually seems to succeed if run on its own, and usually fails if run as part of the sequence threads-02[0-6]. But this isn't universally true.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

No, looking at the output more carefully, position() is 1 and last() is zero.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

The approach to evaluating last() in an asyncronously-evaluated result-document instruction within a loop seems completely flawed. We're evaluating result-document using a context whose current iterator is a ManualIterator, and the getLast() method on the new context simply calls getLength() on the original iterator. This is a plain FocusTrackingIterator (not the multithreaded variety) so its getLast() method is not synchronised, and as far as I can see two result-document instructions can be consuming the iterator at the same time.

(Did this ever work, or is a new problem in 12.x?)

Actions

Copy link

Updated by Michael Kay almost 2 years ago

I've just run test threads-026 under Saxon 11 and it crashes.

The test was written in response to bug #3927. That bug entry contains a clear description of the problem. It says that the solution is to allocate a MultithreadedFocusTrackingIterator in cases where "we are in Saxon-EE, running a stylesheet that uses xsl:result-document, with allow-multithreading". This doesn't appear to be happening.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

With Saxon11 the problem only occurs with bytecode enabled - a FocusTrackingIterator (non-multithreaded) is being created from within generated bytecode.

With Saxon12, the problem seems to be the method SequenceTool.focusTracker,() which always generates a FocusTrackingIterator (non-multithreaded) . Many of the expression elaborators are calling this method rather than using the factory in the configuration.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

With a change to the FilterExpression elaboration code, the test is now consistently using a MultithreadedFocusTrackingIterator.

It now succeeds most of the time, but I still get an occasional failure -- especially when the test is run as part of the whole test set. The failure generally shows last="0", but I have also seen last="-1". I suspect a timing issue. Unfortunately adding tracing output, or using the debugger, affects the timing, which makes it hard to investigate.

With trace output, the test generally succeeds, and the trace output shows that it is all working as designed: the first time last() is called on the FocusTrackingIterator, it reads the base iterator to completion, saving the results in a list, and subsequent calls on next() read items from the list.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

By trapping the case where last(0 returns -1 and using println diagnostics (the failure doesn't occur under the debugger), I have established that in the failure cases, the current iterator is a ManualIterator rather than a FocusTrackingIterator.

It seems likely that this is the ManualIterator created at EnterpriseConfiguration.processResultDocument(). Note that in this test case the xsl:result-document calls are nested, so the ManualIterator created for the outer xsl:result-document instruction will be the target of the last() call in the inner xsl:result-document.

The trace also indicates that we're creating a FocusTrackingIterator over a ManualIterator. This is unnecessary, because ManualIterator already implements the FocusIterator interface. I've fixed that, at least for the case where it's generated from FilterExpression. We still get the failure, but last() is now zero rather than -1.

The target of last() is also a ManualIterator in the success case. Suspicion therefore falls on how last() is calculated for a ManualIterator, which is by a callback to the getLast() method of the originating context.

There's definitely a bug here but it's proving very elusive.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

I have a theory. Tracing FocusTrackingIterator.getLength(), it is returning -1 when pos is -1, which means the entire sequence has already been read. Changing FocusTrackingIterator to set last = pos when we find there are no more items in the base sequence could eliminate the problem. Needs confirmation.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

I'm reasonably confident this was the key.

Explanation: we're iterating over a sequence S, and for each item we kick off an asynchronous call on xsl:result-document, which calls last() on the iterator. It is possible that the first call on last() will come when S has already been read to completion. It obviously makes sense that in that situation the iterator knows how long S was, and can return the result directly. Furthermore, if we don't do that, the alternative approach is going to give the wrong answer: it determines how many remaining items there are in the sequence (zero), and adds this to the current position, which has been set to -1 .

Note that an asynchronous call to position() doesn't suffer this problem, because the current position (and other stuff like the current regex group) was saved in the ManualIterator before firing off the asynchronous thread. The only reason this isn't done for last() is that evaluating last() just in case it's needed is excessively expensive, especially in memory.

Actions

Copy link

Updated by Michael Kay almost 2 years ago

Status changed from New to In Progress
Applies to branch 11, 12, trunk added
Fix Committed on Branch 11, 12, trunk added
Platforms .NET, Java added

Saxon 11 is also failing, but only with bytecode enabled. I have applied these fixes, but they don't seem to be enough. Leaving open for now.

Actions

Copy link

#10

Updated by Michael Kay about 1 year ago

Status changed from In Progress to Resolved

I'm going to treat this as resolved. We've fixed a number of problems along the way, and the comment trail suggests that the only remaining suspected issue was on the 11.x branch with bytecode enabled, and it's not a productive use of time to investigate that unless a customer hits an actual problem.

Actions

Copy link

#11

Updated by O'Neil Delpratt about 1 year ago

Status changed from Resolved to Closed
Fixed in Maintenance Release 12.4 added

Bug fix applied in the Saxon 12.4 Maintenance release

Please register to edit this issue

Actions

Send by e-mail Copy link

Also available in: Atom PDF

Project

Profile

Help

Saxon

Bug #5896

xslt30extra test threads-026 - wrong results (intermittent failure)

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay almost 2 years ago

Updated by Michael Kay about 1 year ago

Updated by O'Neil Delpratt about 1 year ago