Project

Profile

Help

Bug #3927

Multi-threading problems using Saxon-EE xsl:result-document with calls on last()

Added by roland civet over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Multithreading
Sprint/Milestone:
-
Start date:
2018-10-01
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.8, 9.9
Fix Committed on Branch:
9.8, 9.9
Fixed in Maintenance Release:

Description

The issue I'm facing to occurs only if the licence is detected ('Using license serial number XYZ'). If I remove the licence from the folder, everything is fine ('running with licensable features disabled'). Using -opt:0 makes no difference.

The error is twofold:

  • Java.lang.IllegalArgumentException: Attempting to treat a parent pointer as a node at net.sf.saxon.tree.tiny.TinyTree.getNode(TinyTree.java:956)
  • Some child xsl:result-document threads have not finished Error in xsl:result-document/@href on line 96 column 60 of ss4.xsl: XTDE1490: Cannot write more than one result document to the same URI:

It's a pet project. I generate the xml using a dummy template that I iterate n times. After hours of troubleshooting, I have noticed that:

  • It always works fine (with licence and without licence) if the number of nodes is less than 2000
  • If the number of nodes is greater than 2000, it works if licence is not present and fails if the licence is present. I found out the instruction that causes Saxon to crash:
check if position() < last(), then ../item[position() + 1]/@id

If I comment that instruction and licence is present, it works. If no licence, that line is processed okay whatever the number of nodes.

The error "Cannot write more than one result document to the same URI" is impossible. I double checked the xml; there is no risk of collision. I use xsl:result-document/. Href use the position() of current node.

Sadly, using the flag -t does not output the file that is written (it was the case with the prior version 9.8). I noticed with the prior version that files were written file: item_1.html, item_3.html, item_2.html, etc. This is only possible with some parallel processing as my xml is purely sequential (from id 1 to id n) and I use position() to write file.

My questions are:

  • Are you aware of the "bug"?
  • To bypass that bug, how can I disable the parallel processing if the licence is active?

Saxon-EE 9.9.0.1J from Saxonica Java version 10.0.2 Mac OS High Sierra 10.13.6

ss5.xsl (1.67 KB) ss5.xsl stylesheet roland civet, 2018-10-01 11:29
items.xml (5.08 MB) items.xml xml source roland civet, 2018-10-01 11:29
error.txt (3.17 KB) error.txt exceptions: v9.9 and v9.8 roland civet, 2018-10-01 23:03

History

#1 Updated by Michael Kay over 2 years ago

  • Category set to Multithreading
  • Assignee set to Michael Kay

Thanks for reporting it. This looks like a nasty multithreading issue. Please could you let us have a test case that demonstrates the problem?

Use of last() within multithreaded code requires some devious logic and the problem is almost certainly in that area.

Running without a license effectively disables multithreading on xsl:result-document. You can achieve the same effect by setting the attribute saxon:asynchronous="no" on the xsl:result-document instruction, or by setting the global configuration property Feature.ALLOW_MULTITHREADING to false.

I think the failure to output the filename when running with -t is an oversight. (It's typically the kind of thing that's quite hard to capture with unit tests, but I'll see if there's a way....)

#2 Updated by roland civet over 2 years ago

Hi Michael,

You'll find in attachment both the stylesheet and the xml source. The command line I use is:

java -jar saxon9ee.jar -t -opt:0 -s:items.xml -xsl:ss5.xsl -o:output/index.html

The instruction that is causing the issue is at line 59. If you comment it, it will work fine:

					<xsl:if test="position() &lt; last()">
						<li><a href="item_{../item[position() + 1]/@id}.html">next</a></li>
					</xsl:if>	

My purpose if to generate a static website in 2 languages featuring a navigation previous/next.

I have cleaned up the stylesheet to keep only the relevant sections. To reproduce the issue, I generated another xml with 20000 . I has crashed 9 times out of 10. The xml has a size of 5 mg.

Hope this helps.

Cheers,

Roland

#3 Updated by roland civet over 2 years ago

The formatting cropped the instruction:

<xsl:if test="position() < last()">

next

#4 Updated by Michael Kay over 2 years ago

My first attempts to run it, everything works fine.

I've reinstated the code that outputs the message for each file written when -t is set.

Now I've set -opt:0, and I get the error java.lang.IllegalArgumentException: Attempting to treat a parent pointer as a node

And after a couple more runs that don't fail, I get

Cannot write more than one result document to the same URI: file:/Users/mike/bugs/2018/civet3927/en/item_125.html

The curious thing here is the sequence of trace messages. It starts:

Writing to file:/Users/mike/bugs/2018/civet3927/en/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_11.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_160.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_125.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_260.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_304.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_399.html

and then produces lots of apparently random (but generally increasing) item numbers, before resuming with the orderly sequence

Writing to file:/Users/mike/bugs/2018/civet3927/en/item_13.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19628.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_14.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_15.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_16.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_17.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_18.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_20.html

and then proceeding sequentially until it hits one of the "random" numbers already processed, in this case 125, at which point it fails with the duplicate URI error.

#5 Updated by Michael Kay over 2 years ago

In another run I'm now seeing

java.lang.ArrayIndexOutOfBoundsException: -1
	at net.sf.saxon.tree.tiny.NamedChildIterator.next(NamedChildIterator.java:68)
	at net.sf.saxon.tree.tiny.NamedChildIterator.next(NamedChildIterator.java:23)
	at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:69)
	at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:421)

and it feels to me as if we've got a NamedChildIterator that's being accessed from more than one thread.

Adding assert Thread.currentThread() == owner into NamedChildIterator.next(), where owner is initialized to the creating thread, failed to confirm this.

I tried adding saxon:asynchronous="no" to the outer xsl:result-document elements. This triggered bug #3931. After fixing this, the test runs successfully, but it doesn't appear to be doing multithreading on the inner xsl:result-document calls (all the output files are written sequentially).

(Note, the stylesheet as supplied bound the prefix "saxon" to the old Saxon 6.5 namespace, I changed this).

I then commented out the generation of files for French language so the outer xsl:result-document only executes once. This ran to completion without error, but generated very few result documents: the sequence was like this:

Writing to file:/Users/mike/bugs/2018/civet3927/en/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_11.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_13.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_14.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_15.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_16.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_17.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_18.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_645.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5435.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6182.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6876.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7599.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8467.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8774.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9977.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10720.html

It would seem to be worth studying this simplified problem.

#6 Updated by roland civet over 2 years ago

I ran the stylesheet against v9.8 and v9.9. Both threw an exception, though the error messages/stack traces were different (see attachment "errors.txt")

I upgraded Java from 10.0.2 to version 11. No change.

To bypass the problem, I use count(../item) instead of last(). It does work on v9.8 and v9.9 (I repeated the operations 5-6 times).

I'm still puzzled by the flag -t. I have double checked carefully each of the command line. Only v9.8 returns outputs files URI being written.

When using v9.8, I noticed the parallel output for EN and FR. Keeping the processing instruction for 1 language only still produced the problem if the number of nodes to be treated is significant. When the number of item is < 100, it always works whatever the version, 1 or 2 languages.

#7 Updated by Michael Kay over 2 years ago

Thanks. That new information is all consistent with the theory I am investigating. It's definitely a multithreading bug in Saxon so (a) the Java version isn't going to make any difference, (b) the observed effect is going to vary from one run to another, (c) the more records you process, the higher the probability of a failure.

The -t issue is recognized as a separate 9.9 issue.

#8 Updated by Michael Kay over 2 years ago

I'm currently probing how last() gets evaluated. It calls XPathContextMinor.getLength(), and I see trace lines like this:

getLength() on net.sf.saxon.om.FocusTrackingIterator@5bf588ce in pool-1-thread-2 last = 9131
getLength() on net.sf.saxon.om.FocusTrackingIterator@5bf588ce in pool-1-thread-4 last = 9131

which show that the method is called on the same FocusTrackingIterator object in two different threads; since the method isn't synchronized, and updates mutable state, this looks like being the problem. (Generally in my experience, once we've found code being executed that's not thread-safe, there's very little point in investigating why it causes the observed symptoms; once data structures become incoherent, anything can happen.)

The method used in xsl:result-document to get the value of last() is synchronized, but it is synchronized on a an object that isn't shared between the threads, so the synchronization is useless. I'm experimenting with synchronizing XPathContextMinor.getLast().

I've been trying synchronizing various points in this logic on various objects and so far I haven't found one that solves the problem.

#9 Updated by Michael Kay over 2 years ago

I think the problem isn't just that the last() logic synchronizing its own internal logic; the problem is that the call on last() is consuming an iterator that another thread is reading item-by-item, and as a result the other thread makes wild jumps forwards.

Synchronizing both FocusTrackingIterator.getLast() and FocusTrackingiterator.next() solves the problem. We should probably also synchronize other methods that access the base iterator, such as hasNext() and materialize().

How expensive is this? FocusTrackingIterator.next() is a very widely used method, and although it's said that synchronizing these days is very cheap if there is no actual contention, we need to check for any performance regression.

But is there any alternative? Unfortunately we don't know when we create the iterator that it's going to be accessed in multiple threads (it will typically be iterating over the items selected by xsl:apply-templates, and we have no idea how they will be processed.) Nor do we know that the iterator is going to have last() called on it.

Perhaps there are cases where we know that result-document is NOT going to be called on the iterator, and/or that last() is not going to be called - for example (a) if the stylesheet doesn't use xsl:result-document at all, or (b) if we're in an xsl:for-each where the action part makes no template calls that expose the focus to another component (and makes no result-document calls, of course). Whether it's worth detecting these cases in order to avoid the tiny overhead of synchronizing the next() call is an open question.

#10 Updated by Michael Kay over 2 years ago

  • Subject changed from Okay if no licence - Error if licence active - Cannot write more than one result document to the same URI to Multi-threading problems using Saxon-EE xsl:result-document with calls on last()
  • Priority changed from High to Normal
  • Applies to branch 9.8 added

#11 Updated by roland civet over 2 years ago

Is xsl:iterate another instruction that internally call last()/FocusTrackingIterator.getLast()?

#12 Updated by Michael Kay over 2 years ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 9.8, 9.9 added

I have fixed the problem as follows:

  • In 9.8, I have simply marked all the methods on FocusTrackingIterator as synchronized.

  • In 9.9 I have introduced a subclass, MultithreadedFocusTrackingIterator, in which the methods are synchronized. As a consequence of bug #3636, the FocusTrackingIterator for 9.9 is already created using a factory function, and we now allocate a different factory function (to use the thread-safe version) if we are in Saxon-EE, running a stylesheet that uses xsl:result-document, with allow-multithreading enabled in the configuration. This avoid the overhead of synchronization except in the cases where it is needed.

The test case has been productised in XSLT30Extra as test threads-026. At the same time this tests the saxon:asynchronous extension attribute.

#13 Updated by Michael Kay over 2 years ago

Some further observations:

  • The test stylesheet used the old Saxon namesapce, this has been corrected.

  • The logic to access theprevious and next items was incorrect, it now reads:

                                 <ul class="nav">
					<xsl:variable name="pos" select="position()"/>
					<xsl:if test="$pos > 1">
						<li><a href="item_{../item[$pos - 1]/@id}.html">prev</a></li>
					</xsl:if>
					<xsl:if test="$pos &lt; last()">
						<li><a href="item_{../item[$pos + 1]/@id}.html">next</a></li>
					</xsl:if>	
				</ul>

In response to the question in comment #11, yes, xsl:iterate, xsl;for-each, and any other instruction that advances the focus (e.g. xsl:analyze-string, xsl:for-each-group) will use this mechanism. The significant thing about this bug, though, is the use of last() in conjunction with xsl:result-document, because xsl:result-document is the only standard instruction that causes asynchronous execution of XSLT instructions. The other case I looked at was xsl:for-each with threads=N, and I think this should be OK because it uses the same logic, but I haven't written an explicit test.

#14 Updated by roland civet over 2 years ago

Thanks Michael. Good job :)

#15 Updated by O'Neil Delpratt over 2 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.8.0.15 added

Bug fix applied in the Saxon 9.8.0.15 maintenance release. Leave open to the Saxon 9.9 maintenance release.

#16 Updated by O'Neil Delpratt over 2 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.9.0.2 added

Bug fix applied in the Saxon 9.9.0.2 maintenance release.

Please register to edit this issue

Also available in: Atom PDF