Bug #3927
closedMulti-threading problems using Saxon-EE xsl:result-document with calls on last()
Added by roland civet about 6 years ago. Updated about 6 years ago.
100%
Description
The issue I'm facing to occurs only if the licence is detected ('Using license serial number XYZ'). If I remove the licence from the folder, everything is fine ('running with licensable features disabled'). Using -opt:0 makes no difference.
The error is twofold:
- Java.lang.IllegalArgumentException: Attempting to treat a parent pointer as a node at net.sf.saxon.tree.tiny.TinyTree.getNode(TinyTree.java:956)
- Some child xsl:result-document threads have not finished Error in xsl:result-document/@href on line 96 column 60 of ss4.xsl: XTDE1490: Cannot write more than one result document to the same URI:
It's a pet project. I generate the xml using a dummy template that I iterate n times. After hours of troubleshooting, I have noticed that:
- It always works fine (with licence and without licence) if the number of nodes is less than 2000
- If the number of nodes is greater than 2000, it works if licence is not present and fails if the licence is present. I found out the instruction that causes Saxon to crash:
check if position() < last(), then ../item[position() + 1]/@id
If I comment that instruction and licence is present, it works. If no licence, that line is processed okay whatever the number of nodes.
The error "Cannot write more than one result document to the same URI" is impossible. I double checked the xml; there is no risk of collision. I use xsl:result-document/. Href use the position() of current node.
Sadly, using the flag -t does not output the file that is written (it was the case with the prior version 9.8). I noticed with the prior version that files were written file: item_1.html, item_3.html, item_2.html, etc. This is only possible with some parallel processing as my xml is purely sequential (from id 1 to id n) and I use position() to write file.
My questions are:
- Are you aware of the "bug"?
- To bypass that bug, how can I disable the parallel processing if the licence is active?
Saxon-EE 9.9.0.1J from Saxonica Java version 10.0.2 Mac OS High Sierra 10.13.6
Files
Updated by Michael Kay about 6 years ago
- Category set to Multithreading
- Assignee set to Michael Kay
Thanks for reporting it. This looks like a nasty multithreading issue. Please could you let us have a test case that demonstrates the problem?
Use of last() within multithreaded code requires some devious logic and the problem is almost certainly in that area.
Running without a license effectively disables multithreading on xsl:result-document. You can achieve the same effect by setting the attribute saxon:asynchronous="no" on the xsl:result-document instruction, or by setting the global configuration property Feature.ALLOW_MULTITHREADING to false.
I think the failure to output the filename when running with -t is an oversight. (It's typically the kind of thing that's quite hard to capture with unit tests, but I'll see if there's a way....)
Updated by roland civet about 6 years ago
Hi Michael,
You'll find in attachment both the stylesheet and the xml source. The command line I use is:
java -jar saxon9ee.jar -t -opt:0 -s:items.xml -xsl:ss5.xsl -o:output/index.html
The instruction that is causing the issue is at line 59. If you comment it, it will work fine:
<xsl:if test="position() < last()">
<li><a href="item_{../item[position() + 1]/@id}.html">next</a></li>
</xsl:if>
My purpose if to generate a static website in 2 languages featuring a navigation previous/next.
I have cleaned up the stylesheet to keep only the relevant sections. To reproduce the issue, I generated another xml with 20000 . I has crashed 9 times out of 10. The xml has a size of 5 mg.
Hope this helps.
Cheers,
Roland
Updated by roland civet about 6 years ago
Updated by Michael Kay about 6 years ago
My first attempts to run it, everything works fine.
I've reinstated the code that outputs the message for each file written when -t is set.
Now I've set -opt:0, and I get the error java.lang.IllegalArgumentException: Attempting to treat a parent pointer as a node
And after a couple more runs that don't fail, I get
Cannot write more than one result document to the same URI: file:/Users/mike/bugs/2018/civet3927/en/item_125.html
The curious thing here is the sequence of trace messages. It starts:
Writing to file:/Users/mike/bugs/2018/civet3927/en/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_11.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_160.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_125.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_260.html
Writing to file:/Users/mike/bugs/2018/civet3927/fr/item_304.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_399.html
and then produces lots of apparently random (but generally increasing) item numbers, before resuming with the orderly sequence
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_13.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19628.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_14.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_15.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_16.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_17.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_18.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_20.html
and then proceeding sequentially until it hits one of the "random" numbers already processed, in this case 125, at which point it fails with the duplicate URI error.
Updated by Michael Kay about 6 years ago
In another run I'm now seeing
java.lang.ArrayIndexOutOfBoundsException: -1
at net.sf.saxon.tree.tiny.NamedChildIterator.next(NamedChildIterator.java:68)
at net.sf.saxon.tree.tiny.NamedChildIterator.next(NamedChildIterator.java:23)
at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:69)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:421)
and it feels to me as if we've got a NamedChildIterator
that's being accessed from more than one thread.
Adding assert Thread.currentThread() == owner
into NamedChildIterator.next()
, where owner is initialized to the creating thread, failed to confirm this.
I tried adding saxon:asynchronous="no"
to the outer xsl:result-document elements. This triggered bug #3931. After fixing this, the test runs successfully, but it doesn't appear to be doing multithreading on the inner xsl:result-document calls (all the output files are written sequentially).
(Note, the stylesheet as supplied bound the prefix "saxon" to the old Saxon 6.5 namespace, I changed this).
I then commented out the generation of files for French language so the outer xsl:result-document only executes once. This ran to completion without error, but generated very few result documents: the sequence was like this:
Writing to file:/Users/mike/bugs/2018/civet3927/en/index.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_1.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_2.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_3.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_4.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_11.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_12.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_13.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_14.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_15.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_16.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_17.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_18.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_645.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_5435.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_19.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6182.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_6876.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_7599.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8467.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_8774.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_9977.html
Writing to file:/Users/mike/bugs/2018/civet3927/en/item_10720.html
It would seem to be worth studying this simplified problem.
Updated by roland civet about 6 years ago
I ran the stylesheet against v9.8 and v9.9. Both threw an exception, though the error messages/stack traces were different (see attachment "errors.txt")
I upgraded Java from 10.0.2 to version 11. No change.
To bypass the problem, I use count(../item) instead of last(). It does work on v9.8 and v9.9 (I repeated the operations 5-6 times).
I'm still puzzled by the flag -t. I have double checked carefully each of the command line. Only v9.8 returns outputs files URI being written.
When using v9.8, I noticed the parallel output for EN and FR. Keeping the processing instruction for 1 language only still produced the problem if the number of nodes to be treated is significant. When the number of item is < 100, it always works whatever the version, 1 or 2 languages.
Updated by Michael Kay about 6 years ago
Thanks. That new information is all consistent with the theory I am investigating. It's definitely a multithreading bug in Saxon so (a) the Java version isn't going to make any difference, (b) the observed effect is going to vary from one run to another, (c) the more records you process, the higher the probability of a failure.
The -t issue is recognized as a separate 9.9 issue.
Updated by Michael Kay about 6 years ago
I'm currently probing how last() gets evaluated. It calls XPathContextMinor.getLength(), and I see trace lines like this:
getLength() on net.sf.saxon.om.FocusTrackingIterator@5bf588ce in pool-1-thread-2 last = 9131
getLength() on net.sf.saxon.om.FocusTrackingIterator@5bf588ce in pool-1-thread-4 last = 9131
which show that the method is called on the same FocusTrackingIterator object in two different threads; since the method isn't synchronized, and updates mutable state, this looks like being the problem. (Generally in my experience, once we've found code being executed that's not thread-safe, there's very little point in investigating why it causes the observed symptoms; once data structures become incoherent, anything can happen.)
The method used in xsl:result-document to get the value of last() is synchronized, but it is synchronized on a an object that isn't shared between the threads, so the synchronization is useless. I'm experimenting with synchronizing XPathContextMinor.getLast().
I've been trying synchronizing various points in this logic on various objects and so far I haven't found one that solves the problem.
Updated by Michael Kay about 6 years ago
I think the problem isn't just that the last()
logic synchronizing its own internal logic; the problem is that the call on last() is consuming an iterator that another thread is reading item-by-item, and as a result the other thread makes wild jumps forwards.
Synchronizing both FocusTrackingIterator.getLast()
and FocusTrackingiterator.next()
solves the problem. We should probably also synchronize other methods that access the base iterator, such as hasNext()
and materialize()
.
How expensive is this? FocusTrackingIterator.next()
is a very widely used method, and although it's said that synchronizing these days is very cheap if there is no actual contention, we need to check for any performance regression.
But is there any alternative? Unfortunately we don't know when we create the iterator that it's going to be accessed in multiple threads (it will typically be iterating over the items selected by xsl:apply-templates, and we have no idea how they will be processed.) Nor do we know that the iterator is going to have last() called on it.
Perhaps there are cases where we know that result-document is NOT going to be called on the iterator, and/or that last() is not going to be called - for example (a) if the stylesheet doesn't use xsl:result-document at all, or (b) if we're in an xsl:for-each where the action part makes no template calls that expose the focus to another component (and makes no result-document calls, of course). Whether it's worth detecting these cases in order to avoid the tiny overhead of synchronizing the next() call is an open question.
Updated by Michael Kay about 6 years ago
- Subject changed from Okay if no licence - Error if licence active - Cannot write more than one result document to the same URI to Multi-threading problems using Saxon-EE xsl:result-document with calls on last()
- Priority changed from High to Normal
- Applies to branch 9.8 added
Updated by roland civet about 6 years ago
Is xsl:iterate another instruction that internally call last()/FocusTrackingIterator.getLast()?
Updated by Michael Kay about 6 years ago
- Status changed from New to Resolved
- Fix Committed on Branch 9.8, 9.9 added
I have fixed the problem as follows:
-
In 9.8, I have simply marked all the methods on FocusTrackingIterator as synchronized.
-
In 9.9 I have introduced a subclass, MultithreadedFocusTrackingIterator, in which the methods are synchronized. As a consequence of bug #3636, the FocusTrackingIterator for 9.9 is already created using a factory function, and we now allocate a different factory function (to use the thread-safe version) if we are in Saxon-EE, running a stylesheet that uses xsl:result-document, with allow-multithreading enabled in the configuration. This avoid the overhead of synchronization except in the cases where it is needed.
The test case has been productised in XSLT30Extra as test threads-026. At the same time this tests the saxon:asynchronous extension attribute.
Updated by Michael Kay about 6 years ago
Some further observations:
-
The test stylesheet used the old Saxon namesapce, this has been corrected.
-
The logic to access theprevious and next items was incorrect, it now reads:
<ul class="nav">
<xsl:variable name="pos" select="position()"/>
<xsl:if test="$pos > 1">
<li><a href="item_{../item[$pos - 1]/@id}.html">prev</a></li>
</xsl:if>
<xsl:if test="$pos < last()">
<li><a href="item_{../item[$pos + 1]/@id}.html">next</a></li>
</xsl:if>
</ul>
In response to the question in comment #11, yes, xsl:iterate, xsl;for-each, and any other instruction that advances the focus (e.g. xsl:analyze-string, xsl:for-each-group) will use this mechanism. The significant thing about this bug, though, is the use of last() in conjunction with xsl:result-document, because xsl:result-document is the only standard instruction that causes asynchronous execution of XSLT instructions. The other case I looked at was xsl:for-each with threads=N, and I think this should be OK because it uses the same logic, but I haven't written an explicit test.
Updated by O'Neil Delpratt about 6 years ago
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.8.0.15 added
Bug fix applied in the Saxon 9.8.0.15 maintenance release. Leave open to the Saxon 9.9 maintenance release.
Updated by O'Neil Delpratt about 6 years ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 9.9.0.2 added
Bug fix applied in the Saxon 9.9.0.2 maintenance release.
Please register to edit this issue