Project

Profile

Help

Bug #4502

StackOverflowError on Saxon 10.0

Added by Gunther Rademacher 4 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2020-03-26
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
10, 9.9
Fix Committed on Branch:
10, 9.9
Fixed in Maintenance Release:

Description

The attached ast-to-ebnf.xq is a piece of code from REx grammar tools. It is used when converting a grammar from XML representation back to textual EBNF for different contexts.

Since Saxon 10.0, this code results in a StackOverflowError as shown in attachment stacktrace.log. As far as I can see, this is not related to problems with tail call optimization. The -explain output shows that function b:break-lines has been recognized as tail-recursive. However the stack trace suggests that subList chains may be causing the problem.

It can be reproduced using this command line (not always, but most of the time):

java net.sf.saxon.Query ast-to-ebnf.xq -s:rendered-items.xml 2>stacktrace.log
stacktrace.log (57.1 KB) stacktrace.log Gunther Rademacher, 2020-03-26 20:41
ast-to-ebnf.xq (5.02 KB) ast-to-ebnf.xq Gunther Rademacher, 2020-03-26 20:41
rendered-items.xml (74.1 KB) rendered-items.xml Gunther Rademacher, 2020-03-26 20:41

History

#1 Updated by Michael Kay 4 months ago

A tough one to debug because the stack trace isn't telling us where we are in the Saxon code (or in the user query for that matter).

I got a somewhat different stack trace running with -T option, the relevant part reads

	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at net.sf.saxon.tree.iter.ListIterator.next(ListIterator.java:51)
	at EE_trace_22054798982.process(file:/Users/mike/bugs/2020/4502-Rademacher/ast-to-ebnf.xq:58)
	at com.saxonica.ee.bytecode.CompiledExpression.process(CompiledExpression.java:85)
	at com.saxonica.ee.bytecode.ByteCodeCandidate.process(ByteCodeCandidate.java:146)
	at net.sf.saxon.expr.instruct.UserFunction.process(UserFunction.java:666)
	at EE_trace_22054798982.process(file:/Users/mike/bugs/2020/4502-Rademacher/ast-to-ebnf.xq:58)
	at com.saxonica.ee.bytecode.CompiledExpression.process(CompiledExpression.java:85)
	at com.saxonica.ee.bytecode.ByteCodeCandidate.process(ByteCodeCandidate.java:146)
	at net.sf.saxon.expr.instruct.UserFunction.process(UserFunction.java:666)
	at EE_trace_22054798982.process(file:/Users/mike/bugs/2020/4502-Rademacher/ast-to-ebnf.xq:58)
	at com.saxonica.ee.bytecode.CompiledExpression.process(CompiledExpression.java:85)
	at com.saxonica.ee.bytecode.ByteCodeCandidate.process(ByteCodeCandidate.java:146)
	at net.sf.saxon.expr.instruct.UserFunction.process(UserFunction.java:666)

which seems to have deep recursion both within the Java SubList,get() and within the user query itself -- specifically the b:break-lines() function, which certainly doesn't look tail-recursive from this evidence. But that may be because -T suppresses tail call optimization.

With bytecode disabled (but still with -T on) the middle of the stack trace looks like this:

        at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at java.util.SubList.get(AbstractList.java:641)
	at net.sf.saxon.tree.iter.ListIterator.next(ListIterator.java:51)
	at net.sf.saxon.expr.FirstItemExpression.evaluateItem(FirstItemExpression.java:113)
	at net.sf.saxon.expr.parser.Evaluator$6.evaluate(Evaluator.java:173)
	at net.sf.saxon.expr.LetExpression.eval(LetExpression.java:537)
	at net.sf.saxon.expr.LetExpression.processLeavingTail(LetExpression.java:720)
	at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:908)
	at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:136)
	at net.sf.saxon.expr.instruct.ComponentTracer.processLeavingTail(ComponentTracer.java:150)
	at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:136)
	at net.sf.saxon.expr.instruct.UserFunction.process(UserFunction.java:666)

It appears to be evaluating $todo[1] at line 67. The value of $todo is a SequenceExtent wrapping a Java RandomAccessSubList, containing around 4400 element and text nodes. In the debugger we can see a sequence of calls of ListIterator.next(), with the length of list reducing by one on each call.

So I guess we need to look at how let $todo := $todo[position() > 1] is working. It gets compiled into a TailExpression, and at runtime it's invoking

    public SequenceExtent(SequenceExtent ext, int start, int length) {
        value = ext.value.subList(start, start+length);
    }

which, it appears is generating a SubList of a SubList of a Sublist etc, 4400 levels deep.

None of the code here is obviously new or changed in 10.0. Presumably there's some change that's causing this particular use cae to take this path when it didn't before.

#2 Updated by Michael Kay 4 months ago

I can see why it's failing; it's creating a deeply nested set of SubLists, and searching for the first item involves recursing down all these Sublists to find the first item in the one that's most deeply nested.

What I can't see is why it's not failing in exactly the same way in 9.9, The code seems to be following exactly the same paths, and there seem to be no signification changes in this area.

The strategy of creating a SubList to implement tail expressions seems flawed.

We only use this mechanism when using the LAZY_TAIL evaluator, which is only used when a TailExpression is applied to a local variable (suggesting likely use of head-tail recursion). There's a comment in the code to the effect that we're doing this "to avoid creating a deeply-nested tree of Closures" - so we're just creating a deeply-nested tree of SubLists instead.

Back in Saxon 9.5, we didn't use SubList in this way: rather, SequenceExtent kept a start/end position and applied these as offsets into the underlying list. This seems a better mechanism: though I would be inclined to use a variant of SequenceExtent rather than adding the complexity back into the main class.

Perhaps we should also be looking at other implementations of grounded sequences, such as Chain, which exists mainly to handle incremental addition of new items to the end of a sequence (as opposed to the current case, which is concerned with incremental removal from the start). Perhaps both cases could be handled using the mechanism we use for implementing arrays, namely an immutable list implemented as a binary tree, which provides reasonable efficiency both for sequential traversal and direct access, while also allowing efficient immutable append, insert, and remove operations? However, I'm concerned this would slow down the most common operations on sequences for the benefit of less common operations. Furthermore, it would use a lot more memory than the current SequenceExtent.

#3 Updated by Michael Kay 4 months ago

  • Category set to Performance
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

I've been looking at other places where we use List.subList() to see if they might have similar problems.

There are a few areas that are certainly worth looking at. These include:

  • SimpleArrayItem.subArray()
  • ImmList.fromList()
  • AtomicArray.subsequence()
  • Chain.subsequence()
  • MemoSequence - makeExtent(), getResidue()
  • ZeroOrMore.subsequence()
  • ArrayIterator - materialize(), getResidue()
  • ListIterator.getResidue()

#4 Updated by Michael Kay 4 months ago

I have introduced a new implementation of grounded sequences called SequenceSlice that maps to a slice of an underlying List. SequenceExtent.subsequence() and SequenceSlice.subsequence() both now return a SequenceSlice.

Since this should give a significant performance boost for applications following a similar head/tail recursion pattern, and since the danger of a stack overflow seems to be as much present in 9.9 as in 10.0, I shall retrofit this change to 9.9 as well.

#5 Updated by Michael Kay 4 months ago

The change has been retrofitted to 9.9 and appears to be working well. However, the bytecode generation needs to make the same change. We need to replace the call on new SequenceExtent(sequence, start, length) with a call on sequence.subsequence(start, length).

#6 Updated by Michael Kay 4 months ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.9 added
  • Fix Committed on Branch 10, 9.9 added

Bytecode generation now updated; marking as resolved.

#7 Updated by Gunther Rademacher 4 months ago

Thank you for fixing this!

I would like to test the fix on Saxon 10.0, but I can't seem to find it. I was expecting the fix to appear in https://dev.saxonica.com/repos/archive/opensource/latest10 (latest revision is 3460), but it doesn't. Bug #4508 is also marked as "Fix committed on branch 10", but I don't see that fix either.

The 9.9 branch however shows a fix for #4502, revision 3461 in https://dev.saxonica.com/repos/archive/opensource/latest9.9

#8 Updated by Michael Kay 4 months ago

Thanks for pointing that out. Seems I'm committing to our internal development repository and not to the public one, and the changes need to be migrated across,

#9 Updated by O'Neil Delpratt 3 months ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

Please register to edit this issue

Also available in: Atom PDF