Project

Profile

Help

Bug #4656

Optimization causing SXLM0001 or java.lang.StackOverflow

Added by Michael Kay about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
2020-07-24
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
10, trunk
Fix Committed on Branch:
10, trunk
Fixed in Maintenance Release:
Platforms:

Description

Repro files sent by email

History

#2 Updated by Michael Kay about 1 year ago

Problem reproduced (Stack Overflow); and it goes away when run with -opt:0. (Execution time 1 m 12s)

#3 Updated by Michael Kay about 1 year ago

-explain output is as follows:

OPT : At line 100 of file:/Users/mike/bugs/2020/4656-....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (normalize-space(convertUntyped(data(.)))) eq ""
OPT : At line 37 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{http://www.w3.org/1999/xlink}role)))) eq "http://docbook.org/xlink/role/olink"
OPT : At line 37 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{http://www.w3.org/1999/xlink}href)))) ne ""
OPT : At line 44 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{}srcProductName)))) eq $thisproductname
OPT : At line 45 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{}element)))) eq "refentry"
OPT : At line 53 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{}srcProductName)))) eq $thisproductname
OPT : At line 77 of file:/Users/mike/bugs/2020/4656-.....prune_targetset.xsl
OPT : Replaced general comparison by value comparison
OPT : Expression after rewrite: (Q{http://www.w3.org/2001/XMLSchema}string(data(attribute::attribute(Q{}targetdoc)))) eq $thisbiblioid

So the optimizer doesn't seem to be doing anything very fundamental (though not all optimization rewrites are logged)

#4 Updated by Michael Kay about 1 year ago

Running with -opt:-j option causes a crash in bytecode generation:

xception in thread "main" java.lang.AssertionError: java.lang.NoSuchMethodException: net.sf.saxon.trans.Mode.applyTemplates(net.sf.saxon.expr.instruct.ParameterSet, net.sf.saxon.expr.instruct.ParameterSet, net.sf.saxon.event.Outputter, net.sf.saxon.expr.XPathContextMajor, net.sf.saxon.s9api.Location)
	at com.saxonica.ee.bytecode.util.Generator.invokeInstanceMethod(Generator.java:299)
	at com.saxonica.ee.bytecode.ApplyTemplatesCompiler.compileToPush(ApplyTemplatesCompiler.java:104)
	at com.saxonica.ee.bytecode.util.CompilerService.compileToPush(CompilerService.java:758)
	at com.saxonica.ee.bytecode.CopyCompiler.compileCopyElement(CopyCompiler.java:422)
	at com.saxonica.ee.bytecode.CopyCompiler.access$100(CopyCompiler.java:32)
	at com.saxonica.ee.bytecode.CopyCompiler$1.generateCase(CopyCompiler.java:110)


Don't know if this is related. It's not just a simple bytecode problem, because with -opt:-c it still goes into StackOverflow.

#5 Updated by Michael Kay about 1 year ago

The bytecode crash is because of a local change I made for an unrelated development which hadn't been fully tested. I've fixed this and it now runs to completion with the opt:-j option. So it could be a problem with JIT template compilation.

#6 Updated by Michael Kay about 1 year ago

Patched the code to produce a Java stack trace at the point of failure, and we're seeing this:

	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:22)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:61)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:79)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:22)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:61)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:79)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:22)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:61)
	at net.sf.saxon.expr.UnionEnumeration.next(UnionEnumeration.java:79)

and so on. This is run-time evaluation of a union expression; the only reason I can think that it would recurse deeply is if there's a union expression with a very large number of branches.

Going into this with the debugger, the union iterator doesn't actually derive from a union expression, it derives from a call on key() - it looks like a system-generated key. It seems we've generated a call such as key('kk101', $search), and $search is a large set of atomic values, and we're combining the results of these separate searches using an implicit union expression. If this union expression is of the form A|B|C|D|E... then it's going to be evaluated as a nested set of binary union operations: ((((((A|B)|C)|D)|E).... and this accounts for the deep nesting.

#7 Updated by Michael Kay about 1 year ago

We seem to be evaluating the global variable at line 22, which includes the expression

descendant::db:*[@xlink:role='http://docbook.org/xlink/role/olink']

and I guess that a very large number of nodes satisfy this filter expression.

I'm surprised to see no mention in the -explain output of an index being generated here, perhaps we're not doing optimizer tracing for global variables?

I can certainly see that it we have a filter expression like this that looks highly selective but actually isn't, then (a) indexing it is pretty pointless, and (b) the strategy for combining results looks flawed.

#8 Updated by Michael Kay about 1 year ago

  • Status changed from New to In Progress

The rationale for the code here is at https://saxonica.plan.io/issues/2929#note-19

Incidentally, I note that for the union expression on line 105

select="$thisnode/* | $thisnode/@* | $thisnode/text() | $thisnode/processing-instruction() | $thisnode/comment()"

we're also constructing a tree of union enumerations. The ideal strategy here would be to turn this into

$thisnode/@*, $thisnode/child::node()

but the optimizer isn't quite smart enough for that. If it weren't for the attribute selector, I think we would manage

$thisnode/@*, $thisnode/child::node()[self::* or self::text() or self::comment() or self::processing-instruction()]

which would certainly be an improvement.

#9 Updated by Michael Kay about 1 year ago

To put the problem in context, we're evaluating

$full_targetset/descendant::tdb:div[@href = $direct_targets_in_use]

which we have (very reasonably) turned into a key() call, and the value of $direct_targets_in_use is a sequence of 5803 strings. Each of these 5803 input strings causes an index lookup, which typically retrieves two nodes. So our task is to combine 5803 lists of nodes, each of which is typically a list of 2 nodes in document order, into a single list in document order. We're doing this as a merge operation because in the general case that's going to be better than concatenating the results and sorting.

One solution would simply be to try and make the binary tree of iterators balanced, so the maximum depth of recursive call is less extreme.

#10 Updated by Michael Kay about 1 year ago

I decided to change the design so that instead of creating a binary tree of UnionEnumerations each of which takes nodes from two input iterators, we now have a UnionIterator class that takes an arbitrary number of input iterators. It maintains a sorted set of "intakes", each comprising an iterator and the next node delivered by that iterator, implemented using a Java TreeMap. On each request for a node from the UnionIterator, we take the next node from the first intake in this sorted set, and then replenish that intake by reading nodes from the input iterator until we find one that is not already in the set.

This appears to work for the problem stylesheet, but it's new and rather subtle code, so it needs some new test cases.

#11 Updated by Michael Kay about 1 year ago

  • Status changed from In Progress to Resolved
  • Applies to branch 10, trunk added
  • Fix Committed on Branch 10, trunk added

Marking this as resolved.

The problem is caused by unusual patterns in a filter expression leading to a large stack of UnionEnumeration objects being created.

The fix is to introduce a new UnionIterator that can handle an arbitrary number of input streams, replacing the current class that can only handle two.

On the development branch this new UnionIterator class has been deployed for all union expressions; on the 10.x branch it is used only for this particular usage with (generated) key indexes.

It is possible to work around the problem by suppressing the generation of the index for this filter expression, which can be done using the optimization option -opt:-k.

#12 Updated by Vladimir Nesterovsky about 1 year ago

You say you collect final result for a sorted sequence into a TreeMap but this in general O(NLogN) operation, and is often slower than collecting all results in array and then sorting it (which is also O(NLogN)).

I think you might be interested in implementing a variant of merging of sorted arrays, see https://en.wikipedia.org/wiki/K-way_merge_algorithm

#13 Updated by Michael Kay about 1 year ago

Thanks for the comment, Vladimir, but I think this is the right strategy for this case.

What we're doing here is to merge N sequences that are known to be sorted, delivering the result of the merge as an iterator. The result iterator in its next() method needs to choose an item from one of the input iterators, namely the one that is first in document order. By maintaining a TreeMap holding the next entry for each input iterator, we need memory proportional to the number of input streams. We choose an item using a pollFirst() operation which selects it and removes it from the map; we then read off the next item from the corresponding iterator and add it to the map, where it's inserted in its proper place in sorted order. The space cost of this algorithm is O(S) where S is the number of input streams, and the time cost is O(N log S) where N is the total number of items in all the streams. Given that S is usually very small (like 2 or 3) and this case (with 5803 input streams) is definitely exceptional, I think this is the right approach.

#14 Updated by Michael Kay about 1 year ago

Having said all that, given this expression

$full_targetset/descendant::tdb:div[@href = $direct_targets_in_use]

with 5000 items in $direct_targets_in_use, the right strategy is probably to create a hash map containing the 5000 items, and then to do a single scan of the descendant axis checking each item against this map. I don't think there's any chance of getting Saxon to identify that as the right strategy, but it's easy enough to implement it at user level by putting the 5000 items in a map rather than a sequence.

#15 Updated by O'Neil Delpratt about 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.2 added

Bug fix applied in the Saxon 10.2 maintenance release.

#16 Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed

Please register to edit this issue

Also available in: Atom PDF