Bug #2013
saxon:stream() in XQuery not streaming when it should
Status: Closed
Description
Gunther Rademacher reports on saxon-help:
However, I found that the extra FLWOR does make a difference. These queries are streaming:
for $x in saxon:stream(doc('uri')/*/*) return string($x)
for $x in saxon:stream(doc('uri')/*/*) return $x
for $x in saxon:stream(doc('uri')/*/*) return $x/*
but these are not:
saxon:stream(doc('uri')/*/*)/string()
saxon:stream(doc('uri')/*/*)
saxon:stream(doc('uri')/*/*)/*
To which I responded:
You might find it helpful to do this kind of investigation by running from the command line, for example:
java net.sf.saxon.Query -qs:"saxon:stream(doc('temp/test.xml')/*/*)/string()" -explain -qversion:3.0
The -explain output in this case tells you:
OPT ======================================
OPT : Cannot use streaming copy: expression is not provably in document order
OPT ======================================
though it's a little bit confusing because some of the expressions that can be streamed give you the same message, followed by
OPT ======================================
OPT : Using streaming copy
OPT ======================================
because the optimizer succeeds on its second attempt. But it's a lot better than trying to work out what's streaming and what isn't by measuring the time and memory usage.
On 24 Feb 2014, at 10:56, Rademacher, Gunther <Gunther.Rademacher@softwareag.com> wrote:
Thanks for the advice. I now have streaming queries working as I want them.
However, I found that the extra FLWOR does make a difference. These queries are streaming:
for $x in saxon:stream(doc('uri')/*/*) return string($x)
for $x in saxon:stream(doc('uri')/*/*) return $x
for $x in saxon:stream(doc('uri')/*/*) return $x/*
but these are not:
saxon:stream(doc('uri')/*/*)/string()
saxon:stream(doc('uri')/*/*)
saxon:stream(doc('uri')/*/*)/*
What's happening here is that in all 6 cases, the first attempt to convert to a streaming expression happens before the analysis that establishes that the expression doesn't need sorting, and therefore fails. In the first three cases there is a second attempt (because the optimizer tries again after inlining the variable), and the second attempt succeeds. But in the last three cases there is no second attempt. I will fix this.
Updated by Michael Kay over 10 years ago
- Status changed from New to Resolved
A patch is being applied to CopyOf.optimize() to fix two problems: (a) the spurious message saying that streaming failed, when it succeeds on the second attempt; and (b) ensuring that a second attempt is always made after optimizing the select expression (that is, the argument to saxon:stream).
In fact the rewrite is now attempted both before and after optimizing the select expression. I think it's too risky to change the logic so that only the second attempt is made; there aren't enough 9.5 test cases to ensure this approach causes no regression.
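The ordering problem and the fix can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not Saxon's CopyOf.optimize() code: SelectExpr, optimize_select(), can_stream(), old_rewrite() and patched_rewrite() are hypothetical names, and a single boolean flag stands in for whatever the real ordering analysis records. It shows why a rewrite attempted only before the analysis fails unless something (here, re-optimization after FLWOR variable inlining) triggers a retry, and why attempting it both before and after optimizing the select expression succeeds in either case.

```python
from dataclasses import dataclass, replace

# Hypothetical toy model of the optimizer ordering issue; not Saxon code.


@dataclass(frozen=True)
class SelectExpr:
    """Hypothetical stand-in for the argument of saxon:stream()."""
    text: str
    provably_in_doc_order: bool = False  # only set by the ordering analysis


def optimize_select(expr: SelectExpr) -> SelectExpr:
    # Hypothetical ordering analysis: proves the result needs no sort.
    return replace(expr, provably_in_doc_order=True)


def can_stream(expr: SelectExpr) -> bool:
    # Streaming copy requires a selection provably in document order.
    return expr.provably_in_doc_order


def old_rewrite(expr: SelectExpr, wrapped_in_flwor: bool) -> bool:
    """Reported behaviour: the guaranteed attempt precedes the analysis."""
    if can_stream(expr):
        return True
    expr = optimize_select(expr)
    # Only the FLWOR form is re-optimized (after inlining), giving a retry.
    return wrapped_in_flwor and can_stream(expr)


def patched_rewrite(expr: SelectExpr, messages: list) -> bool:
    """Patched behaviour: attempt before and after optimizing the select."""
    if can_stream(expr):
        messages.append("Using streaming copy")
        return True
    expr = optimize_select(expr)
    if can_stream(expr):
        messages.append("Using streaming copy")
        return True
    # The failure message is only recorded once both attempts have failed.
    messages.append("Cannot use streaming copy")
    return False


q = SelectExpr("doc('uri')/*/*")
print(old_rewrite(q, wrapped_in_flwor=True))   # True  (FLWOR form)
print(old_rewrite(q, wrapped_in_flwor=False))  # False (plain path form)
msgs = []
print(patched_rewrite(q, msgs), msgs)          # True ['Using streaming copy']
```

In this model the plain path form never gets its second chance under the old logic, matching the three non-streaming queries, while the patched logic streams both forms and no longer records a failure that is later overturned.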
Updated by Michael Kay over 10 years ago
The expression saxon:stream(doc('uri')/*/*)/* still runs out of memory, because although saxon:stream(doc('uri')/*/*) delivers a stream of small subtrees as it should, the final "/*" causes these to be sorted into document order, which means the many small subtrees all have to live in memory at the same time.
This can be avoided by setting the bit to indicate that saxon:stream() returns a "peer node-set", that is, a sequence of nodes in which no node is an ancestor of any other. This is sufficient to enable the expression saxon:stream(X)/* to be recognized as being "naturally sorted". Another patch is being committed.
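Why the peer property is enough can be sketched as follows. If a sequence is in document order and no node in it is an ancestor of another, each node's subtree occupies its own contiguous, non-overlapping range of document order, so the children of an earlier node always precede the children of a later one and E/* needs no sort. This minimal Python model uses hypothetical names (Node, is_peer(), child_step(), in_doc_order()) and treats preorder numbering as document order; it is not how Saxon represents nodes or static properties.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy model of the "peer node-set" argument; not Saxon code.


@dataclass(eq=False)
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)
    order: int = 0  # position in document order (preorder)


def build(name: str, *kids: Node) -> Node:
    node = Node(name, list(kids))
    for kid in kids:
        kid.parent = node
    return node


def number(root: Node) -> Node:
    """Assign preorder positions, which is document order for elements."""
    counter, stack = 0, [root]
    while stack:
        n = stack.pop()
        n.order = counter
        counter += 1
        stack.extend(reversed(n.children))  # first child is visited next
    return root


def is_ancestor(a: Node, b: Node) -> bool:
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def is_peer(nodes: list[Node]) -> bool:
    """True if no node in the sequence is an ancestor of another."""
    return not any(is_ancestor(a, b) for a in nodes for b in nodes if a is not b)


def child_step(nodes: list[Node]) -> list[Node]:
    """E/* evaluated naively: children of each node concatenated, no sort."""
    return [c for n in nodes for c in n.children]


def in_doc_order(nodes: list[Node]) -> bool:
    return all(x.order < y.order for x, y in zip(nodes, nodes[1:]))


doc = number(build("root",
                   build("a", build("a1"), build("a2")),
                   build("b", build("b1")),
                   build("c", build("c1", build("c11")))))
a, b, c = doc.children

peers = [a, b, c]   # siblings, like the result of /*/*: a peer node-set
nested = [doc, a]   # in document order, but doc is an ancestor of a

print(is_peer(peers), in_doc_order(child_step(peers)))    # True True
print(is_peer(nested), in_doc_order(child_step(nested)))  # False False
```

The second case shows why the property matters: [doc, a] is itself in document order, yet concatenating children yields a, b, c, a1, a2, which would have to be sorted, and sorting is what forces every subtree to be held in memory at once.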
Updated by O'Neil Delpratt over 10 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in version set to 9.5.1.5
Bug fix applied in Saxon maintenance release 9.5.1.5