Bug #2546

closed

XQuery using saxon:stream not streaming in 9.7

Added by Gunther Rademacher over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Streaming
Sprint/Milestone: -
Start date: 2015-12-17
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.7
Fix Committed on Branch: 9.7
Fixed in Maintenance Release:
Platforms:

Description

While testing saxon:stream from XQuery on 9.7.0.1, I found that these queries failed to stream their result:

saxon:stream(doc('uriresolver:resolve')/*/*)/*
for $x in saxon:stream(doc('uriresolver:resolve')/*/*)/* return $x

Running them from the command line with "-explain" reports

OPT : Using streaming copy

in both cases, but the output also shows a docOrder operator being applied, which presumably causes the problem.

We discussed this last year (http://markmail.org/message/n4iccqoank2oub4c), and it was eventually resolved in favour of streaming in 9.5.1.5, under bug #2013.

I am also attaching my test. It uses an endless input stream, so it cannot succeed without streaming.


Files

StreamingXQuery.java (4.92 KB), Gunther Rademacher, 2015-12-17 09:45
Actions #1

Updated by O'Neil Delpratt over 8 years ago

  • Found in version deleted (9.7)
  • Applies to branch 9.7 added
Actions #2

Updated by Michael Kay over 8 years ago

  • Category set to Streaming
  • Status changed from New to In Progress
  • Priority changed from Low to Normal

We have this as JUnit test TestXQueryStreaming/test6. We analyzed the failure during Saxon 9.7 prerelease testing and decided, rather reluctantly, that the query wasn't streamable.

The reasoning is as follows. saxon:stream() is defined to generate a sequence of snapshots of the selected nodes (snapshots in the sense of the fn:snapshot() function defined in XSLT 3.0). Since each snapshot is a separate tree, document order among these trees is implementation-dependent; there is no guarantee that the document order of $X!snapshot(.) corresponds to the document order of $X.

Because Saxon doesn't know that the result of saxon:stream() is in document order, it puts a sort into the expression tree. So although the source document is streamed, the sort operation tries to read the document to completion before it starts sorting, and because the document is infinite, this never terminates.
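The effect of the inserted sort can be illustrated without Saxon. The following is a minimal sketch, not Saxon code: CountingSource, firstWithoutSort and firstWithSort are hypothetical names, and a bounded counter stands in for the streamed input so that the example terminates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustration only: none of these names are Saxon APIs.
final class SortBlocksStreaming {

    // Stands in for a streamed input: yields 0, 1, 2, ... below `limit`
    // and counts how many items have been pulled from it.
    static final class CountingSource implements Iterator<Integer> {
        private final int limit;
        int pulled = 0;

        CountingSource(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            return pulled < limit;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    // Pipelined evaluation: the first result needs only one input item.
    static int firstWithoutSort(CountingSource source) {
        return source.next();
    }

    // Evaluation behind a sort: the whole input is buffered before
    // the first result becomes available.
    static int firstWithSort(CountingSource source) {
        List<Integer> buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        Collections.sort(buffer);
        return buffer.get(0);
    }

    public static void main(String[] args) {
        CountingSource streamed = new CountingSource(1_000_000);
        firstWithoutSort(streamed);
        CountingSource sorted = new CountingSource(1_000_000);
        firstWithSort(sorted);
        System.out.println("pulled without sort: " + streamed.pulled);
        System.out.println("pulled with sort:    " + sorted.pulled);
    }
}
```

With a finite limit, the first variant pulls a single item and the second pulls every item. If the source never ends, as in the attached test, the loop in firstWithSort never exits.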

An additional complication is that no streamability violation is detected statically. That's basically because operations on grounded sequences (sequences consisting of ordinary non-streamed nodes) are outside the scope of the streamability analysis; and the fact that saxon:stream() generates a sequence of snapshots (which are considered grounded) places it in this territory.

Clearly this is rather unsatisfactory, and we should try to do better.

The fix when this came up before in bug #2013 (thanks for the reference) was to mark saxon:stream() as returning a peer node-set. That's a bit more difficult now that we compile saxon:stream() into a call on xsl:stream (which is allowed to return nodes that aren't peers). But it can probably be done.

Actions #3

Updated by Michael Kay over 8 years ago

Hmmm... Marking it as a peer node-set isn't enough to prevent a sort. If P is a peer node-set, then P/* is still going to require a sort. We have to mark it as delivering nodes in document order; and for safety, that means that it actually has to deliver nodes in document order.

The snapshots are (at least by default) created as TinyTree instances. Nodes A and B in two different TinyTree instances have the document ordering A << B iff documentNumber(root(A)) < documentNumber(root(B)). Document numbers are allocated sequentially. So I think we can be confident that the snapshots ARE in document order; the only issue is making the compiler/optimizer aware that this is the case.
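The argument about document numbers can be sketched as follows. This is an illustration under stated assumptions rather than Saxon's implementation: DocumentNumberAllocator, Node and precedes are hypothetical names, and the tie-break by position within a single tree is an assumption added only to make the comparison total.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: none of these names are Saxon APIs.
final class DocumentOrderSketch {

    // Hypothetical stand-in for sequential document-number allocation.
    static final class DocumentNumberAllocator {
        private long next = 0;

        long allocate() {
            return next++;
        }
    }

    // A node identified by the document number of its tree and
    // its position within that tree.
    static final class Node {
        final long documentNumber;
        final int position;

        Node(long documentNumber, int position) {
            this.documentNumber = documentNumber;
            this.position = position;
        }
    }

    // A << B. Across trees the document number decides, as described above.
    // Within one tree, ordering by position is an assumption of this sketch.
    static boolean precedes(Node a, Node b) {
        if (a.documentNumber != b.documentNumber) {
            return a.documentNumber < b.documentNumber;
        }
        return a.position < b.position;
    }

    // Creates `count` snapshot roots in sequence, each in a new tree.
    static List<Node> createSnapshots(DocumentNumberAllocator allocator, int count) {
        List<Node> roots = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            roots.add(new Node(allocator.allocate(), 0));
        }
        return roots;
    }

    // True if each node precedes its successor in the list.
    static boolean isInDocumentOrder(List<Node> nodes) {
        for (int i = 1; i < nodes.size(); i++) {
            if (!precedes(nodes.get(i - 1), nodes.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Node> snapshots = createSnapshots(new DocumentNumberAllocator(), 5);
        System.out.println("in document order: " + isInDocumentOrder(snapshots));
    }
}
```

Because each snapshot takes the next document number at creation time, the order in which snapshots are delivered matches their document order, so a sort would change nothing.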

Actions #4

Updated by Michael Kay over 8 years ago

  • Status changed from In Progress to Resolved

OK, I've got this one working. Most of the pieces were already in place. The missing part was an inference that given the expression

A ! B

then if A is a singleton, the expression A ! B has most of the properties of B: in particular, if B is in document order, then A ! B is in document order. The role this plays in the overall scheme of things is quite complicated: in essence, saxon:stream(document('x')/a) is being rewritten as ('x' ! xsl:stream(.)) because document() maps over its operand.
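The inference for A ! B can be written down as a small rule. The sketch below is a simplified model and not Saxon's optimizer code: Expr and mapIsInDocumentOrder are hypothetical names, only the document-order property is modelled, and the rule is deliberately conservative when A is not known to be a singleton.

```java
// Illustration only: none of these names are Saxon APIs.
final class SimpleMapOrderSketch {

    // The static properties of an expression that this sketch tracks.
    static final class Expr {
        final String text;
        final boolean singleton;        // known to return exactly one item
        final boolean inDocumentOrder;  // known to deliver nodes in document order

        Expr(String text, boolean singleton, boolean inDocumentOrder) {
            this.text = text;
            this.singleton = singleton;
            this.inDocumentOrder = inDocumentOrder;
        }
    }

    // If A is a singleton, B is evaluated once, so A ! B inherits B's ordering.
    // Otherwise nothing is assumed about the order of the combined result.
    static boolean mapIsInDocumentOrder(Expr a, Expr b) {
        return a.singleton && b.inDocumentOrder;
    }

    // A sort is needed only when document order cannot be inferred.
    static boolean mapNeedsSort(Expr a, Expr b) {
        return !mapIsInDocumentOrder(a, b);
    }

    public static void main(String[] args) {
        // Models the rewritten form ('x' ! xsl:stream(.)), assuming the
        // streamed snapshots are marked as being in document order.
        Expr literal = new Expr("'x'", true, false);
        Expr stream = new Expr("xsl:stream(.)", false, true);
        System.out.println(literal.text + " ! " + stream.text
                + " needs sort: " + mapNeedsSort(literal, stream));
    }
}
```

In this model the string literal on the left is a singleton, so the ordering of the right-hand side carries through and no sort is added.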

Actions #5

Updated by Michael Kay over 8 years ago

  • Fix Committed on Branch 9.7 added
Actions #6

Updated by O'Neil Delpratt over 8 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.2 added

Bug fix applied in the Saxon 9.7.0.2 maintenance release.

Actions #7

Updated by O'Neil Delpratt over 8 years ago

  • Status changed from Resolved to Closed
