Project

Profile

Help

Bug #5488

open

Streaming with the union, intersect, and except operators

Added by Michael Kay 4 days ago. Updated about 11 hours ago.

Status:
New
Priority:
Low
Assignee:
Category:
Streaming
Sprint/Milestone:
-
Start date:
2022-05-12
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

The vast majority of the tests for streaming with union, intersect, and except operators provide one streamed and one non-streamed operand. This case isn't very interesting; in particular, with intersect and except the result will always be empty, and the only interest is in whether and when we detect this.

The specification, however, describes some much more challenging cases. We should be able to process a union, intersect, or except operator with two consuming operands, for example //book | //magazine or //book[@price gt 10] intersect subsequence(//book, 1, 10). The assumption, explained in a note in the spec, is that the two operands can be evaluated "in parallel" during a single pass of the input, and the results combined on the fly. We make no attempt to implement this, as far as I am aware.

Cases involving climbing operands are also interesting: the spec gives the example parent::A | */ancestor::B

Actions #1

Updated by Michael Kay 4 days ago

  • Description updated (diff)
Actions #2

Updated by Michael Kay 4 days ago

It doesn't actually look very difficult to do this. We handle streaming expressions like //book and //magazine by treating them as patterns and matching them against nodes in the streamed document; we already have the capability to construct a VennPattern representing the union, intersection, or difference between two patterns.

Actions #3

Updated by Michael Kay 3 days ago

I've added tests sx-union-300 to -335 that test unions between (for the most part) two consuming operands, and a good number of them work as written. However, some are failing and need further examination to see what the spec says.

For example test -317 does

<xsl:value-of select="(/BOOKLIST/BOOKS/ITEM[1]/* union /BOOKLIST/BOOKS/ITEM[2]/*) ! contains(name(), 'E')"/>

and is failing

SXST0061  Navigation using child axis is not supported from a streamed input node

This failure seems spurious. The failure typically occurs because static analysis has concluded that the expression is streamable, but the run-time code fails to provide a streaming implementation. In particular, the union expression cannot be converted to a pattern because the filtered operands cannot be converted to patterns, which is ultimately because SubscriptExpressionAdjunct.toStreamingPattern() returns null (as does FirstItemExpressionAdjunct).

We implement streaming of an expression like /BOOKLIST/BOOKS/ITEM[1]/* not by generating a pattern that matches these nodes and nothing else, but by matching /BOOKLIST/BOOKS/ITEM nodes and counting them as they pass by. Union expressions seem to be streamable only if the operands can both be treated as patterns.

To make this test work, we need to be smarter than this. Essentially we need to treat it like an xsl:fork, where the two operands are evaluated independently (as separate Watches in the WatchManager) and the two feeds are then merged into one. This isn't a completely new concept, we do much the same thing for merging entries in a streamed map constructor, but it's certainly new code.

Actions #4

Updated by Michael Kay about 17 hours ago

I've made good progress in allowing a union of striding sequences like (A/B/C union A/B/D), by having a watch for each operand feeding into a CombiningUnionFeed that forms the union of the two sequences. This is straightforward enough because the nodes arrive in the right order, and if the same node arrives from both inputs, they will be adjacent which makes it easy to detect the duplicate.

I'm having more difficulty with sequences that use the ancestor axis, for example

ITEM[1]/ancestor::* | ITEM[2]/ancestor::*

where the nodes arrive in the CombiningUnionFeed in the order BOOKLIST BOOKS ITEM[1] BOOKLIST BOOKS ITEM[2], and the BOOKLIST and BOOKS elements need to be de-duplicated.

The answer may be for the CombiningUnionFeed to accumulate nodes so long as they derive from the same WatchManager event (we could detect that by maintaining an event counter in the WatchManager).

Actions #5

Updated by Michael Kay about 11 hours ago

Investigating sx-union-331 which does

select="(/BOOKLIST/BOOKS/ITEM[1]/PRICE/ancestor::* union /BOOKLIST/BOOKS/ITEM[2]/PRICE/ancestor::*) ! name()

there seems to be a problem with the streamed evaluation of the two operands. An expression of the form child::*/ancestor::* is handled by a DocumentSorterAdjunct which is supposed to sort the ancestors of a node before the node itself; but in fact it seems to be sorting the entire node sequence, meaning that it's not really streamed at all.

Considering the first expression on its own,

/BOOKLIST/BOOKS/ITEM[1]/PRICE/ancestor::*

it seems that when ITEM[1] is processed, a watch is created to catch PRICE, which fires and computes PRICE/ancestor::*, which sits around in the DocumentSorter until another event comes along. When ITEM[2] is processed, the watch correctly recognises that it doesn't need to do anything; it also recognises that there will be no further matches on future ITEMS; but it doesn't send any close() event up the pipeline until the end of the sequence of ITEMs is notified.

So in combining the two branches of the union, all the events for BOOKLIST BOOKS ITEM[1] BOOKLIST BOOKS ITEM[2] actually arrive when the endDocument() is processed.

Please register to edit this issue

Also available in: Atom PDF