Two line breaks instead of one with XQuery and -stream:on?

Added by Martin Honnen over 1 year ago

When I run the XQuery

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';
declare option output:item-separator '&#10;';

root/items/item/string-join(*, '&#9;')

through SaxonCS or SaxonEE 11.3, using the command line -s:sample.xml -q:query.xq without streaming, I get each item on its own line, separated by just a line break. If I add -stream:on to the command line, the output contains an additional empty line between the lines with the item data.

For example, the sample input is:

<root>
  <items>
    <item>
      <name>item 1</name>
      <category>cat 1</category>
    </item>
    <item>
      <name>item 2</name>
      <category>cat 1</category>
    </item>
    <item>
      <name>item 3</name>
      <category>cat 2</category>
    </item>
    <item>
      <name>item 4</name>
      <category>cat 1</category>
    </item>
    <item>
      <name>item 5</name>
      <category>cat 3</category>
    </item>
    <item>
      <name>item 6</name>
      <category>cat 2</category>
    </item>
  </items>
</root>

The output without streaming (both on the console and in a file when I use -o:result.tsv) is:

item 1	cat 1
item 2	cat 1
item 3	cat 2
item 4	cat 1
item 5	cat 3
item 6	cat 2

while with the -stream:on option I get:

item 1	cat 1
 
item 2	cat 1
 
item 3	cat 2
 
item 4	cat 1
 
item 5	cat 3
 
item 6	cat 2

RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Martin Honnen over 1 year ago

Using

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';

string-join(root/items/item/string-join(*, '&#9;'), '&#10;')

instead outputs no empty lines between the tab-separated data lines, even with -stream:on.
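A minimal Java model of why this workaround avoids the blank lines (hypothetical code, not Saxon's implementation). It assumes what the later replies in this thread diagnose: with -stream:on an extra single-space item ends up between adjacent result items, and the serializer then writes the item-separator between every pair of items. Joining the rows inside the query leaves just one item, so there is nothing for either the space or the separator to go between. PreJoinWorkaround, spacesBetween and serialize are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Saxon code. Assumption: with -stream:on an extra " "
// item is emitted between adjacent result items, and the serializer then puts
// the item-separator between every pair of items it receives.
class PreJoinWorkaround {

    // Stand-in for the stage that emits " " between adjacent items.
    static List<String> spacesBetween(List<String> items) {
        List<String> out = new ArrayList<>();
        for (String item : items) {
            if (!out.isEmpty()) {
                out.add(" ");
            }
            out.add(item);
        }
        return out;
    }

    // Stand-in for the serializer applying the item-separator.
    static String serialize(List<String> items, String itemSeparator) {
        return String.join(itemSeparator, items);
    }

    public static void main(String[] args) {
        // Each row already tab-joined, as by string-join(*, '&#9;').
        List<String> rows = List.of("item 1\tcat 1", "item 2\tcat 1");

        // Original query: one item per row, newline as item-separator.
        String perRow = serialize(spacesBetween(rows), "\n");

        // Workaround: rows joined with a newline inside the query, so one item.
        String preJoined = serialize(spacesBetween(List.of(String.join("\n", rows))), "\n");

        System.out.println(perRow.equals("item 1\tcat 1\n \nitem 2\tcat 1"));  // true
        System.out.println(preJoined.equals("item 1\tcat 1\nitem 2\tcat 1"));  // true
    }
}
```

In this model the per-row result contains a line holding a single space between the data lines, which matches the shape of the -stream:on output in the original post.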

I only wonder whether calling string-join(root/items/item/string-join(*, '&#9;'), '&#10;') really streams, or whether it just buffers everything returned by root/items/item/string-join(*, '&#9;') in memory and then joins it into lines with the outer string-join. The -t option shows no indication of restricted streaming.
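On the buffering question, a generic Java sketch of the distinction being asked about. It is not a statement about how Saxon evaluates string-join, and IncrementalJoin is a hypothetical name: a join can pull the inner results one at a time from an iterator, so the individual items need not all be held at once, but the joined string itself still grows in memory until the last item has been appended.

```java
import java.util.Iterator;
import java.util.List;

// Generic illustration, not Saxon's implementation: a string join computed
// incrementally from an iterator. Each input item is consumed once and not
// retained, but the result accumulates in the StringBuilder.
class IncrementalJoin {

    static String join(Iterator<String> items, String separator) {
        StringBuilder result = new StringBuilder();
        boolean first = true;
        while (items.hasNext()) {
            String item = items.next();
            if (!first) {
                result.append(separator);
            }
            result.append(item);
            first = false;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("item 1\tcat 1", "item 2\tcat 1", "item 3\tcat 2").iterator();
        System.out.print(join(rows, "\n"));
    }
}
```

Under this model, memory use is driven by the size of the joined output string rather than by how many inner items exist at once.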

RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay over 1 year ago

For clarity, I'm using '+' as the item-separator and '|' as the second argument of string-join.

With -stream:off, I get

item 1|cat 1+item 2|cat 1+item 3|cat 2+item 4|cat 1+item 5|cat 3+item 6|cat 2

With -stream:on, I get

item 1|cat 1+ +item 2|cat 1+ +item 3|cat 2+ +item 4|cat 1+ +item 5|cat 3+ +item 6|cat 2

which suggests that with -stream:on, spaces are being inserted between the items, and the item-separator is then applied between every item, including those spaces.

RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay over 1 year ago

The extra spaces are being inserted because XQueryEvaluator.runStreamed() calls

Receiver receiver = destination.getReceiver(config.makePipelineConfiguration(), params);

which inserts a TreeReceiver into the output pipeline; the TreeReceiver adds spaces between adjacent items. This is probably a hangover from the pre-Saxon 10 code, where separators were added in the query/transformation engine rather than in the serializer; it was only the introduction of the item-separator property that forced this change.
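A much-simplified Java model of the effect described (hypothetical types, not Saxon's Receiver or TreeReceiver API): an extra stage that forwards a single-space item between adjacent items, placed in front of a serializer that writes the item-separator between every pair of items it receives. With '+' as the separator it reproduces the pattern of the -stream:off and -stream:on outputs in the reply above. ItemSink, TextSerializer, SpaceInserter and PipelineDemo are invented names.

```java
// Hypothetical, much-simplified pipeline; not Saxon's Receiver API.
class PipelineDemo {

    // Feeds the items through the pipeline, optionally with the extra stage.
    static String run(boolean withSpaceInserter, String itemSeparator, String... items) {
        TextSerializer serializer = new TextSerializer(itemSeparator);
        ItemSink head = withSpaceInserter ? new SpaceInserter(serializer) : serializer;
        for (String item : items) {
            head.item(item);
        }
        return serializer.result();
    }

    public static void main(String[] args) {
        // item 1|cat 1+item 2|cat 1+item 3|cat 2
        System.out.println(run(false, "+", "item 1|cat 1", "item 2|cat 1", "item 3|cat 2"));
        // item 1|cat 1+ +item 2|cat 1+ +item 3|cat 2
        System.out.println(run(true, "+", "item 1|cat 1", "item 2|cat 1", "item 3|cat 2"));
    }
}

// Each stage accepts result items one at a time.
interface ItemSink {
    void item(String value);
}

// Stand-in for the serializer: writes the item-separator between items.
class TextSerializer implements ItemSink {
    private final StringBuilder out = new StringBuilder();
    private final String itemSeparator;
    private boolean first = true;

    TextSerializer(String itemSeparator) {
        this.itemSeparator = itemSeparator;
    }

    @Override
    public void item(String value) {
        if (!first) {
            out.append(itemSeparator);
        }
        out.append(value);
        first = false;
    }

    String result() {
        return out.toString();
    }
}

// Stand-in for the extra stage: forwards " " between adjacent items, which
// the serializer downstream then treats as an item in its own right.
class SpaceInserter implements ItemSink {
    private final ItemSink next;
    private boolean first = true;

    SpaceInserter(ItemSink next) {
        this.next = next;
    }

    @Override
    public void item(String value) {
        if (!first) {
            next.item(" ");
        }
        next.item(value);
        first = false;
    }
}
```

The model shows why the symptom looks like "spaces, then separators": the two mechanisms compose, so each gap between data items becomes separator, space, separator.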

RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay over 1 year ago

I was hoping that changing the code in XQueryEvaluator.runStreamed() to do the same as XQueryEvaluator.run() would fix it, namely:

            Receiver receiver = getDestinationReceiver(destination);
            expression.runStreamed(context, source, receiver, null);
            destination.closeAndNotify();

However, I still get the extra spaces.

We're still getting a TreeReceiver in the pipeline; this time it is constructed by SerializerFactory.getReceiverForNonSerializedResult() at line 454.

It seems that XQueryExpression.runStreamed() also needs changing to align with XQueryExpression.run(). The latter method contains the logic

        if (result instanceof Receiver) {
            out = (Receiver) result;
        } else {
            SerializerFactory sf = context.getConfiguration().getSerializerFactory();
            // ... (remainder of the else branch not quoted)
        }

whereas the streamed version invokes the SerializerFactory unconditionally, which leads to the insertion of the TreeReceiver.

Adding this logic to XQueryExpressionEE.runStreamed() fixes the bug.
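A hypothetical, stripped-down Java illustration of the dispatch pattern quoted from run() and described as the fix for the streamed path. The types below are invented for the sketch and are not Saxon's classes: when the destination already is a receiver it is used as-is, and only otherwise is a factory asked to build one, which in the real code is where an extra stage such as the TreeReceiver was being added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, not Saxon's API.
class DispatchDemo {

    // Anything a query result can be sent to.
    interface Result {}

    // A destination that can accept items directly.
    interface Receiver extends Result {
        void item(String value);
    }

    // A destination that cannot accept items itself (e.g. a file).
    static final class NonReceiverResult implements Result {}

    // A caller-supplied receiver that simply collects items.
    static final class CollectingReceiver implements Receiver {
        final List<String> items = new ArrayList<>();

        @Override
        public void item(String value) {
            items.add(value);
        }
    }

    // Counts how often the factory path is taken.
    static int factoryCalls = 0;

    // Stand-in for the factory path, which may add extra pipeline stages.
    static Receiver makeReceiverViaFactory(Result result) {
        factoryCalls++;
        return new CollectingReceiver();
    }

    // Use the destination directly if it is already a Receiver.
    static Receiver getReceiver(Result result) {
        if (result instanceof Receiver) {
            return (Receiver) result;
        }
        return makeReceiverViaFactory(result);
    }

    public static void main(String[] args) {
        CollectingReceiver direct = new CollectingReceiver();
        System.out.println(getReceiver(direct) == direct); // true: used as-is
        getReceiver(new NonReceiverResult());
        System.out.println(factoryCalls); // 1: only the non-receiver used the factory
    }
}
```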
