Two line breaks instead of one with XQuery and -stream:on?
Added by Martin Honnen about 2 years ago
When I run the XQuery
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text'; declare option output:item-separator '&#10;';
root/items/item/string-join(*, ' ')
through SaxonCS or EE 11.3 with the command line -s:sample.xml -q:query.xq, without streaming, I get each item on one line, with just the line break separating each line. If I add -stream:on to the command line, the output contains an additional empty line between the lines with the item data.
The sample is, e.g.,
<root>
<items>
<item>
<name>item 1</name>
<category>cat 1</category>
</item>
<item>
<name>item 2</name>
<category>cat 1</category>
</item>
<item>
<name>item 3</name>
<category>cat 2</category>
</item>
<item>
<name>item 4</name>
<category>cat 1</category>
</item>
<item>
<name>item 5</name>
<category>cat 3</category>
</item>
<item>
<name>item 6</name>
<category>cat 2</category>
</item>
</items>
</root>
The output without streaming (both on the console and, if I use -o:result.tsv, in a file) is, e.g.,
item 1 cat 1
item 2 cat 1
item 3 cat 2
item 4 cat 1
item 5 cat 3
item 6 cat 2
while with the -stream:on option I get, e.g.,
item 1 cat 1

item 2 cat 1

item 3 cat 2

item 4 cat 1

item 5 cat 3

item 6 cat 2
Replies (5)
RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Martin Honnen about 2 years ago
Using
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
string-join(root/items/item/string-join(*, '&#9;'), '&#10;')
instead, even with -stream:on, outputs no empty lines in between the tab-separated data lines.
I only wonder whether calling string-join(root/items/item/string-join(*, '&#9;'), '&#10;') really does streaming, or whether it just buffers everything returned by root/items/item/string-join(*, '&#9;') in memory and then joins the data as lines with the outer string-join. The -t option shows no indication of restricted streaming.
RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay about 2 years ago
I'm using '+' as the item-separator for clarity, with '|' as the second argument of the string-join.
With -stream:off, I get
item 1|cat 1+item 2|cat 1+item 3|cat 2+item 4|cat 1+item 5|cat 3+item 6|cat 2
With -stream:on, I get
item 1|cat 1+ +item 2|cat 1+ +item 3|cat 2+ +item 4|cat 1+ +item 5|cat 3+ +item 6|cat 2
which suggests that with -stream:on, the items are being separated by spaces, and these are then being separated using the item-separator.
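The two-stage effect described above can be modelled outside Saxon. The following is only a sketch under assumptions: the class SeparatorDemo and its methods insertSpaces and serialize are hypothetical names invented for illustration and are not part of Saxon. One stage puts a space item between adjacent items, and a later stage joins everything it receives with the item-separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeparatorDemo {

    // Stage 1 (assumed model of the extra pipeline stage): emit a single
    // space as an item of its own between adjacent items.
    static List<String> insertSpaces(List<String> items) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                out.add(" ");
            }
            out.add(items.get(i));
        }
        return out;
    }

    // Stage 2 (assumed model of the serializer): join whatever arrives
    // with the item-separator.
    static String serialize(List<String> items, String itemSeparator) {
        return String.join(itemSeparator, items);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("item 1|cat 1", "item 2|cat 1", "item 3|cat 2");
        // Only the serializer separates the items.
        System.out.println(serialize(items, "+"));
        // Both stages separate the items, so the separator appears twice.
        System.out.println(serialize(insertSpaces(items), "+"));
    }
}
```

With a line break as the item-separator, the same model yields a line containing only a space between every two data lines, which would look like the empty lines in the original report.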
RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay about 2 years ago
The extra spaces are being inserted because XQueryEvaluator.runStreamed() calls
Receiver receiver = destination.getReceiver(config.makePipelineConfiguration(), params);
which inserts a TreeReceiver into the output pipeline; the TreeReceiver adds spaces between adjacent items. This is probably a hangover from the pre-Saxon 10 code, where addition of separators was done in the query/transformation engine rather than in the serializer; it was only the introduction of the item-separator property that forced this change.
RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay about 2 years ago
I was hoping that changing the code in XQueryEvaluator.runStreamed() to do the same as XQueryEvaluator.run() would fix it, namely:
Receiver receiver = getDestinationReceiver(destination);
expression.runStreamed(context, source, receiver, null);
destination.closeAndNotify();
However, I still get the extra spaces.
We're still getting a TreeReceiver in the pipeline; this time it is constructed by SerializerFactory.getReceiverForNonSerializedResult() at line 454.
It seems that XQueryExpression.runStreamed() also needs changing to align with XQueryExpression.run(). The latter method has the logic
if (result instanceof Receiver) {
    out = (Receiver) result;
} else {
    SerializerFactory sf = context.getConfiguration().getSerializerFactory();
whereas the streamed version invokes the SerializerFactory unconditionally (leading to insertion of the TreeReceiver).
Adding this logic to XQueryExpressionEE.runStreamed() fixes the bug.
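The shape of that guard can be illustrated with a small self-contained model. This is a sketch under assumptions, not Saxon's code: ItemSink, ListSink, SpacingSink and ReceiverChoiceSketch are hypothetical stand-ins, and the branch that would build a serializer pipeline is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output receiver; not Saxon's interface.
interface ItemSink {
    void append(String item);
}

// Collects items unchanged, like a destination that already is a receiver.
class ListSink implements ItemSink {
    final List<String> items = new ArrayList<>();

    public void append(String item) {
        items.add(item);
    }
}

// Forwards items but adds a space between adjacent ones (the unwanted stage).
class SpacingSink implements ItemSink {
    private final ItemSink next;
    private boolean first = true;

    SpacingSink(ItemSink next) {
        this.next = next;
    }

    public void append(String item) {
        if (!first) {
            next.append(" ");
        }
        first = false;
        next.append(item);
    }
}

class ReceiverChoiceSketch {

    // Models the unguarded path: the result is always wrapped.
    static ItemSink wrapAlways(ItemSink result) {
        return new SpacingSink(result);
    }

    // Models the guarded path: a result that already is a sink is used as is.
    static ItemSink wrapIfNeeded(Object result) {
        if (result instanceof ItemSink) {
            return (ItemSink) result;
        }
        // Assumption: the real code would build a pipeline via a factory here.
        throw new UnsupportedOperationException("not modelled in this sketch");
    }

    // Sends two items through the chosen path and returns what the target saw.
    static List<String> run(boolean guarded) {
        ListSink target = new ListSink();
        ItemSink out = guarded ? wrapIfNeeded(target) : wrapAlways(target);
        out.append("item 1|cat 1");
        out.append("item 2|cat 1");
        return target.items;
    }
}
```

In this model, run(true) delivers only the two data items to the target, while run(false) also delivers a space item between them.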
RE: Two line breaks instead of one with XQuery and -stream:on? - Added by Michael Kay about 2 years ago
Logged as bug #5569 and fixed.