Support #6369
Closed
Serialization problem of XQuery result using Saxon 12.3
Description
We run this XQuery as a transformation scenario in Oxygen:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ', ';
let $xml:=<xml><element>text1</element><element>text2</element></xml> return $xml/element/string()
and we get as result:
text1, , text2
Notice the two commas ", " between the two items.
To serialize, we create a TreeReceiver, something like:
SerializerFactory sf = this.queryTransformer.getConfiguration().getSerializerFactory();
PipelineConfiguration pipe = this.queryTransformer.getConfiguration().makePipelineConfiguration();
SerializationProperties props = new SerializationProperties(queryTransformer.getOutputProperties());
Receiver receiver = sf.getReceiver(new StreamResult(sw), props, pipe);
tr = new TreeReceiver(receiver);
The "net.sf.saxon.event.SequenceReceiver#decompose" is called for each item "text1" and "text2". The stack trace is something like:
at net.sf.saxon.str.UnicodeWriterToWriter.write(UnicodeWriterToWriter.java:36)
at net.sf.saxon.serialize.TEXTEmitter.characters(TEXTEmitter.java:104)
at net.sf.saxon.event.ProxyReceiver.characters(ProxyReceiver.java:158)
at net.sf.saxon.event.SequenceNormalizer.characters(SequenceNormalizer.java:99)
at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.sep(SequenceNormalizerWithItemSeparator.java:135)
at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(SequenceNormalizerWithItemSeparator.java:75)
at net.sf.saxon.event.TreeReceiver.characters(TreeReceiver.java:176)
at net.sf.saxon.event.SequenceReceiver.decompose(SequenceReceiver.java:178)
For "text2" which is ATOMIC the code in "net.sf.saxon.event.SequenceReceiver.decompose(Item, Location, int)" does this:
protected void decompose(Item item, Location locationId, int copyNamespaces) throws XPathException {
    if (item != null) {
        switch (item.getGenre()) {
            case ATOMIC:
            case EXTERNAL:
                if (previousAtomic) {
                    characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);
                }
                characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);
It first calls "characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);", which emits the space preceded by a comma, because the method "net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(UnicodeString, Location, int)" always calls sep(). It then calls:
characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);
which again adds a comma before the value. So we get two commas before the actual value is printed.
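To make the sequence of events easier to follow, here is a small plain-Java model of what is described above. It does not use Saxon at all: the class DecomposeEventsDemo and its methods decompose and joinWithSeparator are hypothetical names made up for this illustration, and the model only assumes that a space event is emitted between adjacent atomic items and that the separator is written between every pair of character events.

```java
import java.util.ArrayList;
import java.util.List;

public class DecomposeEventsDemo {

    // Toy stand-in for the decompose() step: one character event per atomic
    // item, plus a single-space event between adjacent atomic items.
    static List<String> decompose(List<String> atomicItems) {
        List<String> events = new ArrayList<>();
        boolean previousAtomic = false;
        for (String item : atomicItems) {
            if (previousAtomic) {
                events.add(" ");
            }
            events.add(item);
            previousAtomic = true;
        }
        return events;
    }

    // Toy stand-in for a normalizer that treats every character event as an
    // item and writes the separator between each pair of events.
    static String joinWithSeparator(List<String> events, String separator) {
        return String.join(separator, events);
    }

    public static void main(String[] args) {
        List<String> events = decompose(List.of("text1", "text2"));
        System.out.println(events.size() + " events");
        System.out.println(joinWithSeparator(events, ", "));
    }
}
```

In this model the two items become three events, so the separator is written twice. The exact whitespace between the two commas in the model may differ from what the real serializer emits.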
Updated by Michael Kay about 1 month ago
Sorry for the lack of response. I've come back to it a couple of times scratching my head, and don't have a clear answer.
The responsibility for inserting space separators and item separators in the receiver pipeline isn't particularly clear. That's partly due to problems with the specs - "item separator" is a bit of an aberration because it's not exclusively concerned with serialization. A problem with the Receiver pipeline is you can assemble components in the pipeline in any order and that can lead to unexpected effects like this. I'll give it some more thought.
Updated by Radu Coravu about 1 month ago
No hurry, thanks for the reply Michael!
Updated by Michael Kay 3 days ago
I reproduced the problem with:
public void testBug6369() {
    String query = "declare namespace output = \"http://www.w3.org/2010/xslt-xquery-serialization\";\n"
            + "declare option output:method 'text';\n"
            + "declare option output:item-separator ', ';\n"
            + "let $xml:=<xml><element>text1</element><element>text2</element></xml> return $xml/element/string()";
    try {
        Processor proc = new Processor(false);
        DocumentBuilder builder = proc.newDocumentBuilder();
        XQueryCompiler compiler = proc.newXQueryCompiler();
        XQueryExecutable exec = compiler.compile(query);
        XQueryExpression expr = exec.getUnderlyingCompiledQuery();
        XQueryEvaluator eval = exec.load();
        Configuration config = proc.getUnderlyingConfiguration();
        SerializerFactory sf = config.getSerializerFactory();
        PipelineConfiguration pipe = config.makePipelineConfiguration();
        SerializationProperties props = new SerializationProperties(
                exec.getUnderlyingCompiledQuery().getExecutable().getOutputProperties());
        StringWriter sw = new StringWriter();
        Receiver receiver = sf.getReceiver(new StreamResult(sw), props, pipe);
        Receiver tr = new TreeReceiver(receiver);
        expr.run(eval.getUnderlyingQueryContext(), tr, props.getProperties());
        System.err.println(sw.toString());
    } catch (XPathException | SaxonApiException e) {
        e.printStackTrace();
        fail(e);
    }
}
Hope that's a reasonable interpretation of what you are doing.
Updated by Michael Kay 3 days ago
So, the TreeReceiver has a decompose() method that inserts a whitespace character between the "text1" and "text2" events, so there are three character events ("text1", " ", "text2") arriving at the SequenceNormalizerWithItemSeparator, and the SequenceNormalizer inserts the item separator between each pair.
The "text1" and "text2" items are hitting the TreeReceiver.append() method rather than the characters() method because the query explicitly makes them strings, not text nodes, by calling the string() function.
If we cut out the TreeReceiver from the pipeline, the result is text1, text2, which I assume is what you want.
I think the basic problem is that you only want a TreeReceiver if you want to deliver the result assembled as an XML document node, and you only want a SequenceNormalizerWithItemSeparator if you want it as a serialized sequence with item separators; you can't have both.
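For readers who want to see the "one or the other" point in isolation, the following is a rough plain-Java sketch of the two wirings. Nothing here is Saxon code: SeparatorStage and FlatteningStage are hypothetical stand-ins invented for this illustration, assumed to behave loosely like a separator-inserting normalizer and a document-building wrapper respectively.

```java
import java.util.List;
import java.util.function.Consumer;

public class PipelineWiringDemo {

    // Hypothetical stage: writes each unit it receives, with the separator
    // between consecutive units.
    static final class SeparatorStage implements Consumer<String> {
        private final String separator;
        private final StringBuilder out = new StringBuilder();
        private boolean first = true;

        SeparatorStage(String separator) {
            this.separator = separator;
        }

        @Override
        public void accept(String unit) {
            if (!first) {
                out.append(separator);
            }
            out.append(unit);
            first = false;
        }

        String result() {
            return out.toString();
        }
    }

    // Hypothetical stage: forwards items downstream and adds a space event
    // between adjacent atomic values, as if assembling document content.
    static final class FlatteningStage implements Consumer<String> {
        private final Consumer<String> next;
        private boolean previousAtomic = false;

        FlatteningStage(Consumer<String> next) {
            this.next = next;
        }

        @Override
        public void accept(String item) {
            if (previousAtomic) {
                next.accept(" ");
            }
            next.accept(item);
            previousAtomic = true;
        }
    }

    // Items go straight to the separator stage.
    static String direct(List<String> items, String separator) {
        SeparatorStage sink = new SeparatorStage(separator);
        items.forEach(sink);
        return sink.result();
    }

    // Items pass through the flattening stage first.
    static String viaFlattening(List<String> items, String separator) {
        SeparatorStage sink = new SeparatorStage(separator);
        items.forEach(new FlatteningStage(sink));
        return sink.result();
    }

    public static void main(String[] args) {
        List<String> items = List.of("text1", "text2");
        System.out.println(direct(items, ", "));
        System.out.println(viaFlattening(items, ", "));
    }
}
```

In this sketch only the direct wiring yields a single separator between the two items. Stacking the flattening stage in front makes the separator stage see the inserted space as an extra item.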
Updated by Michael Kay 3 days ago
- Status changed from New to Closed
Some of this is hinted at in the Javadoc for SerializerFactory.getReceiver():
* <p>The effect of the method changes in Saxon 9.7 so that for serialization methods other than
* "json" and "adaptive", the returned Receiver performs the function of "sequence normalization" as
* defined in the Serialization specification. Previously the client code handled this by wrapping the
* result in a ComplexContentOutputter (usually as a side-effect of called XPathContext.changeOutputDestination()).
* Wrapping in a ComplexContentOutputter is no longer necessary, though it does no harm because the ComplexContentOutputter
* is idempotent.</p>
But configuring a Receiver pipeline to do exactly what you want is a bit of a black art, and the components can't exactly be plugged together in any order as the design might suggest.
I shall close this now, but feel free to re-open if you have any questions.