Support #6369

closed

Serialization problem of XQuery result using Saxon 12.3

Added by Radu Coravu about 2 months ago. Updated 2 days ago.

Status: Closed
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2024-03-06
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

We run this XQuery as a transformation scenario in Oxygen:

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ', ';
let $xml:=<xml><element>text1</element><element>text2</element></xml> return $xml/element/string()

and we get as result:

text1,  , text2

Notice the two commas ", " between the two items.

To serialize we create a tree receiver something like:

    SerializerFactory sf = this.queryTransformer.getConfiguration().getSerializerFactory();
    PipelineConfiguration pipe = this.queryTransformer.getConfiguration().makePipelineConfiguration();
    SerializationProperties props = new SerializationProperties(queryTransformer.getOutputProperties());
    Receiver receiver = sf.getReceiver(new StreamResult(sw), props, pipe);
    tr = new TreeReceiver(receiver); // ...

The "net.sf.saxon.event.SequenceReceiver#decompose" is called for each item "text1" and "text2". The stack trace is something like:

	at net.sf.saxon.str.UnicodeWriterToWriter.write(UnicodeWriterToWriter.java:36)
	at net.sf.saxon.serialize.TEXTEmitter.characters(TEXTEmitter.java:104)
	at net.sf.saxon.event.ProxyReceiver.characters(ProxyReceiver.java:158)
	at net.sf.saxon.event.SequenceNormalizer.characters(SequenceNormalizer.java:99)
	at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.sep(SequenceNormalizerWithItemSeparator.java:135)
	at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(SequenceNormalizerWithItemSeparator.java:75)
	at net.sf.saxon.event.TreeReceiver.characters(TreeReceiver.java:176)
	at net.sf.saxon.event.SequenceReceiver.decompose(SequenceReceiver.java:178)

For "text2" which is ATOMIC the code in "net.sf.saxon.event.SequenceReceiver.decompose(Item, Location, int)" does this:

    protected void decompose(Item item, Location locationId, int copyNamespaces) throws XPathException {
        if (item != null) {
            switch (item.getGenre()) {
                case ATOMIC:
                case EXTERNAL:
                    if (previousAtomic) {
                        characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);
                    }
                    characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);

It first calls "characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);" to emit a space. Because the method "net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(UnicodeString, Location, int)" always calls sep(), a comma separator is written before that space. And then it calls:

characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);

which again adds a comma before the value. So we get two commas before the actual value is printed.
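
The sequence of events described above can be sketched with a small stand-alone model. This is plain Java, not Saxon code: the names DoubleSeparatorSketch, CharSink, Collector, ItemSeparatorStage and AtomicDecomposer are hypothetical stand-ins for the emitter, SequenceNormalizerWithItemSeparator and decompose(), and the model assumes only the behaviour reported in this issue (a separator before every character event except the first, and a single space between adjacent atomic items).

```java
import java.util.List;

/** Toy model of the pipeline in this issue; none of these types are Saxon classes. */
final class DoubleSeparatorSketch {

    /** Minimal stand-in for a receiver that accepts character events. */
    interface CharSink {
        void characters(String s);
    }

    /** Stand-in for the text emitter: appends whatever arrives. */
    static final class Collector implements CharSink {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(String s) {
            out.append(s);
        }
    }

    /** Assumed behaviour: writes the separator before every event except the first. */
    static final class ItemSeparatorStage implements CharSink {
        private final CharSink next;
        private final String separator;
        private boolean first = true;

        ItemSeparatorStage(CharSink next, String separator) {
            this.next = next;
            this.separator = separator;
        }

        @Override
        public void characters(String s) {
            if (!first) {
                next.characters(separator);
            }
            first = false;
            next.characters(s);
        }
    }

    /** Mirrors the quoted decompose() logic: a space event between adjacent atomic items. */
    static final class AtomicDecomposer {
        private final CharSink next;
        private boolean previousAtomic = false;

        AtomicDecomposer(CharSink next) {
            this.next = next;
        }

        void append(String atomicValue) {
            if (previousAtomic) {
                next.characters(" "); // seen downstream as an item of its own
            }
            next.characters(atomicValue);
            previousAtomic = true;
        }
    }

    /** Runs the items through decomposer -> separator stage -> collector. */
    static String serialize(List<String> items, String separator) {
        Collector collector = new Collector();
        AtomicDecomposer decomposer =
                new AtomicDecomposer(new ItemSeparatorStage(collector, separator));
        for (String item : items) {
            decomposer.append(item);
        }
        return collector.out.toString();
    }

    public static void main(String[] args) {
        // Three character events reach the separator stage: "text1", " ", "text2".
        System.out.println(serialize(List.of("text1", "text2"), ", "));
    }
}
```

In this model the space inserted between the two atomic values is treated downstream as a separate item, so it gets a separator of its own, which reproduces the doubled separator shown in the result above.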

Actions #1

Updated by Radu Coravu about 1 month ago

Any feedback here that I could use?

Actions #2

Updated by Michael Kay about 1 month ago

Sorry for the lack of response. I've come back to it a couple of times scratching my head, and don't have a clear answer.

The responsibility for inserting space separators and item separators in the receiver pipeline isn't particularly clear. That's partly due to problems with the specs - "item separator" is a bit of an aberration because it's not exclusively concerned with serialization. A problem with the Receiver pipeline is you can assemble components in the pipeline in any order and that can lead to unexpected effects like this. I'll give it some more thought.

Actions #3

Updated by Radu Coravu about 1 month ago

No hurry, thanks for the reply Michael!

Actions #4

Updated by Michael Kay 3 days ago

I reproduced the problem with:

public void testBug6369() {

        String query = "declare namespace output = \"http://www.w3.org/2010/xslt-xquery-serialization\";\n"
                + "declare option output:method 'text';\n"
                + "declare option output:item-separator ', ';\n"
                + "let $xml:=<xml><element>text1</element><element>text2</element></xml> return $xml/element/string()";


        try {
            Processor proc = new Processor(false);
            DocumentBuilder builder = proc.newDocumentBuilder();
            XQueryCompiler compiler = proc.newXQueryCompiler();
            XQueryExecutable exec = compiler.compile(query);
            XQueryExpression expr = exec.getUnderlyingCompiledQuery();
            XQueryEvaluator eval = exec.load();

            Configuration config = proc.getUnderlyingConfiguration();
            SerializerFactory sf = config.getSerializerFactory();
            PipelineConfiguration pipe = config.makePipelineConfiguration();
            SerializationProperties props = new SerializationProperties(
                    exec.getUnderlyingCompiledQuery().getExecutable().getOutputProperties());
            StringWriter sw = new StringWriter();
            Receiver receiver = sf.getReceiver(new StreamResult(sw), props, pipe);
            Receiver tr = new TreeReceiver(receiver);
            expr.run(eval.getUnderlyingQueryContext(), tr, props.getProperties());
            System.err.println(sw.toString());
        } catch (XPathException | SaxonApiException e) {
            e.printStackTrace();
            fail(e);
        }
    }

Hope that's a reasonable interpretation of what you are doing.

Actions #5

Updated by Michael Kay 3 days ago

So, the TreeReceiver has a decompose() method that inserts a whitespace character between the "text1" and "text2" events, so there are three character events arriving at the SequenceNormalizerWithItemSeparator ("text1", " ", "text2"), and the SequenceNormalizer inserts the item separator between each pair.

The "text1" and "text2" items are hitting the TreeReceiver.append() method rather than the characters() method because the query explicitly makes them strings, not text nodes, by calling the string() function.

If we cut out the TreeReceiver from the pipeline, the result is "text1, text2", which I assume is what you want.

I think the basic problem is that you only want a TreeReceiver if you want to deliver the result assembled as an XML document node, and you only want a SequenceNormalizerWithItemSeparator if you want it as a serialized sequence with item separators; you can't have both.
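
As a toy illustration of those two mutually exclusive behaviours, the sketch below contrasts them in plain Java. It is not Saxon code: the class NormalizationChoiceSketch and both method names are made up for this example, and it only models how adjacent atomic values are joined in each case (a single space when building document content, the item separator when serializing a sequence).

```java
import java.util.List;

/** Toy contrast of the two ways adjacent atomic values can be joined; not Saxon code. */
final class NormalizationChoiceSketch {

    /** Document-node style: adjacent atomic values are separated by a single space. */
    static String asDocumentContent(List<String> atomics) {
        return String.join(" ", atomics);
    }

    /** Item-separator style: the separator goes between each pair of items. */
    static String asSeparatedSequence(List<String> atomics, String itemSeparator) {
        return String.join(itemSeparator, atomics);
    }

    public static void main(String[] args) {
        List<String> items = List.of("text1", "text2");
        System.out.println(asDocumentContent(items));         // one text run, space-joined
        System.out.println(asSeparatedSequence(items, ", ")); // items joined by the separator
    }
}
```

Applying both steps one after the other is what produces the extra separator reported in this issue, since the space from the first step looks like an item to the second.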

Actions #6

Updated by Michael Kay 3 days ago

  • Status changed from New to Closed

Some of this is hinted at in the Javadoc for SerializerFactory.getReceiver():

     * <p>The effect of the method changes in Saxon 9.7 so that for serialization methods other than
     * "json" and "adaptive", the returned Receiver performs the function of "sequence normalization" as
     * defined in the Serialization specification. Previously the client code handled this by wrapping the
     * result in a ComplexContentOutputter (usually as a side-effect of called XPathContext.changeOutputDestination()).
     * Wrapping in a ComplexContentOutputter is no longer necessary, though it does no harm because the ComplexContentOutputter
     * is idempotent.</p>

But configuring a Receiver pipeline to do exactly what you want is a bit of a black art, and the components can't exactly be plugged together in any order as the design might suggest.

I shall close this now, but feel free to re-open if you have any questions.

Actions #7

Updated by Radu Coravu 2 days ago

Thanks for the analysis Michael!
