Project

Profile

Help

Bug #3448

closed

item-separator is ignored

Added by Michael Kay over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Serialization
Sprint/Milestone:
-
Start date:
2017-09-15
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
9.8
Fix Committed on Branch:
trunk
Fixed in Maintenance Release:
Platforms:

Description

Under some scenarios, the item-separator serialization attribute is ignored.

Reported by Max Toro on the saxon-help list. T example given used

<xsl:output method="xml" omit-xml-declaration="yes" item-separator="|"
build-tree="yes"/>
Actions #1

Updated by Michael Kay over 6 years ago

Max Toro also reports an instance where item-separator="#absent" leads to the string "#absent" being used as the item separator (in this case build-tree="no" was specified).

Actions #2

Updated by Michael Kay over 6 years ago

  • Description updated (diff)
Actions #3

Updated by Max Toro over 6 years ago

After reading the specs, and https://www.w3.org/Bugs/Public/show_bug.cgi?id=29431, my understanding is that build-tree should not interfere with item-separator, its role is to specify whether you get a raw sequence (where item-separator could be used) or a normalized sequence (where item-separator could also be used, before everything is wrapped in a document node).

Actions #4

Updated by Max Toro over 6 years ago

BTW, I'm using the command line, not the API.

Actions #5

Updated by Michael Kay over 6 years ago

Been looking into this. Yes, it's fairly clear that Saxon hasn't implemented this part of the spec. It's also fairly disruptive to do so. I shall try and get it working on the development branch first, and then assess whether it's sufficiently low risk to back-port to the 9.8 branch. There are two main problems: a lot of new tests are needed for item-separator itself, plus ensuring it doesn't cause regression on other paths (especially uses of the API that differ from those exercised by the main test driver).

Actions #6

Updated by Max Toro over 6 years ago

Was my understanding that build-tree should not interfere with item-separator correct? Wrapping everything in a document node is the last step in sequence normalization, and item-separator is used in step 3. On the other hand, there's a note in the spec (section 26) that confuses me:

The item-separator attribute has no effect if the sequence being serialized contains only one item, which will always be the case if the effective value of build-tree is yes.

¿?

Actions #7

Updated by Michael Kay over 6 years ago

Yes, that sentence confuses me too, and I'm the one who wrote it...

Actions #8

Updated by Michael Kay over 6 years ago

I have raised spec bug 30208 to address this incorrect Note: https://www.w3.org/Bugs/Public/show_bug.cgi?id=30208

Actions #9

Updated by Michael Kay over 6 years ago

Let's try and understand the existing code.

For example, let's take the Xslt30Transformer.callTemplate(QName, Destination) path. Following the changes made for

https://saxonica.plan.io/boards/3/topics/7076, this is what happens.

(a) we establish whether the Destination is one that expects to receive a tree (Destination.isBuildTree()). If the Destination is a Serializer then this takes account of the default output properties (xsl:output) in the stylesheet - but only the method attribute, not the build-tree attribute. Since the spec says that method only provides a default for build-tree, this is probably wrong. In all other cases the Destination unconditionally says whether or not a tree is wanted, ignoring anything that xsl:output/@build-tree says. This is permitted by the spec.

(b) if the destination is a Serializer then we call the SerializerFactory to construct a serialization pipeline, whose entry point is a Receiver. For the "traditional" output methods this entry point will be constructed by calling makeSequenceNormalizer(). If there is no item-separator, this method returns a ComplexContentOutputter. If there is an item-separator, it returns a SequenceNormalizer.

The SequenceNormalizer is a very simple class which seems to insert item-separators and do nothing else. It also seems to get that wrong: it inserts a separator before a startElement() event, for example, whether or not it is a level-0 element. It's hard to see that the SequenceNormalizer is an adequate substitute for a ComplexContentOutputter, which is an ugly brute of a class that mixes lots of functionality: insertion of space separators, some namespace fixup, checking for duplicate and out-of-sequence attributes and namespaces, etc.

(c) If the destination isn't a Serializer then we call Destination.getReceiver() which has different logic for each kind of Destination. For an XdmDestination we get a ComplexContentOutputter. For a DomDestination we get a DOMWriter, for a SAXDestination we get a ContentHandlerProxy - these last two are very direct translators of Receiver events to the external API, which do nothing resembling sequence normalization.

We then get to call Controller.openResult() which has this horrible logic:

        if (result instanceof ComplexContentOutputter) {
            out = (SequenceReceiver) receiver;
        } else if (buildTree) {
            out = ComplexContentOutputter.makeComplexContentReceiver(receiver, null);
            // This creates a ComplexContentOutputter coupled to a NamespaceReducer
        } else if (receiver instanceof SequenceReceiver) {
            out = (SequenceReceiver) receiver;
        } else {
            out = new TreeReceiver(receiver);
            // This branch is used for a DOMDestination or SAXDestination; the TreeReceiver inserts spaces between atomic values, it ignores item-separator.
        }

Clearly a bad code smell here. My instinct is to take a machete to it on the development branch, and treat it with kid gloves on the 9.8 branch.

Actions #10

Updated by Michael Kay over 6 years ago

Let's try and establish some principles for a redesign.

(a) the Controller should deliver raw output. This basically means that the transform(), transformDocument(), and transformStream() entry points should go, leaving callTemplate and applyTemplates, each with pull and push variants. The pull variants are used only when the API is a pull request, and this happens only when raw results are required. The push variants deliver events to a SequenceReceiver.

(b) On top of the Controller sits the Xslt30Transformer. The XsltTransformer should be reimplemented as a layer on top of the Xslt30Transformer.

(c) All further processing of raw output becomes the responsibility of the Destination object. Existing implementations of Destinations will inherit from a new class AbstractDestination containing the common logic, which basically centres around sequence normalization, since the subsequent serialization steps are needed only in the Serializer Destination.

Actions #11

Updated by Michael Kay over 6 years ago

Progress report: I'm essentially implementing the design outlined above, and it's going quite well. I have reduced duplication between XsltTransformer and Xslt30Transformer by introducing a common superclass, and I have rewritten the methods returning XdmValue by invoking the methods that accept a Destination with a RawDestination. A great deal of the preparation of the serialization pipeline has been simplified.

All the unit tests for Xslt30Transformer are running; it's currently crashing on the pipeline test in TestXsltTransformer. Have done enough to establish (a) that this is the right way to go for 9.9, and (b) that it's much too big a change to retrofit to 9.8.

Actions #12

Updated by Michael Kay over 6 years ago

I discovered an oddity: when the raw result of an XQuery is an element node (which is quite common), and we run the query in push mode, we don't actually emit startDocument() and endDocument() events. For most kinds of destination this appears to have no adverse effects. I spotted it because there is one test ConfigTest.testSerializationConfiguration() where the XQuery output is missing a final newline that is present in the equivalent XSLT output - the newline is added by the XMLIndenter after the endDocument() event, because it makes the output cleaner when sent to console output.

Actions #13

Updated by Michael Kay over 6 years ago

I have been looking at how we handle the insertion of spaces between adjacent atomic values. It seems that there are no less than 5 classes containing logic to do this, all subclasses of SequenceReceiver: Emitter, TreeReceiver, SequenceWriter, SequenceNormalizerWithSpaceSeparator, and ComplexContentOutputter. This is surely excessive. In addition, the variable previousAtomic is defined on SequenceReceiver, but the SequenceReceiver class itself does not use it, and responsibility for its maintenance is devolved among all the subclasses. There is surely scope for a ProxyReceiver that does the whitespace insertion, and nothing else, and for inserting this ProxyReceiver into any pipeline that needs it.

I have addressed this by making all these classes invoke a common inherited method SequenceReceiver.decompose() from their append() implementation.

Actions #14

Updated by Michael Kay over 6 years ago

  • Status changed from New to In Progress
Actions #15

Updated by Michael Kay over 6 years ago

  • Status changed from In Progress to Closed
  • Fix Committed on Branch trunk added

I have fixed this on the development branch. Investigating the issue also led to raising some problems in the W3C specifications, and to clarification of the APIs in Saxon. The change involved some internal restructuring; it is therefore too risky to include in the 9.8 branch, so item-separator will not fully work until the next major release.

A number of new xslt30 test cases have been created.

Please register to edit this issue

Also available in: Atom PDF