Support #4267

Unexpected behaviour getting serialized xslt result

Added by Rick Vlaming about 2 months ago. Updated 12 days ago.

Status: Closed
Priority: Low
Assignee:
Category: Serialization
Sprint/Milestone: -
Start date: 2019-08-02
Due date:
% Done: 0%
Legacy ID:
Applies to branch: 9.9
Fix Committed on Branch:
Fixed in Maintenance Release:

Description

I am the same person as Rick Vlaming, but I don't know which email I used previously when reporting an issue last year. Nevertheless...

I have an issue with getting the serialized result from an XSLT transformation. I don't know whether this is really a bug or simply works as designed, or whether there is an alternative.

We are moving from JAXP to the Saxon s9api, using the licensed Saxon-EE 9.9.1.4 Java edition. We are also planning to use XSLT chaining, where the result of one transformation is used as the input of the next. A chain can consist of multiple transformations (say 10).

I have created a reusable method which takes the XML (as an XdmNode) and the XSLT, and returns the result as an XdmNode. At certain points in the chain the result has to be serialized; at such a point I was planning to serialize the XdmNode to the database. At first that seemed to work, but then I came across a stylesheet which sets omit-xml-declaration to "yes". The unit test failed because the serialized output did include the XML declaration. The application depends on the declaration being omitted because the result is embedded as a payload in another XML document.

I discovered that when I use a TeeDestination which includes a Serializer destination, the XML declaration is omitted in the result obtained from the StringWriter of the Serializer destination. However, I would like the reusable method to return the result as an XdmNode, because in a chain we only have to serialize a few times. Most of the time serialization within the transform is not needed, so I would like to spare the transform method that effort.

To show the problem, I made a small change to the reusable method as an example. Instead of returning the result as an XdmNode, the example writes to a Destination which I added as an input parameter.

The first test, "omitXmlDeclarationXdmNodeDestination", does the transform with an XdmDestination and then serializes the XdmNode of the XdmDestination (the method used is also included). The assertion on the output of the serialization fails, because the XML declaration is in the output.

The second test, "omitXmlDeclarationTeeDestination", does the transform with a TeeDestination. The assertion on the StringWriter in that TeeDestination succeeds: the XML declaration is not in the result. I added an extra assertion on the XdmNode, also taken from the TeeDestination. That serialization is in fact the same as in the previous test, and it fails as well.

import static org.hamcrest.core.Is.is;
import static org.junit.Assert.assertThat;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

import org.junit.Test;

import net.sf.saxon.s9api.Destination;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.TeeDestination;
import net.sf.saxon.s9api.XdmDestination;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;

import nl.belastingdienst.vmg.fabriek.common.domain.SharedException;
import nl.belastingdienst.vmg.fabriek.common.util.SaxonS9ApiUtil;
import nl.belastingdienst.vmg.fabriek.common.util.XsltTest;

public class xslt9_9JavaTest extends XsltTest {

    private static final String PATH_TO_TEST_FILES = "src/test/resources/transformer/xsltjava/";
    private static final String XML_DECLARATION_TAG = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    @Test
    public void omitXmlDeclarationXdmNodeDestination() throws IOException {
        String omitInputXml = getFileContent(PATH_TO_TEST_FILES + "omit-input.xml");
        XdmNode omitInputXdmNode = SaxonS9ApiUtil.getXdmNode(omitInputXml);

        InputStream omitXslt = getInputStream(PATH_TO_TEST_FILES + "omit.xslt");

        XdmDestination destination = new XdmDestination();
        transform(omitInputXdmNode, omitXslt, destination);

        String result = getSerializedXdmNode(destination.getXdmNode());
        System.out.println(result);
        assertThat("assertXdmNode", result.startsWith(XML_DECLARATION_TAG), is(false));
    }

    @Test
    public void omitXmlDeclarationTeeDestination() throws IOException {
        String omitInputXml = getFileContent(PATH_TO_TEST_FILES + "omit-input.xml");
        XdmNode omitInputXdmNode = SaxonS9ApiUtil.getXdmNode(omitInputXml);

        InputStream omitXslt = getInputStream(PATH_TO_TEST_FILES + "omit.xslt");

        XdmDestination resultXdmDestination = new XdmDestination();
        StringWriter resultStringWriter = new StringWriter();
        Serializer resultSerializer = omitInputXdmNode.getProcessor().newSerializer(resultStringWriter);
        TeeDestination teeDestination = new TeeDestination(resultXdmDestination, resultSerializer);
        transform(omitInputXdmNode, omitXslt, teeDestination);

        String transformed = resultStringWriter.toString();
        System.out.println(transformed);
        assertThat("assertStringWriter", transformed.startsWith(XML_DECLARATION_TAG), is(false));

        transformed = getSerializedXdmNode(resultXdmDestination.getXdmNode());
        System.out.println(transformed);
        assertThat("assertXdmNode", transformed.startsWith(XML_DECLARATION_TAG), is(false));
    }

    private static String getSerializedXdmNode(XdmNode xdmNode) {
        Processor processor = new Processor(true);
        Serializer serializer = processor.newSerializer();
        // no omit-property here because that's in the xslt. Sometimes we use omit yes, sometimes no in xslt.
        try {
            return serializer.serializeNodeToString(xdmNode);
        } catch (SaxonApiException e) {
            throw new SharedException("Fout bij serializeren XdmNode");
        }
    }

    private void transform(XdmNode inputXml, InputStream xsltCode, Destination destination) {
        Source xsltSource = new StreamSource(xsltCode);
        Processor processor = inputXml.getProcessor();
        XsltCompiler xsltCompiler = processor.newXsltCompiler();
        try {
            Xslt30Transformer transformer = xsltCompiler.compile(xsltSource).load30();

            transformer.setGlobalContextItem(inputXml);
            transformer.applyTemplates(inputXml, destination);
        } catch (SaxonApiException e) {
            throw new SharedException("Transformatie-fout");
        }
    }

    private XdmNode getXdmNode(String xml) {
        InputStream xmlInputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        try {
            return new Processor(true).newDocumentBuilder().build(new StreamSource(xmlInputStream));
        } catch (SaxonApiException e) {
            throw new SharedException("Fout bij aanmaken XdmNode");
        }
    }

    private InputStream getInputStream(String filename) throws IOException {
        String fileContent = getFileContent(filename);
        return new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    protected String getXsdFileName() {
        // TODO: implement
        throw new UnsupportedOperationException("TODO: implement method getXsdFileName() --> String");
    }
}

omit-input.xml

<?xml version="1.0" encoding="UTF-8"?>
<test></test>

omit.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:math="http://www.w3.org/2005/xpath-functions/math" xmlns:array="http://www.w3.org/2005/xpath-functions/array" xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:err="http://www.w3.org/2005/xqt-errors" exclude-result-prefixes="array fn map math xhtml xs err" >

    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Console-output "omitXmlDeclarationXdmNodeDestination":

output-1:
<?xml version="1.0" encoding="UTF-8"?><test/>

java.lang.AssertionError: assertXdmNode
Expected: is <false>
     but: was <true>
Expected :is <false>
Actual   :<true>

Console-output "omitXmlDeclarationTeeDestination":

output-1:
<test/>

output-2:

<?xml version="1.0" encoding="UTF-8"?><test/>

java.lang.AssertionError: assertXdmNode
Expected: is <false>
     but: was <true>

output first test.jpg (31.1 KB) Rick Vlaming, 2019-08-02 16:15
output test2.jpg (36.3 KB) Rick Vlaming, 2019-08-02 16:15
omit-input.jpg (11.7 KB) Rick Vlaming, 2019-08-02 16:33

History

#2 Updated by Rick Vlaming about 2 months ago

The omit-input.xml is also not showing correctly. I have also uploaded a screenshot of that file.

#3 Updated by Martin Honnen about 2 months ago

I think with

 private static String getSerializedXdmNode(XdmNode xdmNode) {
        Processor processor = new Processor(true);
        Serializer serializer = processor.newSerializer();

you can't expect the used Serializer to know any of the serialization properties defined in the Xslt30Transformer that created that XdmNode.

Thus, if you expect to create an XdmNode first but serialize according to properties of the Xslt30Transformer, I think you need to use the newSerializer method of the Xslt30Transformer to create the Serializer.

#4 Updated by Rick Vlaming about 2 months ago

OK, thank you for your answer. If that is the case, the only two solutions are:

  1. Always return a TeeDestination. Previously, when we used JAXP, the reusable method for the transformation had an XML string as input and an XML string as output. So the only advantage of the s9api solution above would be that we now have XdmNode input. On the output side, serialization would still happen after each transformation.

  2. Leave it as above, so return an XdmNode, and give the application the knowledge of when to omit the XML declaration. We can then delete the omit-xml-declaration attributes from the XSLT, so there is still a single point of definition. The advantage is that serialization is done only when needed. I now understand that this also affects the other xsl:output settings, but those are mostly fixed.

Because we have very large files, I think solution 2 would be the best from a performance point of view.

#5 Updated by Rick Vlaming about 2 months ago

Of course there may be more advantages to using the s9api: more functionality and probably better performance.

#6 Updated by Martin Honnen about 2 months ago

My answer was mainly trying to point out why your getSerializedXdmNode doesn't take any XSLT output settings into account.

I am sure Michael Kay will give you a better answer on the complete problem.

My only suggestion on chaining various XSLT 3 transformations with Saxon 9.9 is to look into http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/Xslt30Transformer.html#asDocumentDestination-net.sf.saxon.s9api.Destination- as well.

#7 Updated by Michael Kay about 2 months ago

Serialization options in xsl:output are taken into account when the result of the transformation is sent to a Serializer as the transform destination. If you send the transformation output to an XdmNode, and then serialize the XdmNode in a separate operation from the transformation, the xsl:output options have no effect.

You have two options here. Either send the result of the last transformation in the pipeline directly to a Serializer, or provide the required options when you serialize the final XdmNode -- which you can do by initialising the Serializer before use.
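A minimal sketch of the second option, assuming the application itself decides that the declaration must be omitted. The class name `ExplicitSerializerOptions` and the helper `serializeWithoutDeclaration` are hypothetical names for illustration; `new Processor(false)` is used so the sketch does not depend on a licence.

```java
import java.io.StringReader;

import javax.xml.transform.stream.StreamSource;

import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmNode;

public class ExplicitSerializerOptions {

    // Hypothetical helper: the caller, not xsl:output, decides to omit the declaration.
    static String serializeWithoutDeclaration(XdmNode node) throws SaxonApiException {
        // Reuse the Processor that owns the node instead of creating a new one.
        Serializer serializer = node.getProcessor().newSerializer();
        // Initialise the Serializer before use, as suggested above.
        serializer.setOutputProperty(Serializer.Property.OMIT_XML_DECLARATION, "yes");
        return serializer.serializeNodeToString(node);
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        XdmNode doc = processor.newDocumentBuilder()
                .build(new StreamSource(new StringReader("<test/>")));
        System.out.println(serializeWithoutDeclaration(doc));
    }
}
```

Other xsl:output settings (indent, encoding and so on) would have to be set on the Serializer in the same way, since none of them travel with the XdmNode.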

#8 Updated by Rick Vlaming about 2 months ago

Thank you Michael and Martin for your quick answers. The option to serialize only at the last transformation is indeed another possibility. Although it needs a separate reusable method, it is an interesting option. Both reusable methods can share a method returning the destination. The method used at the last transformation will use a TeeDestination, the other one only the XdmNode. I think that will be the solution. Thanks.

#9 Updated by Michael Kay about 2 months ago

It's worth pointing out that it's possible to pipe the results of the Nth transformation into the (N+1)th directly rather than sending it to an XdmNode - though unless you're streaming there's probably little performance difference. You can use Xslt30Transformer.asDocumentDestination() on the (N+1)th transformer to get a Destination object for use with the Nth transformation.

This will mainly be beneficial if streaming, or if transformations do xsl:strip-space on their input. The xsl:strip-space operation can add a lot of overhead when the input to a transformation is supplied as an in-memory tree rather than as a stream of events.

#10 Updated by Rick Vlaming about 2 months ago

I will look into that.

I also did some further thinking on the serialization. I think it would be nice to have a property on the Serializer which, if set, results in the transformer only setting the serialization options, without actually writing output (for example, not filling the StringWriter). That option could also be offered as an input parameter of the transform or as a property of the transformer itself. Then the xsl:output options would be available outside the transformer.

#11 Updated by Michael Kay about 1 month ago

  • Description updated (diff)

#12 Updated by Michael Kay about 1 month ago

(I edited the original question to correct the formatting, and deleted subsequent posts regarding the incorrect formatting. For some reason the version of Markdown used on this site only recognises the three-tilde delimiters for code blocks if preceded/followed by blank lines. If only we had some decent markup standards...)

#13 Updated by Michael Kay about 1 month ago

Looking back over the thread (sorry, I was previously glancing at it in hotel foyers and airports) I think there's a misapprehension here about how an XDM document can be passed from one transformation to another. Basically there are three approaches:

(a) pass lexical XML

(b) pass a stream of SAX-like events

(c) pass a DOM-like tree of nodes in memory

And in general, the most efficient of these is (b) -- with the caveat that for Saxon, using its own internal event representation is more efficient than using SAX itself.

The best way of achieving this in s9api is to call Xslt30Transformer.asDocumentDestination() on the second transformation, and use the resulting Destination object as the destination of the first transformation. In fact this gives Saxon complete flexibility to pass information from the first transformation to the second in whatever way it considers most efficient.
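As a rough sketch of that approach for a two-step chain: the stylesheets `STEP1` and `STEP2`, the class `ChainedTransforms` and the method `runChain` are made up for illustration, and the final Serializer is taken from the second transformer so that it starts from that stylesheet's xsl:output settings.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.stream.StreamSource;

import net.sf.saxon.s9api.Destination;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;

public class ChainedTransforms {

    // Illustrative stylesheets: step 1 wraps the input in <middle>, step 2 wraps that in <final>.
    static final String STEP1 =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><middle><xsl:copy-of select='*'/></middle></xsl:template>"
        + "</xsl:stylesheet>";
    static final String STEP2 =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><final><xsl:copy-of select='*'/></final></xsl:template>"
        + "</xsl:stylesheet>";

    static String runChain(String inputXml) throws SaxonApiException {
        Processor processor = new Processor(false);
        XsltCompiler compiler = processor.newXsltCompiler();
        Xslt30Transformer first =
            compiler.compile(new StreamSource(new StringReader(STEP1))).load30();
        Xslt30Transformer second =
            compiler.compile(new StreamSource(new StringReader(STEP2))).load30();

        // Only the last step serializes; its Serializer carries the xsl:output of STEP2.
        StringWriter out = new StringWriter();
        Serializer serializer = second.newSerializer(out);

        // The output of the first transformation becomes the input document of the second.
        Destination intoSecond = second.asDocumentDestination(serializer);
        first.applyTemplates(new StreamSource(new StringReader(inputXml)), intoSecond);
        return out.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.println(runChain("<test/>"));
    }
}
```

A longer chain would be wired up the same way, from the last transformer backwards, with each step's Destination obtained from the step that follows it.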

#14 Updated by Michael Kay about 1 month ago

  • Status changed from New to In Progress

#15 Updated by Rick Vlaming about 1 month ago

Thank you Michael for your help. I appreciate it.

At this moment we are using approach (a), but we are working on approach (c), possibly going for approach (b) in the future. Right now we cannot use Xslt30Transformer.asDocumentDestination(), because in some places the XML is changed in between XSLT transformations. It is not possible to change this all at once, but we are going to look into it, trying to perform those intermediate changes with XSLT as well.

Now to the original question about the xsl:output properties in the serializer. I found out that when I use the newSerializer() method on the transformer, I get a Serializer with the properties from the XSLT loaded in the transformer. So by changing the return type of our reusable transform method to a class holding the resulting XdmNode and the Serializer returned by the transformer's newSerializer(), I have all the information I need to serialize the XdmNode in a separate operation from the transformation.
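Roughly, such a return type could look like the sketch below. `TransformWithSerializer`, `TransformResult` and `OMIT_XSLT` are hypothetical names, not our actual code, and the stylesheet is a cut-down version of omit.xslt.

```java
import java.io.StringReader;

import javax.xml.transform.stream.StreamSource;

import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmDestination;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.Xslt30Transformer;

public class TransformWithSerializer {

    // Cut-down identity stylesheet with omit-xml-declaration="yes" (illustrative).
    static final String OMIT_XSLT =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical holder: the result tree plus a Serializer preloaded from xsl:output.
    static final class TransformResult {
        final XdmNode node;
        final Serializer serializer;

        TransformResult(XdmNode node, Serializer serializer) {
            this.node = node;
            this.serializer = serializer;
        }

        // Serialize only when a caller actually needs lexical XML.
        String serialize() throws SaxonApiException {
            return serializer.serializeNodeToString(node);
        }
    }

    static TransformResult transform(XdmNode input, String xslt) throws SaxonApiException {
        Processor processor = input.getProcessor();
        Xslt30Transformer transformer = processor.newXsltCompiler()
            .compile(new StreamSource(new StringReader(xslt))).load30();
        XdmDestination destination = new XdmDestination();
        transformer.setGlobalContextItem(input);
        transformer.applyTemplates(input, destination);
        // newSerializer() on the transformer picks up the stylesheet's output properties.
        return new TransformResult(destination.getXdmNode(), transformer.newSerializer());
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        XdmNode input = processor.newDocumentBuilder()
            .build(new StreamSource(new StringReader("<test/>")));
        System.out.println(transform(input, OMIT_XSLT).serialize());
    }
}
```

In a chain, intermediate steps would use only `node` and ignore `serializer`, so no serialization work is done until it is needed.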

#16 Updated by Michael Kay about 1 month ago

  • Category set to Serialization
  • Status changed from In Progress to Resolved
  • Assignee set to Michael Kay

It looks to me as if the issue is now resolved, so I'm closing the thread. Feel free to re-open it if there are outstanding issues (or preferably, raise a new thread with a more specific question).

#17 Updated by O'Neil Delpratt 12 days ago

  • Status changed from Resolved to Closed
