Project

Profile

Help

Bug #5039

closed

Chaining two XSLTs where the first creates a fragment with two element children works under Node.js but not in the browser

Added by Martin Honnen almost 3 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
2021-07-15
Due date:
% Done:

100%

Estimated time:
Applies to JS Branch:
2, Trunk
Fix Committed on JS Branch:
2, Trunk
Fixed in JS Release:
SEF Generated with:
Platforms:
Company:
-
Contact person:
-
Additional contact persons:
-

Description

When I run the code

const SaxonJS = require("saxon-js")

const xslt1Source = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:template match="data">
        <h1><xsl:value-of select="name" /></h1>
        <div>Age: <xsl:value-of select="age" /></div>
    </xsl:template>
    
</xsl:stylesheet>`;

const xslt2Source = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>
  
    <xsl:template match="h1">
        <p>My name is <xsl:value-of select="." />!</p>
    </xsl:template>

</xsl:stylesheet>`;

const xmlSource = `<data>
<name>John</name>
<age>25</age>
</data>`;

const result = SaxonJS.XPath.evaluate(`
    fold-left(
        $xsltSources,
        parse-xml($xmlSource),
        function($node, $xsltText) {
            transform(
                map {
                    'source-node' : $node,
                    'stylesheet-text': $xsltText
                }
            )?output
        }
    )`, [], { 'params' : { 'xmlSource' : xmlSource, 'xsltSources' : [xslt1Source, xslt2Source] } })

console.log(SaxonJS.serialize(result));

under Node.js it works fine and outputs <p>My name is John!</p><div>Age: 25</div>.

Trying to run the same (minus the first line with the require call) in the browser gives an error "Uncaught DOMException: Failed to execute 'appendChild' on 'Node': Only one element on document allowed." in SaxonJS2.js:4407.

So somehow it seems the Node.js version manages to create a document fragment node with two element children while the browser version seems to try to put two element nodes into a document node.

The error only occurs when trying to chain the two stylesheets, the first run alone creating the fragment with two element children runs fine.

Actions #1

Updated by Martin Honnen almost 3 years ago

Looking at the fn:transform spec I think it might be safer to use 'delivery-format' : 'raw' in the argument map to the transform() call.

It doesn't change the error, however, somehow, although I have checked that the result after the first transformation is a document fragment node, Saxon-JS 2.2 in the browser tries to create a document node in addition to the returned document fragment node to run the second transformation, then tries to appendChild() clones of the first step transform result's document fragment node's children to the document node; appending two elements to a document node is obviously not possible in the DOM.

Actions #2

Updated by Norm Tovey-Walsh almost 2 years ago

  • Status changed from New to In Progress

It's interesting. I've identified where the failure is happening, but for both the first and second transformations, the diagnostics suggest that the append is being done to an empty document.

Actions #3

Updated by Martin Honnen almost 2 years ago

I think the problem is with document fragments, for instance, without changing stylesheets, if I use parse-xml-fragment for the source, the same error occurs:

const xslt1Source = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>
    
</xsl:stylesheet>`;

const xmlSource = `<foo>foo 1</foo><bar>bar 1</bar>`;

const result = SaxonJS.XPath.evaluate(`
    fold-left(
        $xsltSources,
        parse-xml-fragment($xmlSource),
        function($node, $xsltText) {
            transform(
                map {
                    'source-node' : $node,
                    'stylesheet-text': $xsltText,
                    'delivery-format': 'raw'
                }
            )?output
        }
    )`, [], { 'params' : { 'xmlSource' : xmlSource, 'xsltSources' : [xslt1Source] } })

console.log(SaxonJS.serialize(result));

For some reasons I am not able to figure from the minimized sources I see in the browser, Saxon-JS seems to try not work with the source node, i.e. the document fragment, but attempts to clone its children and append them to a newly created document node, which then obviously fails, as the document fragment has two element children, but the document can only contain one.

So the problem seems to be occurring due to that strategy to clone the source node's child nodes to appendChild them to a new document. As I said, I can't tell why that is happening. But a document node seems to be the wrong target in the world of XDM where fragment with several top level elements can exist.

Actions #4

Updated by Norm Tovey-Walsh almost 2 years ago

  • Status changed from In Progress to Resolved

Indeed. If you specify a sourceNode, we construct a document from it that we can use as the context node. Your first transformation turns

<data>
<name>John</name>
<age>25</age>
</data>

into

<h1><xsl:value-of select="name" /></h1>
<div>Age: <xsl:value-of select="age" /></div>

(Which I observe you couldn't have parsed with parse-xml() even though we're willing to create a document fragment containing it.)

What happens in the second transformation is that SaxonJS creates a new, empty document.implementation.createDocument() and attempts to populate it with the children of the node you passed in. When it attempts to add the second child, the browser enforces the constraint that a document can have only one document element.

I thought it might be possible to work around this by constructing a document fragment and then adding the fragment to the document, but no.

I don't know if it's a bug or a feature that the NodeJS DOM doesn't enforce this constraint as strictly, but I don't think there's anything we can do in the browser.

I'm going to mark this resolved but leave it open for the moment. I want to consider whether we could and/or should document this case more explicitly.

Actions #5

Updated by Norm Tovey-Walsh almost 2 years ago

Alternatively, I suppose, we could see if we can use a docuement fragment as a source, rather than a document. I investigated far enough to determine that it doesn't work today, I haven't (yet) tried to work out if it could.

Actions #6

Updated by Norm Tovey-Walsh almost 2 years ago

I take that back. Simple error in my experiment. That might work.

Actions #7

Updated by Martin Honnen almost 2 years ago

What is the whole reason for the construction of a new document and the cloning of the (children of the) source node? Why is the source node not processed directly itself?

Actions #8

Updated by Norm Tovey-Walsh almost 2 years ago

  • Status changed from Resolved to In Progress
Actions #9

Updated by Martin Honnen almost 2 years ago

I have tried to further understand the code that Saxon-JS uses, it seems there is a check (showing minimized for me as

var U = gb.createDocument();
if (Kc.Be(X))
      for (na = X.childNodes, x = 0; x < na.length; x++)
             U.appendChild(na[x].cloneNode(!0));
else
      U = z = X.cloneNode(!0);

and that Kc.Be(X) basically checks whether the source node X is a document or a document fragment node; based on that, the whole approach of first creating a new document to appendChild cloned children from X doesn't make much sense if X is check to be a document fragment or a document.

I am still not sure why the whole cloning is necessary and the source node can't be used directly. But I think that the DOM method cloneNode does work on document fragments and document nodes as well so I don't see the need to check for these node type to create a document container separately and do some loop to clone and appendChild the children of X, it seems the last line that does X.cloneNode(!0) i.e. X.cloneNode(true) would do for any kind of source node if it needs to be cloned to serve as the input to fn:transform.

So in my opinion it might suffice to do e.g. U = z = X.cloneNode(true); without checking and doing something else if X is a document fragment node or a document node.

And I still wonder why the whole cloning is done instead of using the source node itself. I pinged John Lumley about that via Slack, he said he will have a look tomorrow although he is not sure he can help after not having worked on the code for two years.

Actions #10

Updated by Michael Kay almost 2 years ago

The actual code (coreFn.js line 2668) is

           if (options['sourceNode']) {
                const sN = options['sourceNode'];
                sourceNode = Platform.createDocument();
                if (DU.isDocNode(sN)) {
                    const children = sN.childNodes;
                    for (let i = 0; i < children.length; i++) {
                        sourceNode.appendChild(children[i].cloneNode(true));
                    }
                } else {
                    n = sN.cloneNode(true);
                    sourceNode = n;
                }
                sourceNode["_saxonBaseUri"] = sN["_saxonBaseUri"];
                sourceNode["_saxonDocUri"] = sN["_saxonDocUri"];
            }

The method DU.isDocNode(X) checks whether X is a document node or document fragment node.

From a fairly quick glance at the code, the DOM implementation that we use on Node.js appears to allow a Document node to contain multiple element children.

I'm not sure off-hand why we feel it necessary to copy the supplied source node. It may be because of whitespace stripping - we don't want that to affect the node supplied by the first transformation. Alternatively, it might be because of the possibility that the sourceNode is the HTML document (and therefore mutable).

Actions #11

Updated by Norm Tovey-Walsh almost 2 years ago

Debbie and I are a little perplexed by this code too. At first, I thought the conditional was backward, but now I think it's just redundant. There's no reason, AFIACT, to do something different with documents or document fragments here.

We absolutely set properties on the document node, as you can see at the end of that fragment. Whether it would be reasonable to "scribble" on the node passed from the user is an open question. I think that strip space comes into play; I'd be reluctant to strip space on the user's node, so making a copy seems sensible. Except there's some evidence that we don't do that for a direct call to SaxonJS.transform().

When I first broached this with Debbie, her first thought was that copying documents might be this way specifically to avoid problems with an HTMLDocument. But my attempts to pass an HTML document into the function cause it to fail somewhere else entirely, though there may be other code paths where an HTML document could reach here.

The good news is, we can almost certainly fix this. But working out exactly why it's this way is ongoing.

Actions #12

Updated by Martin Honnen almost 2 years ago

The only problem I ran into with HTML documents is if the contain a DOCTYPE node, see forum post https://saxonica.plan.io/boards/5/topics/8700.

Other than that, the problem seems to be the same and fixable by doing a cloneNode on the source node itself instead of trying to create a document node and appendChild the cloned children to that document.

So the following gives the known error with Uncaught DOMException: Failed to execute 'appendChild' on 'Node': Only one element on document allowed. in Saxon-JS 2.3 in the browser but "works" if the lines doing the document creation and child node cloning and insertion are omitted and the source node is simply cloned with e.g. var U = z = X.cloneNode(true);.

const xslt1Source = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-skip"/>

    <xsl:template match="body">
        <xsl:copy-of select="node()"/>
    </xsl:template>
    
</xsl:stylesheet>`;

const htmlSource = `<html lang=en>
  <head>
    <meta charset=utf-8>
    <title>test</title>
</head>
<body>
  <h1>test</h1>
  <p id=p1>This is a test.
</body>
</html>`;

const htmlDoc = new DOMParser().parseFromString(htmlSource, 'text/html');

const xslt2Source = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="#all">

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output build-tree="no"/>

    <xsl:template match="h1">
       <h2>
          <xsl:apply-templates/>
       </h2>
    </xsl:template>
    
</xsl:stylesheet>`;

const result = SaxonJS.XPath.evaluate(`
    fold-left(
        $xsltSources,
        .,
        function($node, $xsltText) {
            transform(
                map {
                    'source-node' : $node,
                    'stylesheet-text': $xsltText,
                    'delivery-format' : 'document'
                }
            )?output
        }
    )`, htmlDoc, { 'params' : { 'xsltSources' : [xslt1Source, xslt2Source] } })

console.log(SaxonJS.serialize(result));
Actions #13

Updated by Martin Honnen almost 2 years ago

Slight correction, the error for the HTML parsing and then XSLT chaining example is a bit different and there is also a text node in the fragment that the first XSLT returns and the second tries to process: "Uncaught DOMException: Failed to execute 'appendChild' on 'Node': Nodes of type '#text' may not be inserted inside nodes of type '#document'". But the problem is the same, the source node (for the second transform call) is a document fragment node but Saxon-JS tries to clone and insert its child node into a newly created document node.

Actions #14

Updated by John Lumley almost 2 years ago

I think I was the originator of this section of the code - I certainly wrote the original fn:transform() code. But it's probably about 3-4 years ago now, so I can't recall details I fear. But there must have been some reason to do the clone and also to do this by children parts for a document source.

  • For doing the clone at all it might have been when something else might be side-effecting (e.g. using the browser document, or fragment part, as a source for an internal transform, whilst still having interaction elsewhere on the browser document). I'm just speculating, but it might be conceivable as all my development and testing took place in a SaxonJS framework within the browser itself, and it is possible I was running other processes at the same time, but this is as I say just speculation.
  • For doing the clone by child parts in the case of a full document, it might have been because of the DOCTYPE or or other prolog sections, as speculated.

I don't know if this has helped (probably not) but I have to say I can't recall why I wrote that that way, but there must have been some reason - should have commented at the time of course.

I'm not up with the current SaxonJS codebase, but a simple test might be to replace the section with a non-cloning version and run the QT3 test set and the 'in-browser' XSLT testdriver, which uses fn:transform() extensively, and see what, if anything, breaks.

John

Actions #15

Updated by Norm Tovey-Walsh almost 2 years ago

I see essentially the same code for the stylesheetNode case. I wonder if it's more significant there and what we're seeing for sourceNode is just code that's been cut-and-pasted? (Or is it the other way around?)

The status quo: if the node is a document (or a document fragment), create a new document and append clones of the children. Otherwise, just clone it. That doesn't work if the document is a document fragment and it has multiple, top-level element children.

I see a few possible solutions:

  1. If the node is a document fragment with multiple, top-level element children, then create a new document fragment and clone the children into that, otherwise do the status quo.
  2. Just clone the node regardless of its type. I'm not aware of anything that would be different about document types or other features.
  3. Something hybrid: if the node is a document or document fragment with a single top-level element child, then create a new document and clone the children into it, otherwise clone it.

And now there's the related question of how this should be applied to the stylesheetNode case, though there should never be multiple, top-level element children in that case, so it probably doesn't matter except in a "code hygiene" sense.

Actions #16

Updated by Norm Tovey-Walsh almost 2 years ago

I've run the QT3 and XSLT30 test suites with the current code and the simplification of just cloning the node. I'm planning to commit that change and see what happens when we run all the regression testing.

Taking a closer look at the similar code for the stylesheetNode case, I think there's a different bug there. I'm going to fix that the same way and see what happens.

Actions #17

Updated by Norm Tovey-Walsh almost 2 years ago

  • Status changed from In Progress to Resolved
  • Applies to JS Branch Trunk added
  • Fix Committed on JS Branch 2, Trunk added

After running a bunch of tests and attempting to find any regressions, I've simply replaced all that code in both cases with a straightforward "cloneNode" solution. There's a new unit test that demonstrates that 5039 passes and I see no differences in either the QT3 or XSLT3 test suites.

Actions #18

Updated by Debbie Lockett almost 2 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in JS Release set to SaxonJS 2.4

Bug fix applied in the SaxonJS 2.4 maintenance release.

Please register to edit this issue

Also available in: Atom PDF Tracking page