Bug #4176
parse-xml() in Saxon-JS loses html namespace
Status: Closed
Description
To reproduce:
- Unzip the attached zip file, saxon-js-parse-xml-bug.zip, into a directory served by a web server.
- Make Saxon-JS 1.2.0 available there as well.
- Edit test.html to adjust the path to Saxon-JS.
- Open the URL of test.html in a browser.
- Paste the markup below into the text area.
- Click the "Click Me" button.
- Use the browser's Inspect Element to inspect the text "Where's my namespace?" Notice that the atom namespace has been preserved, but the <p> element has lost its namespace.
<atom:foo xmlns:atom="http://www.w3.org/2005/Atom">
<html:p xmlns:html="http://www.w3.org/1999/xhtml">Where's my namespace?</html:p>
</atom:foo>
In my case, I'm taking user input (parsed and transformed) and making a POST request to a web service. The service's XSD expects the html part of the document to be in a namespace and rejects the request if it is not. Even if I transform the parsed input to match elements without a namespace and try to add the html namespace back on, the namespace is still missing when the document is written out.
Updated by Martin Honnen almost 6 years ago
Are you sure that it is the parse-xml function that loses the XHTML namespace? Or the following <xsl:copy-of select="$document"/> into a browser's text/html HTML document? It might be worth checking what $document/*/*/concat(namespace-uri(), ';', name()) outputs.
Updated by Martin Honnen almost 6 years ago
According to https://saxonica.plan.io/issues/3066, the special treatment of (X)HTML elements as "no-namespace elements" is by design in Saxon-JS.
So the issue is not caused when parsing the input (whether it is parse-xml from a string or doc from a URI) but rather by the implementation of the XDM in Saxon-JS, which special-cases (X)HTML elements by stripping namespace and prefix:
if (SaxonJS.getPlatform().inBrowser && node instanceof HTMLElement && node.namespaceURI == "http://www.w3.org/1999/xhtml") {
return Atomic.QName.fromParts("", "", node.localName);
}
On the other hand, the note there says "this should only apply to HTML DOM traversal, not XML".
As the browser-side DOM implementation doesn't distinguish between HTML elements in HTML documents and in XML documents (since HTML5, in both kinds of documents you get an HTMLElement in the XHTML namespace), there doesn't seem to be a way to preserve the namespace and prefix of XHTML elements in XML DOM documents in Saxon-JS's XDM.
Updated by Michael Kay almost 6 years ago
- Project changed from Saxon to SaxonJS
- Assignee set to Debbie Lockett
Updated by Debbie Lockett almost 6 years ago
- Status changed from New to In Progress
- Priority changed from Low to Normal
- Applies to JS Branch 1.0, Trunk added
As Martin has suggested, I can confirm that the problem is not actually caused by parse-xml(). The result from parse-xml() does have the correct XHTML namespace (and html prefix).
Furthermore, the Saxon-JS special treatment of XHTML elements means that the prefix and namespace will indeed be lost, certainly (by design) at the point that $document is added to the HTML page with <xsl:result-document>. But in fact it looks like the namespaces are lost even when making the copy with <xsl:copy-of select="$document"/>, which I think is not by design:
Much like the Saxon-JS code that Martin points to in domutils.nameOfNode, the code in domutils.copyItem looks suspicious because we drop all namespaces if SaxonJS.getPlatform().inBrowser && newNode instanceof HTMLElement. I think we should also be checking whether context.resultDocument == window.document; i.e. whether newNode is to be added to the HTML page or not (note for instance this condition is used in context.createElement as used to first create newNode).
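The extra condition proposed above can be sketched in isolation. This is only an illustration, not the Saxon-JS source: shouldDropNamespaces, the isHtmlElement flag, and the plain objects used for the platform, context, and page are hypothetical stand-ins for `newNode instanceof HTMLElement`, `context.resultDocument`, and `window.document`, so that the logic can run outside a browser.

```javascript
// Hypothetical sketch, not the actual domutils.copyItem code.
// Namespaces are dropped only when all three conditions hold:
//   1. we are running in a browser,
//   2. the new node is an HTML element,
//   3. the node is destined for the HTML page itself.
function shouldDropNamespaces(platform, newNode, context, pageDocument) {
  return Boolean(
    platform.inBrowser &&
    newNode.isHtmlElement &&                   // stands in for `newNode instanceof HTMLElement`
    context.resultDocument === pageDocument    // stands in for `== window.document`
  );
}

const page = { name: "HTML page" };           // stand-in for window.document
const tempTree = { name: "temporary tree" };  // stand-in for an XML result tree
const browser = { inBrowser: true };
const htmlP = { isHtmlElement: true };

// Copy into a temporary XML tree: namespaces are kept.
console.log(shouldDropNamespaces(browser, htmlP, { resultDocument: tempTree }, page)); // false
// Copy into the HTML page: namespaces are dropped, as in the existing design.
console.log(shouldDropNamespaces(browser, htmlP, { resultDocument: page }, page));     // true
```

With only the first two conditions, both calls would report true, which matches the loss of namespaces observed with xsl:copy-of.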
You say that the actual intention is to send the $document in a POST request. I have done some further testing with the supplied repro, to see how that can work. One point to note is that if you are using ixsl:schedule-action/@http-request, you will need to be careful to ensure that the supplied body is a document-node(), else it seems that the XHTML namespace may get lost at this stage. For example, edit the $document variable in the button onclick template as shown in the example below:
<xsl:template match="button[@id = 'clickMe']" mode="ixsl:onclick">
  <xsl:variable name="document" as="document-node()">
    <xsl:try>
      <xsl:sequence select="parse-xml(ixsl:get(ixsl:page()//textarea, 'value'))"/>
      <xsl:catch><xsl:document><not-a-document/></xsl:document></xsl:catch>
    </xsl:try>
  </xsl:variable>
  <xsl:variable name="request"
                select="map{'body': $document,
                            'method': 'POST',
                            'media-type': 'application/xhtml+xml',
                            'href': '...'}"/>
  <ixsl:schedule-action http-request="$request">
    <xsl:call-template name="handleResponse"/>
  </ixsl:schedule-action>
</xsl:template>
Does this help you get a bit further? I guess it may depend on what further transforming you actually want to do to the result from parse-xml(), before it gets sent in the POST request...
Updated by Michael Kay over 5 years ago
At Debbie's instigation, I've been giving this some thought and trying to work back to first principles. We should probably be consistent with the way that the HTML5 specification attempts to resolve the problem, by modifying the semantics of XPath 1.0 and XSLT 1.0 as described here:
https://html.spec.whatwg.org/multipage/infrastructure.html#interactions-with-xpath-and-xslt
There are two parts to this.
Firstly, no-namespace names in path expressions are taken, under some circumstances, to match elements in the XHTML namespace. The specific circumstance is that the context node for the expression is "from an HTML DOM". This phrase is a bit too informal for our purposes; in an XSLT context it raises questions such as: if you do an xsl:copy-of of a subtree from the HTML page, does the copy behave like an HTML DOM for this purpose? One way of interpreting the rule might be: for any axis step where the principal node kind is element and the node test is in the form of an NCName, if the context item for that axis step is an element in the XHTML namespace, or a document node whose child element is an element in the XHTML namespace, then interpret the name test as matching names in the XHTML namespace.
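One possible reading of that interpretation, written out as a small sketch. The node model (plain objects with kind, namespaceURI, localName, children) and the function names are hypothetical, chosen only for illustration; the sketch also ignores xpath-default-namespace and is not how Saxon-JS implements name tests.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical node model: { kind: "element" | "document", namespaceURI, localName, children }.
function isXhtmlElement(node) {
  return node.kind === "element" && node.namespaceURI === XHTML_NS;
}

// Namespace that an unprefixed NCName test should match,
// decided from the context item of the axis step.
function effectiveNamespace(contextNode) {
  if (isXhtmlElement(contextNode)) return XHTML_NS;
  if (contextNode.kind === "document" &&
      (contextNode.children || []).some(isXhtmlElement)) {
    return XHTML_NS;
  }
  return ""; // ordinary XPath rule: unprefixed means no namespace
}

// Does `candidate` satisfy the unprefixed name test `ncName` from this context?
function matchesNameTest(contextNode, candidate, ncName) {
  return candidate.kind === "element" &&
    candidate.localName === ncName &&
    (candidate.namespaceURI || "") === effectiveNamespace(contextNode);
}

const body = { kind: "element", namespaceURI: XHTML_NS, localName: "body" };
const atomFoo = { kind: "element", namespaceURI: "http://www.w3.org/2005/Atom", localName: "foo" };
const htmlP = { kind: "element", namespaceURI: XHTML_NS, localName: "p" };

console.log(matchesNameTest(body, htmlP, "p"));    // true: context is an XHTML element
console.log(matchesNameTest(atomFoo, htmlP, "p")); // false: context is in the Atom namespace
```

Under this reading, the html:p inside atom:foo from the original report would still need an explicit prefix (or Q{...}p) to be selected from its Atom parent.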
Secondly, tree construction. The HTML5 spec says that if the output method is "html", then: If the transformation program outputs an element in no namespace, the processor must, prior to constructing the corresponding DOM element node, change the namespace of the element to the HTML namespace, ASCII-lowercase the element's local name, and ASCII-lowercase the names of any non-namespaced attributes on the element.
Tying this to the output method doesn't work very well, at least not for XSLT 2.0+ where we have temporary trees and secondary result documents. The right time to do this conversion seems to be when we inject nodes into the HTML page. I think it's unambiguous when we are doing that, because it is only done using recognizable calls on xsl:result-document (plus things like ixsl:set-attribute).
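The HTML5 conversion rule quoted above, applied at the point of injection into the page, could look roughly like the following sketch. The element descriptor shape and the function names are assumptions made for illustration; a real implementation would work on DOM nodes rather than plain objects.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// ASCII-only lowercase: only A-Z are changed, other characters are left alone.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Hypothetical element descriptor:
//   { namespaceURI, localName, attributes: [{ namespaceURI, localName, value }] }
// Applies the rule only to elements in no namespace; others are returned unchanged.
function normalizeForHtmlPage(el) {
  if (el.namespaceURI) return el;
  return {
    namespaceURI: HTML_NS,
    localName: asciiLowercase(el.localName),
    attributes: (el.attributes || []).map((a) =>
      // Only non-namespaced attribute names are lowercased.
      a.namespaceURI ? a : { ...a, localName: asciiLowercase(a.localName) }
    ),
  };
}

const result = normalizeForHtmlPage({
  namespaceURI: "",
  localName: "DIV",
  attributes: [{ namespaceURI: "", localName: "onClick", value: "go()" }],
});
console.log(result.namespaceURI, result.localName, result.attributes[0].localName);
// http://www.w3.org/1999/xhtml div onclick
```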
Note that apart from these changes, elements in the HTML5 DOM appear as being in the XHTML namespace, for example namespace-uri() returns the XHTML namespace, and searches that explicitly request elements in the XHTML namespace succeed.
I have wondered about a more general solution to the problem of unprefixed element names in path expressions, which has always been one of the biggest usability problems in XPath. One solution is to interpret an unprefixed element name as matching on the local name only (that is, matching any namespace) -- and relying on the syntax Q{}local to match no namespace, where that is needed. Of course this would be a big incompatibility, so it would have to be switchable, but I suspect that it wouldn't break very much code, because the number of cases is very small where you write /a/b/c and actually want to get no match on elements having the right local name but the wrong namespace. Another approach is a generalisation of the HTML5 modification to XPath semantics: a mode of operation in which an unprefixed NCName in an axis step means "match elements having the same namespace as the context node" (or the root element, if starting at a document node).
Whatever we do it's probably sufficiently disruptive that we should only consider it for JS2.
Updated by Debbie Lockett over 5 years ago
Various changes have been made on the development branches for Saxon 10.0 and Saxon-JS 2.x. The major change is that in Saxon-JS 2.x, elements in the HTML5 DOM will appear as being in the XHTML namespace (rather than the null namespace as in Saxon-JS 1.x).
One consequence of the changes is that SEFs generated (with target JS2) using 9.9 or earlier will not necessarily work correctly with Saxon-JS 2.x. For example, user interaction events for page clicks, etc. don't work properly, due to the namespace change for HTML page elements.
This is one reason why we have decided that for use in Saxon-JS 2.x, SEFs will be required to be generated using 10.0 or later. A check for this has now been added in Saxon-JS 2.x (an error is thrown if the Saxon version used to generate the SEF is less than 10.0).
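As a rough illustration of such a check (not the Saxon-JS implementation), the sketch below rejects any generator version below 10.0. The function names and the idea that the version arrives as a dotted string are assumptions; how the SEF actually records the Saxon version is not described here.

```javascript
// Hypothetical sketch: `saxonVersion` stands for whatever version string the SEF records.
function parseMajorMinor(version) {
  const m = /^(\d+)\.(\d+)/.exec(version);
  if (!m) throw new Error("Unrecognised Saxon version: " + version);
  return [Number(m[1]), Number(m[2])];
}

// Throws if the SEF was generated by a Saxon release earlier than 10.0.
function checkSefGenerator(saxonVersion) {
  const [major] = parseMajorMinor(saxonVersion);
  if (major < 10) {
    throw new Error(
      "SEF generated by Saxon " + saxonVersion + "; Saxon 10.0 or later is required"
    );
  }
}

checkSefGenerator("10.0"); // accepted
try {
  checkSefGenerator("9.9.1.5");
} catch (e) {
  console.log(e.message); // SEF generated by Saxon 9.9.1.5; Saxon 10.0 or later is required
}
```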
Updated by Debbie Lockett over 5 years ago
Saxon 10.0 development branch changes have been committed in the last couple of months to implement the following:
A new option -ns is available on the net.sf.saxon.Transform command line. It can be used to specify the default namespace for elements and types (in effect, a default for the xpath-default-namespace attribute). In addition, the value -ns:##any means that unprefixed element names appearing in path expressions and match patterns will match elements in any namespace (or none), and the value -ns:##html5 simulates the rules in the HTML5 specification, meaning that unprefixed element names match elements that are either in no namespace, or in the XHTML namespace.
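The three behaviours described for -ns can be compared side by side in a small sketch. This models only the matching semantics as stated above, on hypothetical plain-object elements; unprefixedNameMatches is an illustrative name, not part of Saxon or Saxon-JS.

```javascript
const XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// `mode` is "##any", "##html5", or a namespace URI ("" for no namespace).
// `element` is a hypothetical descriptor { namespaceURI, localName }.
function unprefixedNameMatches(mode, element, localName) {
  if (element.localName !== localName) return false;
  const ns = element.namespaceURI || "";
  switch (mode) {
    case "##any":
      return true;                                  // any namespace, or none
    case "##html5":
      return ns === "" || ns === XHTML_NAMESPACE;   // no namespace or XHTML
    default:
      return ns === mode;                           // behaves like xpath-default-namespace
  }
}

const htmlP = { namespaceURI: XHTML_NAMESPACE, localName: "p" };
const atomP = { namespaceURI: "http://www.w3.org/2005/Atom", localName: "p" };

console.log(unprefixedNameMatches("##any", atomP, "p"));   // true
console.log(unprefixedNameMatches("##html5", htmlP, "p")); // true
console.log(unprefixedNameMatches("##html5", atomP, "p")); // false
console.log(unprefixedNameMatches("", htmlP, "p"));        // false
```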
Updated by Debbie Lockett over 4 years ago
- Status changed from In Progress to Resolved
This should all be fixed for Saxon-JS 2 when compiling with Saxon-EE 10. Details have been added to the documentation for each product.
Updated by Michael Kay over 4 years ago
- Status changed from Resolved to Closed
- Fixed in JS Release set to Saxon-JS 2.0