Bug #5859
closedHandling of undeclared namespaces in a programmatically constructed DOM
100%
Description
Summary
Saxon crashes if the input is a DOM tree containing namespaced ("DOM level 2") elements or attributes without corresponding namespace declaration attributes.
Problem Report
The java net.sf.saxon.jaxp.TransformerImpl class has the method transform(Source xmlSource, final Result outputTarget).
Since version 10 in Saxon HE, it no longer properly copies over qualified namespaces from elements that are copied with xslt:copy-of. This is also broken in Saxon HE 11.4.
It seems to work fine with the DomResult type, but using a StreamResult does not return the namespaces of the copied elements. The result is an undeclared namespace error. I have included a test project which works fine with Saxon HE 9.9.1-8, any later version does not copy the required namespaces.
Files
Updated by Michael Kay almost 2 years ago
- Status changed from New to In Progress
It appears to be working for me, which suggests a bug that's been fixed since 11.4 went out.
The symptoms look very like bug #5616, except that that was fixed in 11.4.
I'll roll my project back to 11.4-as-released and see what happens there.
Unfortunately that's not proving straightforward - I'm hitting gradle build problems with that snapshot.
Updated by Michael Kay almost 2 years ago
I am seeing a failure on the 10.x branch (again, this is the current branch, not the released product). But the result is not quite as described. All the namespace prefixes are declared, but they aren't the namespace URIs that the Junit assertions are looking for.
Updated by Martin Honnen almost 2 years ago
For the nodes created with a DOM DocumentBuilder, I would suggest to add
DocumentBuilderFactory builderFact = DocumentBuilderFactory.newDefaultInstance();
builderFact.setNamespaceAware(true);
Haven't checked whether it makes a difference for the test project but seems like a basic sanity check for DOM Java and Saxon.
@Martin - yes, we do always ask people to use a namespace-aware DOM, but I think it only makes a difference when the DOM is constructed by an XML parser, not when the DOM is constructed programmatically. - Michael Kay
Updated by Michael Kay almost 2 years ago
On Saxon10, the output doesn't contain any undeclared namespaces, but it does contain
<parameter>[test: null]</parameter>
which looks very strange. I get this whether or not I make the DocumentBuilderFactory namespace-aware. The value of parameter is a parameter to the stylesheet supplied as a DOM element node - created using doc.createElement() as distinct from doc.createElementNS(), so we're clearly in the whole muddy area of mixing namespace-aware and non-namespace-aware interfaces, very much like bug #5616.
Updated by Michael Kay almost 2 years ago
I think it's actually bug #5727, which we've patched on the development branch in 11.x, and in the released 12.0, but it's not in any 10.x or 11.x maintenance release yet.
Although we fixed this particular case, please note that both Saxon and the JAXP spec say that a supplied DOMSource must be namespace-aware. It's very hard to make sense of a DOM that isn't namespace aware, especially if nodes are constructed using a mixture of namespace-aware and non-namespace-aware APIs. (Really, it's high time that DOM was ditched completely...)
Updated by Michael Kay almost 2 years ago
Turns out that I wasn't even running Saxon, I was running Xalan. The test does TransformerFactory.newInstance()
and in my configuration this loads Xalan (How I hate JAXP!). Start again.
Updated by Michael Kay almost 2 years ago
Changing the test to explicitly load Saxon using new SaxonTransformerFactory()
, and using Saxon 12.0, I now get a crash on an internal assertion:
java.lang.IllegalStateException: Prefix xmime has not been declared for attribute xmime:contentType
at net.sf.saxon.event.RegularSequenceChecker.startElement(RegularSequenceChecker.java:372)
at net.sf.saxon.event.ComplexContentOutputter.startElement(ComplexContentOutputter.java:535)
at net.sf.saxon.tree.util.Navigator.copy(Navigator.java:678)
This basically indicates that we are trying to output an attribute node whose name is namespaced, when the element node does not contain a namespace declaration. This is a pretty typical scenario for a non-namespace aware DOM.
Setting the DOM to be namespace-aware doesn't actually solve the problem (that only affects XML parsing, as far as I can tell, not programmatic tree building), nor does creating the element using createElementNS()
make any difference. The essential problem is that using the DOM method element.setAttributeNS()
to create a namespaced attribute doesn't actually put any namespace declarations in the DOM tree: we're being asked to process an inconsistent input tree.
Updated by Michael Kay almost 2 years ago
We're copying the DOM element using Navigator.copy()
, which calls ComplexContentOutputter.startElement(NodeName elemName, SchemaType type, AttributeMap attributes, NamespaceMap namespaces, Location location, int properties)
. The Javadoc for this method states explicitly:
This version of the method does not perform namespace fixup for prefixes used in the element
* name or attribute names; it is assumed that these prefixes are declared within the namespace map,
* and that there are no conflicts
which is patently not the case when copying from an ill-formed DOM tree.
So we'll have to look at implementing DOMNodeWrapper.copy()
using a slower method that checks all the attributes and performs namespace fixup on them.
Alternatively, Navigator.copy()
calls getAllNamespaces()
to get all the in scope namespaces for an element, and that's doing a search up the DOM ancestor tree for attributes named "xmlns". This is already expensive enough, but we could add a check for undeclared namespaces into this search.
Updated by Michael Kay almost 2 years ago
There's an additional complication here which is the use of <xsl:copy-of copy-namespaces="no"/>
which says that we should only copy across the namespaces that are actually used, and not the namespace declarations (which in this case, don't actually exist). But it's currently failing even with copy-namespaces="yes"
.
At present I can't quite see how the Navigator.copy()
method is supposed to work with copy-namespaces="no"
. This path is used by nearly all tree models with the exception of the TinyTree. I'm going to create a unit test that applies to the LinkedTree to see what happens there.
Updated by Michael Kay almost 2 years ago
I wrote a new unit test to test copy-namespaces="no" across all the supported tree models and found that it's a problem for any tree model using the generic implementation of Navigator.copy()
. Tracking this separately as bug #5861.
Updated by Michael Kay almost 2 years ago
The generic bug #5861 is now fixed, but this test is still failing. That's because the programmatically-constructed DOM has a separate problem: getAllNamespaces()
on a wrapped DOM element node is only getting the namespaces that are actually represented by xmlns attribute nodes on the DOM tree.
Fixing this has potentially serious performance problems for all DOM users. Perhaps we should consider doing it by catching the exception and then marking the tree as namespace-ill-formed for future reference? Or we could consider not fixing it at all, because we know that handling all possible variations of programmatically-constructed DOM trees is a non-starter.
Or perhaps we shouldn't worry about the performance impact? If people cared about performance, they wouldn't be using DOM. (OK, that's not true, most of them have no idea how bad DOM performance is.)
Updated by Michael Kay almost 2 years ago
- Category set to DOM Interface
- Status changed from In Progress to Resolved
- Assignee set to Michael Kay
- Applies to branch 12, trunk added
- Fix Committed on Branch 10, 11, 12, trunk added
I decided to bite the bullet: getAllNamespaces() on a wrapped DOM node now looks not only for xmlns[:x] =uri
attributes, but also checks the namespace bindings in namespace aware ("DOM level 2") element and attribute nodes.
Although the change is made (and tested) primarily for programmatically-constructed DOM trees, it should also make Saxon more robust for cases where it is presented with a DOM constructed by XML parsing without namespace awareness -- however, this option remains officially unsupported.
I added a unit test DOMTest.testBug5859
.
I have not made any changes to the DOM wrapper code in SaxonCS.
Updated by Michael Kay almost 2 years ago
- Subject changed from Qualified namespaces are not copied in StreamResult to Handling of undeclared namespaces in a programmatically constructed DOM
Updated by O'Neil Delpratt almost 2 years ago
- % Done changed from 0 to 100
- Fixed in Maintenance Release 11.5 added
Bug fix applied in the Saxon 11.5 maintenance release.
Updated by O'Neil Delpratt almost 2 years ago
- Fixed in Maintenance Release 10.9 added
Bug fix applied in the Saxon 10.9 maintenance release. Leaving bug issue marked as resolved until applied in the next Saxon 12 maintenance.release.
Updated by O'Neil Delpratt over 1 year ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 12.1 added
Bug fix applied in the Saxon 12.1 maintenance release.
Please register to edit this issue