Bug #4509

Saxon 10 DOM builder creates redundant namespace attributes

Added by Gerben Abbink 4 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: DOM Interface
Sprint/Milestone: -
Start date: 2020-03-31
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch: 10
Fixed in Maintenance Release:

Description

I have this XML:

<root xmlns="namespace">
	<element/>
	<element/>
</root>

I build a DOM using net.sf.saxon.s9api.DocumentBuilder.

In the resulting DOM each "element" node has a "namespace" attribute.

In version 9 this was not the case: only the root had a "namespace" attribute.

Is this change by design or is it a bug?
Attachment: TEST.java (2.76 KB), Gerben Abbink, 2020-04-01 16:52

History

#1 Updated by Michael Kay 4 months ago

  • Subject changed from Saxon 10 returns to many namespace nodes to Saxon 10 DOM builder creates redundant namespace attributes
  • Category set to DOM Interface
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

Thanks. Looks like we need to add a namespace deduplicator to this pipeline.

(In Saxon 10 the Receiver pipeline handles namespaces natively, in a way much closer to the XDM model: all in-scope namespaces are attached to every element, and redundant namespaces are eliminated only when the events are delivered to a destination such as a serializer.)

#2 Updated by Michael Kay 4 months ago

  • Status changed from New to In Progress

I've written a JUnit test as follows, and it passes:

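// Imports needed by the test below; the JUnit assertion methods
// (assertTrue, assertFalse, fail) are assumed to come from the test class.
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.dom.DOMNodeWrapper;
import net.sf.saxon.dom.DOMObjectModel;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public void testBug4509() {
    try {
        Processor proc = new Processor(false);
        net.sf.saxon.s9api.DocumentBuilder builder = proc.newDocumentBuilder();
        // Build a real DOM rather than a TinyTree
        builder.setTreeModel(DOMObjectModel.getInstance());
        XdmNode tree = builder.build(new StreamSource(new StringReader("<root xmlns='namespace'><a/><b/></root>")));
        Document doc = (Document) ((DOMNodeWrapper) tree.getUnderlyingNode()).getUnderlyingNode();
        Element root = doc.getDocumentElement();
        // Only the root should carry the namespace declaration
        assertTrue(root.hasAttribute("xmlns"));
        Element a = (Element) root.getChildNodes().item(0);
        assertFalse(a.hasAttribute("xmlns"));
        Element b = (Element) root.getChildNodes().item(1);
        assertFalse(b.hasAttribute("xmlns"));
    } catch (SaxonApiException err) {
        fail(err.getMessage());
    }
}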

Presumably you are doing something slightly different. Could you provide a repro please?

#3 Updated by Gerben Abbink 4 months ago

In my code hasAttribute("xmlns") also returns false
but getAttributes().item(0).getNodeName() actually returns "xmlns".

#4 Updated by Michael Kay 4 months ago

The wonders of DOM.

I'm seeing a.getAttributes().getLength() == 0, so a.getAttributes().item(0).getNodeName() throws an NPE.

I've also checked in the debugger and on this path we're not adding any attributes to the DOM element.

I'm afraid I'm not going to be able to make any progress on this unless you can provide precise code that reproduces the problem.

#5 Updated by Gerben Abbink 4 months ago

I do not call builder.setTreeModel(DOMObjectModel.getInstance()). The tree model I use is net.sf.saxon.om.TreeModel$TinyTree.

#6 Updated by Michael Kay 4 months ago

I really need to see some code from you.

If you're constructing a TinyTree rather than a DOM, then how can you call getAttributes().item(0).getNodeName()?

I'm afraid if you can't supply some code that I can run to reproduce the problem, I'm going to have to close this as unresolved.

#7 Updated by Gerben Abbink 4 months ago

I made a test program; see the attachment. I am indeed calling getAttributes().item(i).getNodeName(). Is that not allowed when using a TinyTree?

java -cp 'SaxonHE9-9-17J\saxon9he.jar';. TEST
ACTUAL OUTPUT ON MY PC
element0
element0 attribute[0] xmlns:xml=http://www.w3.org/XML/1998/namespace
element0 attribute[1] def=200
element1
element1 attribute[0] xmlns:xml=http://www.w3.org/XML/1998/namespace
element1 attribute[1] ghi=300
treeModel=net.sf.saxon.om.TreeModel$TinyTree

java -cp 'SaxonHE10-0J\saxon-he-10.0.jar';. TEST
ACTUAL OUTPUT ON MY PC
element0
element0 attribute[0] xmlns=namespace
element0 attribute[1] def=200
element1
element1 attribute[0] xmlns=namespace
element1 attribute[1] ghi=300
treeModel=net.sf.saxon.om.TreeModel$TinyTree

#8 Updated by Michael Kay 4 months ago

OK, thanks, this makes it clear you are using the NodeOverNodeInfo mechanism, which wraps DOM interfaces around a Saxon TinyTree. I can now finally see what's going on.

#9 Updated by Michael Kay 4 months ago

It's a long time since we looked at the NodeOverNodeInfo mechanism, and it seems to have some serious deficiencies. The Javadoc notes correctly that methods like getAttribute() and hasAttribute() don't treat namespaces as attributes, but the getAttributes() method does. This seems rather dysfunctional; since the only point of providing NodeOverNodeInfo is to allow people to treat the tree as if it were a DOM, providing methods that conform to the syntax of the DOM interface without implementing its semantics seems to be inviting trouble.

This isn't directly related to the present bug, which is caused by a lazy implementation of getAttributes() that exposes all in-scope namespaces as attributes, rather than exposing only those which differ from the namespaces that are in scope for the parent element.
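
The difference between "all in-scope namespaces" and "only those that differ from the parent" can be sketched without Saxon. The following is a minimal illustration only: it uses plain java.util.Map objects (prefix to URI) as stand-ins for in-scope namespaces, and the class NamespaceDelta and method declaredDelta are hypothetical names, not Saxon's NamespaceMap API. It also ignores namespace undeclarations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NamespaceDelta {

    // Hypothetical helper (not Saxon code): returns the bindings that are in
    // scope for the child but are absent from, or bound differently in, the
    // parent's scope. Undeclarations are deliberately not handled.
    static Map<String, String> declaredDelta(Map<String, String> parentScope,
                                             Map<String, String> childScope) {
        Map<String, String> delta = new LinkedHashMap<>();
        for (Map.Entry<String, String> binding : childScope.entrySet()) {
            String parentUri = parentScope.get(binding.getKey());
            if (!binding.getValue().equals(parentUri)) {
                delta.put(binding.getKey(), binding.getValue());
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // The document node has no namespaces in scope.
        Map<String, String> documentScope = new LinkedHashMap<>();

        // <root xmlns="namespace">: the default namespace is bound on the root.
        Map<String, String> rootScope = new LinkedHashMap<>();
        rootScope.put("", "namespace");

        // <element/> inherits exactly the same in-scope namespaces.
        Map<String, String> elementScope = new LinkedHashMap<>(rootScope);

        // The root needs a declaration; the child needs none.
        System.out.println(declaredDelta(documentScope, rootScope));
        System.out.println(declaredDelta(rootScope, elementScope));
    }
}
```

Under these assumptions, exposing the whole of elementScope as attributes corresponds to the Saxon 10.0 behaviour reported here, while exposing only the delta corresponds to what the reporter saw from Saxon 9 for the default namespace.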

#10 Updated by Michael Kay 4 months ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 10 added

I have fixed this (and more), applying the following changes:

  • The DOMAttributeMap constructed by ElementOverNodeInfo.getAttributes() now contains the delta of the in-scope namespaces for the element and those for its parent element, rather than containing all the in-scope namespaces as before.

  • An error in NamespaceMap.getDifferences() that led to redundant namespaces not being eliminated has been fixed.

  • NamespaceMap.getDifferences() now has a parameter to indicate whether XML 1.1-style namespace undeclarations should be included in the result.

  • Methods such as getAttribute(), getAttributeNS(), hasAttribute(), and hasAttributeNS() on ElementOverNodeInfo now follow the DOM conventions where namespaces are treated as ordinary attributes in a special namespace.
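
For comparison, the DOM conventions referred to in the last point can be observed with the JDK's own namespace-aware DOM parser, without Saxon. This is a small sketch, assuming the JDK's default parser implementation; the class name DomNamespaceAttributes is made up for the example. It does not exercise ElementOverNodeInfo itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomNamespaceAttributes {

    // The special namespace in which DOM places namespace declaration attributes.
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Parses the XML text into a namespace-aware W3C DOM using the JDK parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root xmlns='namespace'><element/><element/></root>");
        Element root = doc.getDocumentElement();
        Element child = (Element) root.getFirstChild();

        // The declaration is visible as an ordinary attribute on the root...
        System.out.println(root.hasAttribute("xmlns"));
        System.out.println(root.getAttributeNS(XMLNS_NS, "xmlns"));

        // ...and is not repeated on the child elements.
        System.out.println(child.hasAttribute("xmlns"));
        System.out.println(child.getAttributes().getLength());
    }
}
```

In a DOM built this way, hasAttribute(), getAttributeNS() and getAttributes() all agree about which namespace declarations an element carries, which is the consistency the changes above aim for in the NodeOverNodeInfo wrapper.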

#11 Updated by O'Neil Delpratt 3 months ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

#12 Updated by O'Neil Delpratt 3 months ago

  • Status changed from Resolved to Closed
