Bug #4083

closed

Performance of shallow copy with namespaces

Added by Michael Kay over 5 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2018-12-28
Due date:
% Done: 0%
Estimated time:
Applies to JS Branch: Trunk
Fix Committed on JS Branch:
Fixed in JS Release:
SEF Generated with:
Platforms:
Company: -
Contact person: -
Additional contact persons: -

Description

By default xsl:copy is required to copy all in-scope namespaces. Saxon-JS does this very literally.

Imagine you are using a recursive shallow copy to copy a tree with a depth of 10, in which 12 namespaces are declared on the root element. Every time an element is copied, we call DomUtils.inScopeNamespaces() to get a complete list of its namespaces. This involves examining every attribute of every ancestor to see whether it is actually a namespace declaration, eliminating redundant declarations as we go. Each non-redundant declaration is then added as an attribute node on the new copied element (without any check for redundancy against what is already in scope in the result tree). As a result, every element in the result tree has 12 attributes representing namespaces. A second copy operation from this tree to a third tree will now have to examine 120 namespace attributes for each element at depth 10 (12 on each of the 10 elements on its ancestor-or-self path), eliminate most of them as redundant, and copy the remainder to the third tree.
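To make the arithmetic concrete, here is a minimal sketch of the naive lookup. It is not the actual DomUtils.inScopeNamespaces() code: the element shape (`name`, `attributes`, `parentNode`), the helper names, and the `stats` counter are all assumptions made for illustration, and namespace undeclarations are ignored.

```javascript
// Hypothetical minimal element: { name, attributes: [{ name, value }], parentNode }.
function makeElement(name, attributes, parentNode) {
  return { name, attributes, parentNode: parentNode || null };
}

// Naive lookup: visit the element and every ancestor, and inspect every
// attribute to see whether it is a namespace declaration.
function naiveInScopeNamespaces(element, stats) {
  const bindings = new Map();
  for (let node = element; node !== null; node = node.parentNode) {
    for (const attr of node.attributes) {
      stats.examined++;
      let prefix = null;
      if (attr.name === "xmlns") {
        prefix = "";
      } else if (attr.name.startsWith("xmlns:")) {
        prefix = attr.name.slice("xmlns:".length);
      }
      // The nearest declaration wins; outer ones for the same prefix are redundant.
      if (prefix !== null && !bindings.has(prefix)) {
        bindings.set(prefix, attr.value);
      }
    }
  }
  return bindings;
}

// Build a chain of the given depth in which every element repeats the same
// declarations, as in a tree produced by the first copy operation.
function buildChain(depth, nsCount) {
  let parent = null;
  for (let d = 0; d < depth; d++) {
    const attrs = [];
    for (let i = 0; i < nsCount; i++) {
      attrs.push({ name: "xmlns:p" + i, value: "http://example.com/ns/" + i });
    }
    parent = makeElement("e" + d, attrs, parent);
  }
  return parent; // the deepest element
}

const stats = { examined: 0 };
const deepest = buildChain(10, 12);
const bindings = naiveInScopeNamespaces(deepest, stats);
console.log(stats.examined, bindings.size); // 120 12
```

For the deepest element alone, 120 attributes are examined to recover 12 distinct bindings.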

In Saxon/J we reduce these costs by passing a virtual namespaceBindings object, which is very quickly constructed because it is actually just a wrapper for the element node. When adding these namespaceBindings to the new element, we first check whether they are the same as the bindings for the parent of the new node, in which case no action is needed.

Actions #1

Updated by Michael Kay over 5 years ago

In cases where we use our own DOM implementation (i.e. currently in NodeJS) we should probably consider representing namespaces on the tree in a different way.

Do we actually need the tree to conform to DOM interfaces in every respect? Or could we move to something that supports the DOM interfaces we actually need, but (for example) doesn't present namespaces as attributes?

We could consider every element node having an inScopeNamespaces property which returns a map from prefixes to URIs; this is not as extravagant as it seems, because in the vast majority of cases, it would be a reference to the same map as its parent element points to. Only when the namespace context of an element is different from its parent element would we construct a different map.
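A rough sketch of what such a property could look like follows. The `ElementNode` class, the `addsNothing` helper, and the constructor signature are hypothetical names chosen for illustration, not part of Saxon-JS or of any DOM API; the maps are treated as immutable once built.

```javascript
const EMPTY = new Map();

// True if every declaration is already present, with the same URI, in the inherited map.
function addsNothing(inherited, declarations) {
  return Object.entries(declarations).every(
    ([prefix, uri]) => inherited.get(prefix) === uri
  );
}

// Hypothetical element node carrying an inScopeNamespaces map (prefix -> URI).
class ElementNode {
  constructor(name, parent = null, declarations = null) {
    this.name = name;
    this.parent = parent;
    const inherited = parent ? parent.inScopeNamespaces : EMPTY;
    if (declarations === null || addsNothing(inherited, declarations)) {
      // Common case: same namespace context as the parent, so share its map.
      this.inScopeNamespaces = inherited;
    } else {
      // Only build a new map when the context really differs.
      this.inScopeNamespaces = new Map([
        ...inherited,
        ...Object.entries(declarations),
      ]);
    }
  }

  // Prefix-to-URI resolution becomes a single map lookup.
  resolvePrefix(prefix) {
    return this.inScopeNamespaces.get(prefix);
  }
}

const root = new ElementNode("root", null, {
  a: "http://example.com/a",
  b: "http://example.com/b",
});
const child = new ElementNode("child", root);
const grandchild = new ElementNode("grandchild", child, { a: "http://example.com/a" });
const other = new ElementNode("other", child, { c: "http://example.com/c" });

console.log(child.inScopeNamespaces === root.inScopeNamespaces);      // true
console.log(grandchild.inScopeNamespaces === root.inScopeNamespaces); // true
console.log(other.inScopeNamespaces === root.inScopeNamespaces);      // false
console.log(other.resolvePrefix("a"));                                // http://example.com/a
```

Here only `root` and `other` allocate a map; `child` and `grandchild` point at the one owned by `root`.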

This would make it much faster (a) to get all the in-scope namespaces, and (b) to do prefix-to-uri resolution. The operation of shallow copy would then be essentially to merge the in-scope-namespaces of the source element with those of the parent of the target element, optimizing for the common case where these are the same map.
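The merge step might be sketched as below, assuming elements are plain objects with an `inScopeNamespaces` map that is never mutated after construction. `mergeNamespaces` and `shallowCopy` are hypothetical names, and the sketch ignores namespace undeclarations and prefix conflicts.

```javascript
// Returns the map the newly copied element should carry.
function mergeNamespaces(sourceMap, targetParentMap) {
  // Common case: source and target parent already share one map.
  if (sourceMap === targetParentMap) return targetParentMap;
  // Nothing in scope in the target yet: reuse the source map as-is.
  if (targetParentMap.size === 0) return sourceMap;
  let merged = null;
  for (const [prefix, uri] of sourceMap) {
    if (targetParentMap.get(prefix) !== uri) {
      // Copy lazily, only once a real difference is found.
      if (merged === null) merged = new Map(targetParentMap);
      merged.set(prefix, uri);
    }
  }
  return merged === null ? targetParentMap : merged;
}

// Shallow copy of an element under a (possibly null) parent in the result tree.
function shallowCopy(source, targetParent) {
  const parentMap = targetParent ? targetParent.inScopeNamespaces : new Map();
  return {
    name: source.name,
    parent: targetParent,
    inScopeNamespaces: mergeNamespaces(source.inScopeNamespaces, parentMap),
  };
}

const ns = new Map([["a", "http://example.com/a"]]);
const srcRoot = { name: "root", inScopeNamespaces: ns };
const srcChild = { name: "child", inScopeNamespaces: ns };

const outRoot = shallowCopy(srcRoot, null);
const outChild = shallowCopy(srcChild, outRoot);
console.log(outRoot.inScopeNamespaces === ns);                          // true
console.log(outChild.inScopeNamespaces === outRoot.inScopeNamespaces);  // true
```

In a recursive shallow copy where the source elements share one map, every copied element ends up sharing it too, so the per-element cost is a single identity comparison.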

Actions #2

Updated by Michael Kay over 4 years ago

For 2.0 we have introduced push-mode evaluation of xsl:copy and other instructions, and the code now eliminates duplicate namespace declarations as it goes. This should alleviate the problems considerably.
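The general idea of dropping duplicate declarations while pushing events can be illustrated as follows. This is only a sketch under assumptions, not the Saxon-JS 2.0 code: the `PushBuilder` class and its `startElement`, `namespace`, and `endElement` methods are hypothetical names.

```javascript
// Hypothetical push-style builder that tracks the bindings in scope for each open element.
class PushBuilder {
  constructor() {
    this.stack = [new Map()]; // stack[0] is the empty context outside the root
    this.events = [];
  }

  startElement(name) {
    // A new element starts out sharing its parent's bindings.
    this.stack.push(this.stack[this.stack.length - 1]);
    this.events.push({ type: "start", name });
  }

  namespace(prefix, uri) {
    const top = this.stack.length - 1;
    if (top === 0) throw new Error("no open element");
    if (this.stack[top].get(prefix) === uri) return; // already in scope: drop it
    // Copy on write, so the parent's map is never modified.
    if (this.stack[top] === this.stack[top - 1]) {
      this.stack[top] = new Map(this.stack[top]);
    }
    this.stack[top].set(prefix, uri);
    this.events.push({ type: "namespace", prefix, uri });
  }

  endElement() {
    this.stack.pop();
    this.events.push({ type: "end" });
  }
}

const builder = new PushBuilder();
builder.startElement("root");
for (let i = 0; i < 12; i++) builder.namespace("p" + i, "http://example.com/ns/" + i);
builder.startElement("child");
// The child repeats all 12 declarations; none of them is emitted again.
for (let i = 0; i < 12; i++) builder.namespace("p" + i, "http://example.com/ns/" + i);
builder.endElement();
builder.endElement();

const emitted = builder.events.filter((e) => e.type === "namespace").length;
console.log(emitted); // 12
```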

There remains an issue that the implementation of in-scope-namespaces() is much less efficient than it could be. It uses "pure DOM" interfaces to access xmlns and xmlns:xxx attributes as attributes, whereas the xmldom implementation used on Node.js offers much faster (but non-standard) access to namespace information.

Actions #3

Updated by Michael Kay almost 4 years ago

  • Status changed from New to Closed

Closing this as we have made some progress in this area, and we have no evidence that there is still a problem to be solved.
