Project

Profile

Help

Bug #2368

closed

DOM4J wrapper.copy() has O(n^2) performance

Added by Michael Kay over 9 years ago. Updated almost 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
DOM4J Interface
Sprint/Milestone:
Start date:
2015-05-01
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.6
Fix Committed on Branch:
9.6
Fixed in Maintenance Release:
Platforms:

Description

Scenario: a document whose root element has 2M child elements; the document is built using DOM4J, then wrapped in a DOM4JDocumentWrapper, then buildDocument() is called to copy the data to a Saxon TinyTree.

This takes "for ever". The time is spent in Element.getDeclaredNamespaces(). Saxon uses Navigator.copy(), which calls getDeclaredNamespaces() every time it copies an element, In DOM4J, Element.declaredNamespaces searches the content array looking for namespaces, so its cost is proportional to the number of children. Then (the real killer), Saxon not only calls Element.declaredNamespaces() on the element being copied, but on each of its ancestors. So for this high-fanout document, the 2M child elements of the outermost element are scanned 2M times.

Possible solutions. (a) Change the generic Navigator.copy() routine so that when copying descendant elements, it only looks for their own namespaces, and not for those of the ancestor elements. (b) Provide a custom copy() method on DOM4JDocumentWrapper which invokes the DOM4J SAXWriter to walk the tree, rather than using the generic Navigator.copy(). (c) Change DOM4JNodeWrapper.getDeclaredNamespaces() to cache the result (or at least, remember that a particular element has no namespaces. (d) In the generic Navigator.copy, maintain a stack of namespaces to avoid going back to ancestor elements.

Please register to edit this issue

Also available in: Atom PDF