Bug #1795

closed

Increased memory requirements of tree model in Saxon 9.4 compared to Saxon 9.3

Added by Stuart Barker almost 11 years ago. Updated almost 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2013-06-07
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Our domain (XBRL) requires reasonable performance with potentially large models, and some of our testing uncovered significantly higher memory demands in Saxon 9.4 onwards compared with Saxon 9.3. In particular, for an instance document with a raw size of 36 MB, the additional memory consumed by the TinyTree was over 100 MB (from about 40 MB to about 150 MB), a significant increase in total memory use for our application.

I attach an extract from a heap snapshot comparison which shows that it is the introduction of NamespaceBindings (and their associated Strings) into the TinyTree that is creating the additional memory demand (my understanding is that, prior to 9.4, an int array referencing into a name pool was used). This comparison was generated using the Java 7 JRE, but the results are comparable using Java 6 JRE.

The instance document being processed was relatively flat but large, with several hundred thousand elements and about 100 namespace prefix declarations in the root element. Nevertheless, the very large number of NamespaceBinding objects suggests that many of them are equivalent.


Files

saxon93vs95_heapExtract.html (8.77 KB) Stuart Barker, 2013-06-07 16:24

Actions #1

Updated by Michael Kay almost 11 years ago

  • Category set to Performance
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Found in version set to 9.4

Many thanks for drawing our attention to this problem, which appears to have gone unnoticed for a long time. If I recall correctly, the theory is that NamespaceBinding objects should be pooled and shared, so they should not cause a significant space overhead. We will examine your use case to see why this is not happening.
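
As an illustration of the pooling idea mentioned above, here is a minimal sketch of an intern-style pool. The `Binding` and `BindingPool` classes are hypothetical stand-ins written for this example; they are not Saxon's NamespaceBinding or its actual pooling mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BindingPool {

    /** Hypothetical prefix/URI pair; not Saxon's NamespaceBinding. */
    static final class Binding {
        final String prefix;
        final String uri;

        Binding(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Binding)) return false;
            Binding b = (Binding) other;
            return prefix.equals(b.prefix) && uri.equals(b.uri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(prefix, uri);
        }
    }

    // Maps each distinct binding to its single shared instance.
    private final Map<Binding, Binding> pool = new HashMap<>();

    /** Returns the shared instance for this prefix/URI pair, creating it on first use. */
    public Binding intern(String prefix, String uri) {
        Binding candidate = new Binding(prefix, uri);
        Binding existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    /** Number of distinct bindings currently held. */
    public int size() {
        return pool.size();
    }
}
```

With a pool of this kind, a document with about 100 distinct declarations would retain about 100 binding objects no matter how many elements refer to them, provided every binding is obtained through `intern`.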

Actions #2

Updated by Michael Kay almost 11 years ago

I think we will probably need to see in more detail what your source document looks like - hopefully a cut down version that illustrates the problem. In particular, I would like to see where the namespace declarations occur and whether there are any namespace undeclarations. It might also be useful to know how the TinyTree was built.

Actions #3

Updated by Stuart Barker almost 11 years ago

I am unable to give you the actual source document that we were using when we reported this bug, for commercial reasons.

However, I can confirm that it has a little over 100 namespace declarations and they are all in the root element. There are no other namespace declarations or undeclarations anywhere in the document.

The root element of the document has roughly 50,000 child elements. Half of these have text-only content (and typically 3 attributes). The other half have element content (typically 2 or 3 direct children, nesting to a depth of 2 or 3 elements, with the leaf elements having text content; all these elements have at most one attribute).

The TinyTree was built by calling buildDocument() on the configuration, passing a DOMSource (in the form of an AugmentedSource with validation mode 'LAX').

Hopefully this helps, but if you can't reproduce our findings with a similarly structured document then we will attempt to provide a public document that exhibits the same behaviour.
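
For anyone wanting to build a similarly structured DOM for testing, the following is a rough sketch using only the JDK's DOM API. The element names, prefixes and `http://example.com/ns/` URIs are invented for illustration and are not taken from the real instance document; the shape (all namespace declarations on the root, alternating text-only and nested children) follows the description above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SyntheticInstanceBuilder {

    private static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Hypothetical namespace URI pattern, used only for this sketch.
    private static String uri(int i) {
        return "http://example.com/ns/" + i;
    }

    /** Builds a DOM with namespaceCount (at least 1) declarations on the root and childCount children. */
    public static Document build(int namespaceCount, int childCount) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        Element root = doc.createElementNS(uri(0), "p0:root");
        doc.appendChild(root);
        // All namespace declarations live on the root element.
        for (int i = 0; i < namespaceCount; i++) {
            root.setAttributeNS(XMLNS_NS, "xmlns:p" + i, uri(i));
        }

        for (int i = 0; i < childCount; i++) {
            int ns = i % namespaceCount;
            if (i % 2 == 0) {
                // Text-only child with three attributes.
                Element fact = doc.createElementNS(uri(ns), "p" + ns + ":fact");
                fact.setAttribute("a", "1");
                fact.setAttribute("b", "2");
                fact.setAttribute("c", "3");
                fact.setTextContent("value" + i);
                root.appendChild(fact);
            } else {
                // Element-content child: two mid-level children, each with one leaf.
                Element container = doc.createElementNS(uri(ns), "p" + ns + ":container");
                for (int j = 0; j < 2; j++) {
                    Element mid = doc.createElementNS(uri(ns), "p" + ns + ":mid");
                    Element leaf = doc.createElementNS(uri(ns), "p" + ns + ":leaf");
                    leaf.setAttribute("id", "n" + i + "_" + j);
                    leaf.setTextContent("leaf" + j);
                    mid.appendChild(leaf);
                    container.appendChild(mid);
                }
                root.appendChild(container);
            }
        }
        return doc;
    }
}
```

Calling `SyntheticInstanceBuilder.build(100, 50000)` would give a document of roughly the described shape, which could then be wrapped in a DOMSource.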

Actions #4

Updated by Michael Kay almost 11 years ago

Thanks. I think the critical information here is that you built the TinyTree by copying a DOMSource, rather than by direct parsing of the source XML. I will attempt to reproduce that scenario.

Actions #5

Updated by Michael Kay almost 11 years ago

I have confirmed that when a TinyTree is written by copying from a DOMSource, each element node has two unnecessary NamespaceBindings for its default namespace.

Actions #6

Updated by Michael Kay almost 11 years ago

The best fix for this seems to be to add a NamespaceReducer to the pipeline constructed by buildDocument(), immediately before the final Builder. A NamespaceReducer eliminates redundant namespace bindings passed down the event stream. It is hard to avoid generating the redundant namespace bindings when the source is a DOM, because there are many different ways namespace information is represented in a DOM and Saxon has to look for it in more than one way.
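
The idea of dropping redundant bindings from an event stream can be sketched as follows. This is a simplified, hypothetical filter written for illustration, not the code of Saxon's NamespaceReducer: it keeps a stack of in-scope prefix-to-URI maps and passes on only the declarations that change what is already in scope. It does not attempt to handle namespace undeclarations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RedundantNamespaceFilter {

    // One in-scope prefix -> URI map per currently open element.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    /**
     * Call at each start tag with the declarations reported for that element.
     * Returns only those declarations that are not already in scope.
     */
    public Map<String, String> startElement(Map<String, String> declared) {
        Map<String, String> inScope =
                scopes.isEmpty() ? new HashMap<>() : new HashMap<>(scopes.peek());
        Map<String, String> needed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : declared.entrySet()) {
            String current = inScope.get(e.getKey());
            // Compare by value, not identity.
            if (!e.getValue().equals(current)) {
                needed.put(e.getKey(), e.getValue());
                inScope.put(e.getKey(), e.getValue());
            }
        }
        scopes.push(inScope);
        return needed;
    }

    /** Call at each end tag to restore the parent's in-scope bindings. */
    public void endElement() {
        scopes.pop();
    }
}
```

Copying the in-scope map for every element keeps the sketch short but would be wasteful on a large document; a real implementation would more likely record only the changes made at each level.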

I also found that code in StartTagBuffer that attempts to remove redundant namespaces compares NamespaceBindings using "==" rather than equals().
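
The effect of comparing with "==" rather than equals() can be shown with a small self-contained example. The `Binding` class below is a hypothetical stand-in, not Saxon's NamespaceBinding, and the two de-duplication methods are written only to contrast the two comparisons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class BindingEqualityDemo {

    /** Hypothetical prefix/URI pair; not Saxon's NamespaceBinding. */
    static final class Binding {
        final String prefix;
        final String uri;

        Binding(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Binding)) return false;
            Binding b = (Binding) other;
            return prefix.equals(b.prefix) && uri.equals(b.uri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(prefix, uri);
        }
    }

    /** De-duplicates by identity: equal but distinct objects are all kept. */
    static List<Binding> dedupeByIdentity(List<Binding> in) {
        List<Binding> out = new ArrayList<>();
        for (Binding b : in) {
            boolean seen = false;
            for (Binding kept : out) {
                if (kept == b) {
                    seen = true;
                    break;
                }
            }
            if (!seen) out.add(b);
        }
        return out;
    }

    /** De-duplicates by value: List.contains relies on equals(). */
    static List<Binding> dedupeByEquals(List<Binding> in) {
        List<Binding> out = new ArrayList<>();
        for (Binding b : in) {
            if (!out.contains(b)) out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Binding> input = new ArrayList<>();
        input.add(new Binding("xbrli", "http://www.xbrl.org/2003/instance"));
        input.add(new Binding("xbrli", "http://www.xbrl.org/2003/instance"));
        System.out.println(dedupeByIdentity(input).size()); // prints 2
        System.out.println(dedupeByEquals(input).size());   // prints 1
    }
}
```

An identity comparison only removes duplicates when every equal binding is guaranteed to be the same object, which is not the case if bindings are created afresh for each element.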

A patch for both of these will be committed for 9.4, 9.5 and 9.6.

Actions #7

Updated by Michael Kay almost 11 years ago

  • Status changed from In Progress to Resolved
  • Found in version changed from 9.4 to 9.4, 9.5

Actions #8

Updated by O'Neil Delpratt almost 11 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in version set to 9.4.0.8, 9.5.1.1

Bug now closed. The fix has been successfully applied to the Saxon maintenance releases 9.4.0.8 and 9.5.1.1.
