
Bug #2431

closed

Saxon uses more memory than Xalan to transform W3C DOM to String

Added by Mateusz Nowakowski over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2015-07-31
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Depending on the input XML size, Saxon allocates between 48% and 75% more memory than Xalan to convert a W3C Document to a String.

Compare cases 1 and 2 below:

{code}

  1. W3C DOM -> String using Saxon allocated memory: 219432 bytes

  2. W3C DOM -> String using Xalan allocated memory: 145512 bytes

  3. Saxon NodeInfo -> String using Saxon allocated memory: 78448 bytes

  4. Wrapped W3C DOM from NodeInfo -> String using Saxon allocated memory: 78936 bytes

  5. Wrapped W3C DOM from NodeInfo -> String using Xalan allocated memory: 318392 bytes

{code}

See the attached transformer-case.zip project (run mvn test to see the output; developed under Java 7).

Fortunately, Saxon does much better when its own tree model (NodeInfo) is used. The problem is with libraries that use the JAXP Transformer API: they start behaving worse as soon as Saxon appears on the classpath.
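For context, the operation being measured is the JAXP identity transformation of a DOM to a string. The following is a minimal sketch assuming a default JAXP setup; the class and method names (DomToString, serialize, sampleDocument) are hypothetical and not taken from the attached project. Note that the calling code never names an implementation: TransformerFactory.newInstance() picks whatever JAXP discovers, which is why adding Saxon to the classpath changes the behaviour of unmodified libraries.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomToString {
    // Serializes a W3C DOM node with a JAXP identity transformation
    // (newTransformer() with no stylesheet). The implementation used is
    // whichever TransformerFactory JAXP discovers at runtime.
    public static String serialize(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Builds a tiny namespace-aware document in memory (hypothetical content).
    public static Document sampleDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, "root");
            root.appendChild(doc.createTextNode("hello"));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Print which implementation JAXP actually selected, then the result.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        System.out.println(serialize(sampleDocument()));
    }
}
```

Running main with and without Saxon on the classpath should print a different factory class name while the serialized output stays the same.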


Files

transformer-case.zip (5.85 KB), Mateusz Nowakowski, 2015-07-31 14:06
Actions #1

Updated by Michael Kay over 8 years ago

  • Category changed from JAXP Java API to DOM Interface
  • Status changed from New to In Progress
  • Priority changed from High to Normal

The figures don't greatly surprise me and I don't think it's a bug. We made a conscious design decision to optimize Saxon for the Saxon TinyTree model and to support DOM only for compatibility, not for high performance.

Actions #2

Updated by Michael Kay over 8 years ago

Taking another look at this, I see that we're talking here about an "identity transformation" (@TransformerFactory.newTransformer()@) rather than a true XSLT transformation. It's not clear at first sight why this should use any significant amount of memory.

It could be that the memory is being consumed by Saxon's DOM walk, which uses @node.getChildNodes()@, and that walking with @getFirstChild()@/@getNextSibling()@ would perform better. It's also possible that finding all the namespaces could be made more efficient.
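The two traversal styles mentioned above can be sketched as follows. This is an illustration, not Saxon's DOM walker: the class and method names are hypothetical, and whether one style actually allocates less depends on the DOM implementation. The sketch only shows that both walks visit the same nodes, the second without ever requesting a NodeList.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomWalk {
    // Style 1: index into the NodeList returned by getChildNodes().
    public static int countByChildNodes(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countByChildNodes(children.item(i));
        }
        return count;
    }

    // Style 2: follow first-child / next-sibling pointers; no NodeList is used.
    public static int countBySiblings(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countBySiblings(c);
        }
        return count;
    }

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c><d/></c></a>");
        // Both walks count the four elements a, b, c, d.
        System.out.println(countByChildNodes(doc) + " " + countBySiblings(doc));
    }
}
```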

I believe Xalan has private knowledge of the internals of the Xerces DOM, and it may use this to achieve things that Saxon can never achieve using public (and portable) DOM APIs. For example, discovering all the in-scope namespaces on an Element node using only public APIs is fairly horrendous. But I'll take a look at it.
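To show why the public-API route is awkward, here is a minimal sketch of collecting in-scope namespaces with nothing but standard DOM calls. It is not Saxon's code; the class name is hypothetical, and it assumes a namespace-aware DOM in which namespace declarations appear as xmlns attributes. Every ancestor's attribute map has to be scanned, and prefixes used on names without a matching xmlns attribute (common in programmatically built DOMs) are still missed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class InScopeNamespaces {
    // Walks from the element up through every ancestor element, reading xmlns
    // and xmlns:prefix attributes; the nearest declaration of a prefix wins.
    // Assumes a namespace-aware DOM. Undeclared prefixes used directly on
    // element or attribute names are NOT discovered by this sketch.
    public static Map<String, String> inScope(Element element) {
        Map<String, String> bindings = new LinkedHashMap<>();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                    continue; // ordinary attribute, not a namespace declaration
                }
                // "xmlns" binds the default namespace (key ""); "xmlns:p" binds p.
                // A value of "" means the default namespace was undeclared.
                String prefix = "xmlns".equals(a.getName()) ? "" : a.getLocalName();
                if (!bindings.containsKey(prefix)) {
                    bindings.put(prefix, a.getValue());
                }
            }
        }
        bindings.put("xml", XMLConstants.XML_NS_URI); // always implicitly in scope
        return bindings;
    }

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a xmlns=\"urn:d\" xmlns:p=\"urn:p\"><b xmlns:p=\"urn:q\"/></a>");
        Element b = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println(inScope(b));
    }
}
```

The inner xmlns:p on b shadows the outer one, so p maps to urn:q while the default namespace urn:d is inherited from a.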

Actions #3

Updated by Michael Kay over 8 years ago

I'm seeing consistent figures. For an identity transformation of a 1 MB document:

Saxon: 22 ms in 22 Mbytes

Xalan: 13 ms in 14 Mbytes

Note that we're measuring allocated memory, not used memory: I think a lot of the memory is used only very transiently.
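The distinction between allocated and retained memory matters for reading these figures. One way to measure per-thread allocation on HotSpot/OpenJDK is sketched below; this is an assumption about methodology, not necessarily how the figures in this issue were obtained. It relies on the HotSpot-specific com.sun.management.ThreadMXBean extension, and AllocationMeter is a hypothetical name.

```java
import java.lang.management.ManagementFactory;

public class AllocationMeter {
    // Returns the bytes allocated by the current thread while running task.
    // This counts every allocation, including short-lived garbage, so it can be
    // far larger than the heap actually retained afterwards.
    // HotSpot-specific: the cast fails on JVMs without this extension.
    public static long allocatedBy(Runnable task) {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(id);
        task.run();
        return mx.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBy(() -> {
            // Transient garbage: the builder is discarded immediately.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10000; i++) {
                sb.append(i);
            }
        });
        System.out.println("allocated " + bytes + " bytes");
    }
}
```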

Java profiling suggests that a lot of the time is spent in String.intern(), which is invoked from org.xml.sax.helpers.NamespaceSupport; I'm a little surprised to find we are using this class. The interning of strings could also account for some of the memory usage.
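The issue does not describe how the DOM walker's namespace handling was changed, so the following is purely a hypothetical illustration of a namespace context that avoids interning: declarations are kept as flat prefix/URI lists with one mark per element, and lookups compare strings with equals(). NamespaceStack and its methods are invented names, not Saxon or SAX API.

```java
import java.util.ArrayList;
import java.util.List;

public class NamespaceStack {
    // Hypothetical lightweight namespace context. Declarations for all open
    // elements live in two parallel lists; marks record where each element's
    // declarations begin. No String.intern() calls are made.
    private final List<String> prefixes = new ArrayList<>();
    private final List<String> uris = new ArrayList<>();
    private final List<Integer> marks = new ArrayList<>();

    // Call on entering an element, before declaring its namespaces.
    public void push() {
        marks.add(prefixes.size());
    }

    // Call on leaving an element: discard the declarations it made.
    public void pop() {
        int mark = marks.remove(marks.size() - 1);
        while (prefixes.size() > mark) {
            prefixes.remove(prefixes.size() - 1);
            uris.remove(uris.size() - 1);
        }
    }

    public void declare(String prefix, String uri) {
        prefixes.add(prefix);
        uris.add(uri);
    }

    // Search newest-first so that inner declarations shadow outer ones.
    public String getURI(String prefix) {
        for (int i = prefixes.size() - 1; i >= 0; i--) {
            if (prefixes.get(i).equals(prefix)) {
                return uris.get(i);
            }
        }
        return null;
    }
}
```

Lookup is a linear scan, which is a reasonable trade-off when, as is typical, only a handful of namespaces are in scope.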

Actions #4

Updated by Michael Kay over 8 years ago

I've changed the way namespaces are managed by the DOM walker and the results are looking promising:

| Run | Saxon | Xalan |
| 1 | 25 ms, 8,037,184 bytes | 13 ms, 14,599,008 bytes |
| 2 | 15 ms, 8,844,032 bytes | 19 ms, 14,599,008 bytes |
| 3 | 16 ms, 7,533,856 bytes | 13 ms, 14,599,256 bytes |
| 4 | 10 ms, 7,533,856 bytes | 13 ms, 14,599,008 bytes |
| 5 | 12 ms, 7,533,856 bytes | 14 ms, 14,599,008 bytes |
| 6 | 11 ms, 7,533,856 bytes | 17 ms, 14,599,008 bytes |

That is to say, the memory is now significantly better than Xalan, and the execution time is broadly comparable.

The test case I was using didn't actually have any namespaces.

I now need to do a lot of functional testing - ideally including programmatically built DOM trees which can display all sorts of oddities. Unfortunately there aren't as many unit tests of this area as I would like.
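As an example of the kind of oddity a programmatically built DOM can contain, the sketch below creates a prefixed element with no xmlns:p attribute, plus two adjacent text nodes, then checks that an identity transformation still produces namespace-correct XML by re-parsing it. This is a hypothetical test shape, not one of Saxon's unit tests; the class name and URI are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OddDomCheck {
    // A DOM no parser would produce: prefix p is used on the element name but
    // never declared by an xmlns:p attribute, and the text is split in two nodes.
    public static Document oddDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element e = doc.createElementNS("urn:example", "p:e");
            e.appendChild(doc.createTextNode("a"));
            e.appendChild(doc.createTextNode("b"));
            doc.appendChild(e);
            return doc;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // Identity-transforms the DOM to a string, then re-parses it; parsing fails
    // if the serializer forgot to emit a declaration for prefix p.
    public static Document roundTrip(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new ByteArrayInputStream(
                    out.toString().getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Element root = roundTrip(oddDocument()).getDocumentElement();
        System.out.println(root.getNamespaceURI() + " " + root.getTextContent());
    }
}
```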

Actions #5

Updated by Michael Kay over 8 years ago

At present I'm having some difficulty reproducing the previous results for a non-namespace-aware DOM. It's not helped by the fact that (a) I have hardly any tests for that case, and (b) the spec is totally unclear on what should happen. The spec for DOMSource says that XSLT requires the DOM to be namespace-aware, but it doesn't say that for the identity transform. It's also not helped by the fact that the DOM provides no interrogatives, so you can't actually ask whether it's namespace-aware or not. Everything to do with DOM is so frustrating!
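Since the DOM offers no direct query, about the best one can do is a heuristic. The DOM Level 2/3 specification says getLocalName() is always null for nodes created with DOM Level 1 methods such as createElement(), and a parser with namespace awareness switched off typically builds such nodes. The sketch below (hypothetical class name) relies on that; it inspects a single node, so it cannot be conclusive for a tree that mixes Level 1 and Level 2 nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class NamespaceAwareness {
    // Heuristic only: Level 1 nodes (createElement, non-namespace-aware
    // parsing) report a null local name; Level 2 nodes report a non-null one.
    public static boolean looksNamespaceAware(Node elementOrAttr) {
        return elementOrAttr.getLocalName() != null;
    }

    public static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            return dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p:e xmlns:p=\"urn:example\"/>";
        System.out.println(looksNamespaceAware(parse(xml, true).getDocumentElement()));
        System.out.println(looksNamespaceAware(parse(xml, false).getDocumentElement()));
    }
}
```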

Actions #6

Updated by Michael Kay over 8 years ago

  • Category changed from DOM Interface to Performance
  • Status changed from In Progress to Resolved
  • Assignee set to Michael Kay
  • % Done changed from 0 to 100
  • Found in version changed from SaxonHE 9.6.0-5 to 9.6
  • Fixed in version set to 9.7

Resolved for 9.7. There's one test whose behaviour appears to have changed, namely schema-validating a non-namespace-aware DOM, but as far as I can see the new result is correct. This is only a performance improvement, and it creates a significant risk of breaking existing applications, since test coverage for programmatically-created DOMs is quite limited. Some of those applications may not even be aware they are using Saxon, because of the JAXP loading mechanism. For these reasons I'm not going to retrofit it to the 9.6 branch.

Actions #7

Updated by O'Neil Delpratt over 8 years ago

  • Status changed from Resolved to Closed
  • Fixed in version changed from 9.7 to 9.6.0.8
Actions #8

Updated by O'Neil Delpratt over 8 years ago

  • Fixed in version deleted (9.6.0.8)

This fix went out in the 9.7 release only
