Bug #5697


Serializing Node to String hangs

Added by andré mooiweer almost 2 years ago. Updated over 1 year ago.

s9api API


We are using Saxon-EE on Java and we have a problem serializing a Node to a String. A transformation on the node completes in 2 minutes. After that, serializing the Node to a String hangs the system. Here is a section of the stack trace at that moment:

There is 1 thread in the server that may be hung.
at java.lang.String.getCharsNoBoundChecks(
at java.lang.StringBuffer.append(
at net.sf.saxon.serialize.XMLEmitter.startElement(
at net.sf.saxon.serialize.XMLIndenter.startElement(
at net.sf.saxon.event.ProxyReceiver.startElement(
at net.sf.saxon.event.SequenceNormalizer.startElement(
at net.sf.saxon.tree.tiny.TinyElementImpl.copy(
at net.sf.saxon.event.SequenceReceiver.decompose(
at net.sf.saxon.event.SequenceNormalizerWithSpaceSeparator.append(
at net.sf.saxon.event.SequenceReceiver.lambda$decompose$1(
at net.sf.saxon.event.SequenceReceiver$$Lambda$614/0x0000000006cd0710.accept(Unknown Source)
at net.sf.saxon.event.SequenceReceiver.decompose(
at net.sf.saxon.event.SequenceNormalizerWithSpaceSeparator.append(
at net.sf.saxon.event.ProxyReceiver.append(
at net.sf.saxon.event.SequenceReceiver.append(
at net.sf.saxon.event.SequenceCopier$$Lambda$613/0x000000007a156fd0.accept(Unknown Source)
at net.sf.saxon.tree.iter.SingletonIterator.forEachOrFail(
at net.sf.saxon.event.SequenceCopier.copySequence(
at net.sf.saxon.query.QueryResult.serializeSequence(
at net.sf.saxon.query.QueryResult.serialize(
at net.sf.saxon.s9api.Serializer.serializeNodeToResult(
at net.sf.saxon.s9api.Serializer.serializeNodeToString(

Probably the output is simply too big, but do you have any suggestions on how to tackle this issue?
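For context, a minimal reconstruction of the failing call path (the actual application code is not shown in this thread; the class name, input, and output properties below are assumptions):

```java
import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class SerializeToString {
    static String serialize(String sourceXml) throws SaxonApiException {
        Processor processor = new Processor(false); // false = Saxon-HE; the reporter uses EE
        DocumentBuilder builder = processor.newDocumentBuilder();
        XdmNode doc = builder.build(new StreamSource(new StringReader(sourceXml)));

        // serializeNodeToString buffers the entire serialized result in one
        // in-memory string, which is exactly where a gigabyte-scale document
        // runs into trouble.
        Serializer serializer = processor.newSerializer();
        serializer.setOutputProperty(Serializer.Property.OMIT_XML_DECLARATION, "yes");
        return serializer.serializeNodeToString(doc);
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.println(serialize("<root><a/></root>"));
    }
}
```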

Actions #1

Updated by Michael Kay almost 2 years ago

How large is the serialized output? Does it work if you serialize to a file or to standard output? Is there memory pressure? Have you looked at a heap dump at the point where it hangs?

A StringBuffer uses an underlying char[] array, so the data requires contiguous memory; I can imagine memory pressure causing difficulty in expanding the array.

Actions #2

Updated by Michael Kay almost 2 years ago

  • Status changed from New to AwaitingInfo

Actions #3

Updated by andré mooiweer almost 2 years ago

The size of the xml before the transformation is 915.310.643 utf-8 chars (we did manage to write this to our database using serializeNodeToString). When I run xdmNode.getStringValue().length() on this xml I get 147.421.083. After the transformation I still get 147.421.083 from xdmNode.getStringValue().length(), which is a bit strange because the transformation adds attributes to certain xml-tags, which should increase the size. Anyway, when performing serializeNodeToString on this transformed xml (still 915.310.643 chars, or more?) the program hangs. I also tried calling the garbage collection before serializeNodeToString, without success.

I'm considering other ways of storing the xml in our database:

  • one option would be to split the xml into 3 parts (the xml contains 3 kinds of elements), store these separately, and later merge them again when needed

  • another option would be to compress (zip) the xml before storing it (but it probably has to be transformed to a string first before it can be zipped)

  • might it be possible to store the xdmNode as a blob in the database? Any suggestions are welcome.

  • we are working on getting access to our heap dump (preparing VisualVM)

  • we are also considering serializing the xml to a file; is there a saxon method/function to do this?
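(Editorial note: s9api does support serializing an XdmNode straight to a file, which avoids building the whole result as one String. A minimal sketch, with placeholder file names:)

```java
import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.File;
import java.io.StringReader;

public class SerializeToFile {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        DocumentBuilder builder = processor.newDocumentBuilder();
        XdmNode doc = builder.build(new StreamSource(new StringReader("<root><a/></root>")));

        // Writing directly to a file streams the output through the serializer
        // instead of accumulating it in a single in-memory StringBuffer.
        Serializer serializer = processor.newSerializer(new File("output.xml"));
        serializer.setOutputProperty(Serializer.Property.METHOD, "xml");
        serializer.serializeNode(doc);
        serializer.close();
    }
}
```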

Actions #4

Updated by Michael Kay almost 2 years ago

A single tree of this size is certainly going to put a big strain on the system. The TinyTree allocates large contiguous arrays whose size is proportional to tree size, so it will need contiguous memory from Java heap allocator, which is asking a lot. Using the LinkedTree would need more memory in total, but it will be allocated as many small fragments, which is less likely to cause problems.

Note that the string value of an element node in XDM represents the concatenation of the text node descendants, which ignores any space occupied by attributes. Simply constructing the string value in order to determine its size could require a lot of space allocation (the way we do this changes significantly in Saxon 11) - doing sum(//text()!string-length()) is probably much cheaper.
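The suggested expression can be run from s9api without materializing the full string value. A sketch, assuming an already-built XdmNode:

```java
import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class TextLength {
    static long totalTextLength(Processor processor, XdmNode doc) throws SaxonApiException {
        XPathCompiler xpath = processor.newXPathCompiler();
        // Sums the lengths of the individual text nodes instead of
        // concatenating them into one huge string first.
        XdmValue result = xpath.evaluate("sum(//text()!string-length())", doc);
        return ((XdmAtomicValue) result.itemAt(0)).getLongValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        XdmNode doc = processor.newDocumentBuilder()
                .build(new StreamSource(new StringReader("<r><a>abc</a><b>de</b></r>")));
        System.out.println(totalTextLength(processor, doc)); // 5
    }
}
```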

Really, you should be trying to find a way to tackle this using XSLT 3.0 streaming. I don't know what your workload is actually doing with the XML, so streaming might or might not be possible, but a streamed solution would be much more manageable at this kind of data size.
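A streamed transformation from s9api might look roughly like the following sketch. The file names are placeholders, streaming requires Saxon-EE, and the stylesheet must declare its initial mode streamable (`<xsl:mode streamable="yes"/>`):

```java
import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class StreamedTransform {
    public static void main(String[] args) throws SaxonApiException {
        // true = licensed Saxon-EE, which is required for streaming.
        Processor processor = new Processor(true);
        XsltCompiler compiler = processor.newXsltCompiler();
        // "transform.xsl" and "big.xml" are hypothetical placeholder files.
        XsltExecutable executable = compiler.compile(new StreamSource(new File("transform.xsl")));
        Xslt30Transformer transformer = executable.load30();

        // applyTemplates with a StreamSource pushes the input through the
        // streamable mode without ever building the full source tree in memory.
        Serializer out = processor.newSerializer(new File("result.xml"));
        transformer.applyTemplates(new StreamSource(new File("big.xml")), out);
    }
}
```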

Actions #5

Updated by andré mooiweer almost 2 years ago

-short update- We changed the tree model for this xml to TINY_TREE_CONDENSED and after the transformation the serializeNodeToString took only 7 seconds. The size of the resulting xml was 1.198.309.034 chars. We will run some more regression tests tomorrow and will also try LINKED_TREE. -so far, so good-
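(Editorial note: the tree model is selected on the s9api DocumentBuilder before the source is built. A minimal sketch of the switch described above:)

```java
import net.sf.saxon.om.TreeModel;
import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class TreeModelChoice {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        DocumentBuilder builder = processor.newDocumentBuilder();

        // TINY_TREE_CONDENSED shares storage for repeated text and attribute
        // values; TreeModel.LINKED_TREE instead allocates many small node
        // objects rather than large contiguous arrays.
        builder.setTreeModel(TreeModel.TINY_TREE_CONDENSED);

        XdmNode doc = builder.build(new StreamSource(new StringReader("<r><a>x</a><a>x</a></r>")));
        System.out.println(processor.newSerializer().serializeNodeToString(doc));
    }
}
```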

Actions #6

Updated by Michael Kay over 1 year ago

  • Status changed from AwaitingInfo to Closed

Closing this now.
