Bug #5697
closed
Serializing Node to String hangs
Fixed in Maintenance Release:
Description
We are using Saxon-EE 9.9.1.8 on Java and we have a problem serializing a Node to a String.
A transformation on the node completes in about 2 minutes. After that, serializing the node to a String hangs the system.
Here is a section of the stack trace at the moment of the hang:
There is 1 thread in the server that may be hung.
at java.lang.String.getCharsNoBoundChecks(String.java:2010)
at java.lang.StringBuffer.append(StringBuffer.java:591)
at java.io.StringWriter.write(StringWriter.java:112)
at net.sf.saxon.serialize.XMLEmitter.startElement(XMLEmitter.java:405)
at net.sf.saxon.serialize.XMLIndenter.startElement(XMLIndenter.java:119)
at net.sf.saxon.event.ProxyReceiver.startElement(ProxyReceiver.java:132)
at net.sf.saxon.event.SequenceNormalizer.startElement(SequenceNormalizer.java:88)
at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:358)
at net.sf.saxon.event.SequenceReceiver.decompose(SequenceReceiver.java:203)
at net.sf.saxon.event.SequenceNormalizerWithSpaceSeparator.append(SequenceNormalizerWithSpaceSeparator.java:41)
at net.sf.saxon.event.SequenceReceiver.lambda$decompose$1(SequenceReceiver.java:186)
at net.sf.saxon.event.SequenceReceiver$$Lambda$614/0x0000000006cd0710.accept(Unknown Source)
at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator.java:128)
at net.sf.saxon.event.SequenceReceiver.decompose(SequenceReceiver.java:185)
at net.sf.saxon.event.SequenceNormalizerWithSpaceSeparator.append(SequenceNormalizerWithSpaceSeparator.java:41)
at net.sf.saxon.event.ProxyReceiver.append(ProxyReceiver.java:234)
at net.sf.saxon.event.SequenceReceiver.append(SequenceReceiver.java:130)
at net.sf.saxon.event.SequenceCopier$$Lambda$613/0x000000007a156fd0.accept(Unknown Source)
at net.sf.saxon.tree.iter.SingletonIterator.forEachOrFail(SingletonIterator.java:152)
at net.sf.saxon.event.SequenceCopier.copySequence(SequenceCopier.java:34)
at net.sf.saxon.query.QueryResult.serializeSequence(QueryResult.java:202)
at net.sf.saxon.query.QueryResult.serialize(QueryResult.java:118)
at net.sf.saxon.s9api.Serializer.serializeNodeToResult(Serializer.java:624)
at net.sf.saxon.s9api.Serializer.serializeNodeToString(Serializer.java:618)
Probably the output is simply too big, but have you got any suggestions on how to tackle this issue?
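For context, a minimal sketch of the pattern the stack trace suggests (the class, variable and file names here are assumptions, not our actual code):

import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.*;

public class SerializeHangSketch {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(true);                  // licensed Saxon-EE processor
        DocumentBuilder builder = processor.newDocumentBuilder();
        XdmNode source = builder.build(new File("input.xml"));      // very large input document

        XsltCompiler xsltCompiler = processor.newXsltCompiler();
        XsltExecutable executable = xsltCompiler.compile(new StreamSource(new File("transform.xsl")));
        XsltTransformer transformer = executable.load();
        XdmDestination destination = new XdmDestination();
        transformer.setInitialContextNode(source);
        transformer.setDestination(destination);
        transformer.transform();                                     // completes in about 2 minutes

        Serializer serializer = processor.newSerializer();
        serializer.setOutputProperty(Serializer.Property.INDENT, "yes");
        String xml = serializer.serializeNodeToString(destination.getXdmNode());   // hangs here
        System.out.println(xml.length());
    }
}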
How large is the serialized output? Does it work if you serialize to a file or to standard output? Is there memory pressure? Have you looked at a heap dump at the point where it hangs?
A StringBuffer uses an underlying char[] array, so the data requires contiguous memory; I can imagine memory pressure causing difficulty in expanding that array.
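For reference, a minimal sketch of serializing directly to a file or to standard output instead of accumulating the result in a StringWriter (the file name is made up; "processor" and "node" stand for the Processor and the transformed XdmNode from the sketch above):

// serialize straight to a file: no single giant char[] has to be built
Serializer toFile = processor.newSerializer(new File("output.xml"));
toFile.serializeNode(node);

// or straight to standard output
Serializer toStdout = processor.newSerializer(System.out);
toStdout.serializeNode(node);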
- Status changed from New to AwaitingInfo
The size of the xml before the transformation is 915.310.643 UTF-8 chars (we did manage to write this to our database using serializeNodeToString).
When I run xdmNode.getStringValue().length() on this xml I get 147.421.083.
After the transformation I still get 147.421.083 from xdmNode.getStringValue().length(), which is a bit strange because the transformation adds attributes to certain xml-tags and that should increase the size.
Anyway, when performing serializeNodeToString on this transformed xml (still 915.310.643 chars, or more?) the program hangs.
I also tried calling the garbage collector before the serializeNodeToString, without success.
I'm considering other ways of storing the xml in our database:
- one option would be to split the xml into 3 parts (the xml contains 3 kinds of elements), store these separately and merge them again later when needed
- another option would be to compress (zip) the xml before storing it (but probably it first has to be serialized to a string before it can be zipped - see the sketch below)
- might it be possible to store the xdmNode as a blob in the database?
Any suggestions are welcome.
- we are working on getting access to our heap dump (preparing VisualVM)
- we are also considering serializing the xml to a file; is there a Saxon method/function to do this? (see the sketch below)
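On the last two points, a minimal sketch of serializing straight to a (gzip-compressed) file with the s9api Serializer, so the document never has to be materialised as a single Java String first (file name is an assumption; "processor" and "node" are the Processor and transformed XdmNode from the earlier sketch):

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

try (OutputStream out = new GZIPOutputStream(new FileOutputStream("output.xml.gz"))) {
    Serializer serializer = processor.newSerializer(out);   // writes directly to the stream
    serializer.serializeNode(node);                          // no intermediate String is built
}
// for an uncompressed file, processor.newSerializer(new File("output.xml")) works the same way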
A single tree of this size is certainly going to put a big strain on the system. The TinyTree allocates large contiguous arrays whose size is proportional to tree size, so it will need contiguous memory from the Java heap allocator, which is asking a lot. Using the LinkedTree would need more memory in total, but it will be allocated as many small fragments, which is less likely to cause problems.
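For reference, a minimal sketch of selecting the tree model on the s9api DocumentBuilder (building on the earlier sketch; TreeModel lives in net.sf.saxon.om):

import net.sf.saxon.om.TreeModel;

DocumentBuilder builder = processor.newDocumentBuilder();
builder.setTreeModel(TreeModel.LINKED_TREE);                // many small objects instead of a few huge arrays
// or: builder.setTreeModel(TreeModel.TINY_TREE_CONDENSED);
XdmNode source = builder.build(new File("input.xml"));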
Note that the string value of an element node in XDM is the concatenation of its descendant text nodes, which ignores any space occupied by attributes. Simply constructing the string value in order to determine its size could require a lot of space allocation (the way we do this changes significantly in Saxon 11); doing sum(//text()!string-length()) is probably much cheaper.
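For example, a sketch of running that expression through the s9api XPathCompiler against the already-built document ("processor" and "source" as in the earlier sketch):

XPathCompiler xpath = processor.newXPathCompiler();
XdmItem total = xpath.evaluateSingle("sum(//text()!string-length())", source);
System.out.println("total text length: " + ((XdmAtomicValue) total).getLongValue());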
Really, you should be trying to find a way to tackle this using XSLT 3.0 streaming. I don't know what your workload is actually doing with the XML, so streaming might or might not be possible, but a streamed solution would be much more manageable at this kind of data size.
-short update-
We changed the tree model for this xml to TINY_TREE_CONDENSED and after the transformation the serializeNodeToString took only 7 seconds.
The size of the resulting xml was 1.198.309.034 chars.
We will run some more regression tests tomorrow and will also try LINKED_TREE.
-so far, so good-
- Status changed from AwaitingInfo to Closed