Bug #1923
closed
TinyTree can't handle >1Gb text size
Description
Reported by Vadim Zalunin on saxon-help mailing list:
Just downloaded fresh saxon9he jar and it crashes on a large xml:
Warning: at xsl:stylesheet on line 2 column 80 of example.xslt:
Running an XSLT 1 stylesheet with an XSLT 2 processor
java.lang.ArrayIndexOutOfBoundsException: -32632
at net.sf.saxon.tree.tiny.LargeStringBuffer.append(LargeStringBuffer.java:89)
at net.sf.saxon.tree.tiny.TinyTree.appendChars(TinyTree.java:415)
The xml file is ~3gb long and some values inside may be of up to 1gb long.
Updated by Michael Kay over 10 years ago
I don't think there's a realistic prospect of handling in-memory documents where the string value of any node is greater than Java's limits on the capacity of a String. The 2Gb limit is too firmly entrenched in the Java infrastructure and in the APIs we use for accessing the tree.
The best we could do would be to make the failure softer.
For example we could try to ensure that LargeStringBuffer.append() allows the size to go beyond 2Gb, but that wouldn't be useful on its own, because the class implements Java's CharSequence which has methods such as charAt() and subSequence() that use int lengths and offsets. In addition there is simply the problem of testing it.
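To make the CharSequence constraint concrete, here is a minimal sketch. The class SegmentedBuffer and all of its members are hypothetical, invented for illustration; this is not Saxon's LargeStringBuffer. It tracks its size as a long, yet everything exposed through CharSequence is still expressed in int, so content past Integer.MAX_VALUE cannot be addressed through that interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Saxon's LargeStringBuffer. */
public class SegmentedBuffer implements CharSequence {
    private static final int SEGMENT_SIZE = 1 << 16;
    private final List<char[]> segments = new ArrayList<>();
    private long length = 0; // the buffer itself can count beyond 2^31 - 1

    public void append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            int seg = (int) (length / SEGMENT_SIZE);
            int off = (int) (length % SEGMENT_SIZE);
            if (seg == segments.size()) {
                segments.add(new char[SEGMENT_SIZE]);
            }
            segments.get(seg)[off] = s.charAt(i);
            length++;
        }
    }

    /** The true size; not part of the CharSequence interface. */
    public long longLength() {
        return length;
    }

    /** Narrowing a long offset to int silently wraps to a negative value. */
    public static int narrow(long offset) {
        return (int) offset;
    }

    @Override
    public int length() {
        // CharSequence forces an int here, so a larger buffer cannot report its size.
        if (length > Integer.MAX_VALUE) {
            throw new IllegalStateException("length " + length + " does not fit in an int");
        }
        return (int) length;
    }

    @Override
    public char charAt(int index) {
        // The int parameter means positions past Integer.MAX_VALUE are unreachable.
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return segments.get(index / SEGMENT_SIZE)[index % SEGMENT_SIZE];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return subSequence(0, length()).toString();
    }
}
```

For instance, SegmentedBuffer.narrow(3_000_000_000L) yields -1294967296, which shows how an offset past 2Gb could turn into a negative index once it is forced into an int. Whether that is the exact mechanism behind the exception in the report is not stated in this thread.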
Moving from 32bit offsets in the TinyTree arrays to 64bit offsets would affect memory usage for everyone; we would need to introduce a new tree model for this.
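As a rough illustration of the memory cost: a Java int occupies 4 bytes and a long 8 bytes, so any array of per-node offsets doubles in size when widened. The sketch below assumes a single offset array and a document of ten million nodes; both figures are chosen purely for illustration and do not describe the actual TinyTree layout.

```java
/** Hypothetical back-of-envelope calculation; not based on TinyTree internals. */
public class OffsetArrayCost {
    /** Payload bytes for an array of offsets, ignoring the array's object header. */
    public static long payloadBytes(long nodeCount, int bytesPerOffset) {
        return nodeCount * bytesPerOffset;
    }

    public static void main(String[] args) {
        long nodes = 10_000_000L; // assumed node count, for illustration only
        System.out.println("int[] offsets:  " + payloadBytes(nodes, Integer.BYTES) + " bytes");
        System.out.println("long[] offsets: " + payloadBytes(nodes, Long.BYTES) + " bytes");
    }
}
```

Every document would pay the doubled cost, including the great majority that never approach the limit, which is why a separate tree model is suggested rather than a change to the existing one.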
In addition, I'm not convinced that adding extra checks to get a cleaner error message is worth the trouble. The small number of people who hit this problem will know that they are stretching the boundaries, so the mode of failure doesn't really matter too much.
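For reference, a "softer" failure of the kind discussed above could look something like the following sketch. The class and method names are hypothetical and nothing like this is confirmed to exist in Saxon; it only shows how an overflow check might replace an ArrayIndexOutOfBoundsException with a descriptive error.

```java
/** Hypothetical sketch of an explicit length check; not part of Saxon. */
public class CheckedLength {
    /**
     * Returns currentLength + extra, or throws a descriptive exception
     * if the sum would overflow an int.
     */
    public static int checkedLength(int currentLength, int extra) {
        try {
            // Math.addExact throws ArithmeticException on int overflow.
            return Math.addExact(currentLength, extra);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "Text content exceeds the " + Integer.MAX_VALUE
                    + "-character limit of an int-indexed buffer", e);
        }
    }
}
```

The check adds a comparison to every append, which is the kind of cost weighed against the benefit in the comment above.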
Updated by Michael Kay over 10 years ago
- Status changed from New to Won't fix
Decided there's no realistic prospect of raising this limit without help from the Java infrastructure.