Bug #6044
open

More than 1G of text in a TinyTree
Start date: 2023-05-23
% Done: 0%
Platforms: .NET, Java
Description
In 11.0 we lifted limits so the LargeTextBuffer, used to hold text nodes in the TinyTree, can expand beyond 2^32 characters. However, the offsets in the buffer (in the alpha and beta arrays) are still 32-bit ints, so this doesn't achieve much. In fact, all it seems to achieve is that we no longer fail cleanly when the limit is exceeded.
Tree size: 2397877 nodes, -1857909795 characters, 298610 attributes
java.lang.ArrayIndexOutOfBoundsException: -32768
    at java.util.ArrayList.elementData(ArrayList.java:424)
    at java.util.ArrayList.get(ArrayList.java:437)
    at net.sf.saxon.str.LargeTextBuffer.getSegment(LargeTextBuffer.java:255)
    at net.sf.saxon.str.LargeTextBuffer.substring(LargeTextBuffer.java:377)
    at net.sf.saxon.tree.tiny.TinyTextImpl.getStringValue(TinyTextImpl.java:50)
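The negative character count and the negative index in the trace are both consistent with a 32-bit signed offset wrapping around. The following is a minimal sketch of that arithmetic, not the Saxon code itself. It assumes a segment size of 2^16 characters, and the class name, the constant BITS and the method segmentIndex are hypothetical. The value 2,437,057,501 is simply the smallest positive count that wraps to the reported figure; the real count is not stated in the report.

```java
class OffsetOverflowSketch {
    // Assumption: segments of 2^16 characters (not confirmed by the report).
    static final int BITS = 16;

    // Hypothetical helper: maps a 32-bit character offset to a segment number.
    static int segmentIndex(int offset) {
        return offset >> BITS;
    }

    // Narrowing a long count to int keeps only the low 32 bits.
    static int wrap(long count) {
        return (int) count;
    }

    public static void main(String[] args) {
        // Smallest positive count that wraps to the reported tree size.
        long inferredCount = 2_437_057_501L;
        System.out.println(wrap(inferredCount));             // -1857909795

        // An offset that has just wrapped past Integer.MAX_VALUE
        // lands in a negative segment number.
        System.out.println(segmentIndex(Integer.MIN_VALUE)); // -32768
    }
}
```

Under these assumptions, the first offset past the signed 32-bit limit yields segment -32768, which matches the index in the exception.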
A possible design to increase the capacity without penalty for "ordinary" users might be for the alpha and beta entries to hold pointers into the "current region" of the text buffer, with a separate index holding a mapping from ranges of node numbers to regions.
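One way this idea could be sketched is shown here, purely as an illustration and not as a proposed patch. The class RegionIndexSketch and its methods are hypothetical names. The sketch assumes that text nodes are appended in node-number order, so each region covers a contiguous range of node numbers, and that the first region starts at node 0 with base offset 0. The per-node entry stays a 32-bit int relative to the start of its region; only the small region index holds 64-bit offsets.

```java
import java.util.Arrays;

class RegionIndexSketch {
    // First node number of each region, in ascending order.
    private int[] regionFirstNode = new int[4];
    // Absolute (64-bit) start offset of each region in the text buffer.
    private long[] regionBase = new long[4];
    private int regionCount = 0;

    // Called when the builder starts a new region at the given node.
    void addRegion(int firstNode, long base) {
        if (regionCount == regionFirstNode.length) {
            regionFirstNode = Arrays.copyOf(regionFirstNode, regionCount * 2);
            regionBase = Arrays.copyOf(regionBase, regionCount * 2);
        }
        regionFirstNode[regionCount] = firstNode;
        regionBase[regionCount] = base;
        regionCount++;
    }

    // Resolves a node's region-relative offset to an absolute offset.
    long absoluteOffset(int nodeNr, int relativeOffset) {
        if (regionCount == 1) {
            // Fast path for "ordinary" trees: one region, base 0, no search.
            return regionBase[0] + relativeOffset;
        }
        int pos = Arrays.binarySearch(regionFirstNode, 0, regionCount, nodeNr);
        // Not found: binarySearch returns -(insertionPoint) - 1,
        // and the owning region is the one just before the insertion point.
        int region = pos >= 0 ? pos : -pos - 2;
        if (region < 0) {
            throw new IllegalArgumentException("Node precedes first region: " + nodeNr);
        }
        return regionBase[region] + relativeOffset;
    }

    public static void main(String[] args) {
        RegionIndexSketch index = new RegionIndexSketch();
        index.addRegion(0, 0L);
        index.addRegion(1000, 2_000_000_000L);
        index.addRegion(5000, 4_000_000_000L);
        System.out.println(index.absoluteOffset(999, 100)); // 100
        System.out.println(index.absoluteOffset(4999, 7));  // 2000000007
        System.out.println(index.absoluteOffset(5000, 1));  // 4000000001
    }
}
```

In this sketch a tree that fits in a single region never pays for the binary search, which is the sense in which the design would avoid a penalty for ordinary documents.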