Bug #6044

open

More than 1G of text in a TinyTree

Added by Michael Kay 13 days ago.

Status: New
Priority: Normal
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2023-05-23
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

In 11.0 we lifted the limits so that the LargeTextBuffer, used to hold text nodes in the TinyTree, can expand beyond 2^32 characters. However, the offsets into the buffer (held in the alpha and beta arrays) are still 32-bit ints, so this doesn't achieve much. In fact, all it seems to achieve is that we no longer fail cleanly when the limit is exceeded. Instead, the character count overflows to a negative value and a text lookup fails with a negative index:

Tree size: 2397877 nodes, -1857909795 characters, 298610 attributes
java.lang.ArrayIndexOutOfBoundsException: -32768
        at java.util.ArrayList.elementData(ArrayList.java:424)
        at java.util.ArrayList.get(ArrayList.java:437)
        at net.sf.saxon.str.LargeTextBuffer.getSegment(LargeTextBuffer.java:255)
        at net.sf.saxon.str.LargeTextBuffer.substring(LargeTextBuffer.java:377)
        at net.sf.saxon.tree.tiny.TinyTextImpl.getStringValue(TinyTextImpl.java:50)
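
The negative numbers in this output are what 32-bit wraparound would produce. The following is a minimal sketch of the arithmetic, not Saxon code: the class and method names are hypothetical, the 16-bit segment shift is an assumption chosen because it reproduces the -32768 index seen in the trace, and the recovered character count assumes the counter wrapped exactly once.

```java
class OffsetOverflowDemo {
    // Hypothetical segment size of 2^16 chars (an assumption for illustration).
    static final int SEGMENT_BITS = 16;

    /** Recovers the true count from an int counter, assuming it wrapped exactly once. */
    static long unwrapOnce(int wrapped) {
        return wrapped & 0xFFFFFFFFL; // reinterpret the 32 bits as an unsigned value
    }

    /** Segment index as a shift-based lookup would derive it from a 32-bit offset. */
    static int segmentIndex(int offset) {
        return offset >> SEGMENT_BITS; // arithmetic shift preserves the sign
    }

    public static void main(String[] args) {
        int reported = -1857909795;               // character count from the log
        System.out.println(unwrapOnce(reported)); // 2437057501

        int offset = Integer.MAX_VALUE;
        offset++;                                 // silently wraps to Integer.MIN_VALUE
        System.out.println(segmentIndex(offset)); // -32768
    }
}
```

Under these assumptions the tree held roughly 2.4 billion characters, and the first offset past 2^31 - 1 maps to segment -32768.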

A possible design to increase the capacity without penalty for "ordinary" users might be for the alpha and beta entries to hold pointers into the "current region" of the text buffer, with a separate index holding a mapping from ranges of node numbers to regions.
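
One way to picture this design is sketched below. It is an illustration under stated assumptions, not the Saxon implementation: the class `RegionedTextIndex` and all of its members are hypothetical, only offsets are tracked (no text is stored), and a binary search over the first node number of each region stands in for the "separate index". Alpha and beta stay 32-bit, while the absolute position is a long computed only when needed.

```java
import java.util.Arrays;

class RegionedTextIndex {
    private final int regionCapacity;           // max chars per region (hypothetical parameter)
    private int[] alpha = new int[16];          // per node: offset within its region
    private int[] beta = new int[16];           // per node: length in chars
    private int nodeCount = 0;
    private int[] regionFirstNode = new int[4]; // first node number held in each region
    private long[] regionBase = new long[4];    // absolute start offset of each region
    private int regionCount = 0;
    private int usedInRegion = 0;               // chars used in the current region
    private long totalChars = 0;

    RegionedTextIndex(int regionCapacity) {
        this.regionCapacity = regionCapacity;
    }

    /** Registers a text node of the given length and returns its node number. */
    int addText(int length) {
        if (length < 0 || length > regionCapacity) {
            throw new IllegalArgumentException("length must fit in one region");
        }
        // Open a new region when the current one cannot hold this node (long math avoids overflow).
        if (regionCount == 0 || (long) usedInRegion + length > regionCapacity) {
            if (regionCount == regionFirstNode.length) {
                regionFirstNode = Arrays.copyOf(regionFirstNode, regionCount * 2);
                regionBase = Arrays.copyOf(regionBase, regionCount * 2);
            }
            regionFirstNode[regionCount] = nodeCount;
            regionBase[regionCount] = totalChars;
            regionCount++;
            usedInRegion = 0;
        }
        if (nodeCount == alpha.length) {
            alpha = Arrays.copyOf(alpha, nodeCount * 2);
            beta = Arrays.copyOf(beta, nodeCount * 2);
        }
        alpha[nodeCount] = usedInRegion;
        beta[nodeCount] = length;
        usedInRegion += length;
        totalChars += length;
        return nodeCount++;
    }

    /** Absolute start offset of a node's text: region base plus 32-bit local offset. */
    long absoluteStart(int node) {
        if (node < 0 || node >= nodeCount) {
            throw new IndexOutOfBoundsException("no such node: " + node);
        }
        int pos = Arrays.binarySearch(regionFirstNode, 0, regionCount, node);
        int region = pos >= 0 ? pos : -pos - 2; // last region whose first node <= node
        return regionBase[region] + alpha[node];
    }

    int length(int node) {
        return beta[node];
    }

    long totalChars() {
        return totalChars;
    }

    public static void main(String[] args) {
        RegionedTextIndex index = new RegionedTextIndex(Integer.MAX_VALUE);
        for (int i = 0; i < 3; i++) {
            index.addText(1_500_000_000);
        }
        System.out.println(index.absoluteStart(2)); // 3000000000
    }
}
```

In this sketch a document that fits in a single region pays only for a one-entry index, which is the sense in which "ordinary" users would see no penalty. The cost of the region lookup on each access is the main trade-off, and caching the most recently used region would be one way to reduce it.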
