Bug #1923

closed

TinyTree can't handle >1Gb text size

Added by Michael Kay over 10 years ago. Updated over 10 years ago.

Status: Won't fix
Priority: Normal
Assignee:
Category: Internals
Sprint/Milestone: -
Start date: 2013-10-24
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Reported by Vadim Zalunin on saxon-help mailing list:

Just downloaded fresh saxon9he jar and it crashes on a large xml:

Warning: at xsl:stylesheet on line 2 column 80 of example.xslt:

Running an XSLT 1 stylesheet with an XSLT 2 processor

java.lang.ArrayIndexOutOfBoundsException: -32632

    at net.sf.saxon.tree.tiny.LargeStringBuffer.append(LargeStringBuffer.java:89)

    at net.sf.saxon.tree.tiny.TinyTree.appendChars(TinyTree.java:415)

The XML file is ~3 GB, and some values inside may be up to 1 GB long.
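
A negative array index of this kind is typical of a 32-bit length counter wrapping around. The sketch below only illustrates that arithmetic and is not a diagnosis of the actual fault: the class name, the 65536-character segment size, and the chosen length are all assumptions, and the real layout of LargeStringBuffer may differ. The length was picked so that the result matches the number in the trace.

```java
// Illustration only: hypothetical names and segment size, not Saxon code.
final class IndexOverflowDemo {
    // Assumed segment size of 2^16 chars, so the segment index is length >> 16.
    static final int SEGMENT_BITS = 16;

    // What a long character count looks like once it is stored in an int.
    static int wrappedLength(long trueLength) {
        return (int) trueLength; // silently wraps beyond Integer.MAX_VALUE
    }

    // Index of the segment that would receive the next character.
    static int segmentIndex(int length) {
        return length >> SEGMENT_BITS; // arithmetic shift keeps the sign
    }

    public static void main(String[] args) {
        long trueLength = 2_156_396_544L;        // assumed total, just over 2^31 chars
        int stored = wrappedLength(trueLength);  // -2138570752
        System.out.println(segmentIndex(stored)); // -32632
    }
}
```

Under these assumptions, any attempt to use the result as an array index fails with an ArrayIndexOutOfBoundsException carrying that negative value.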

Actions #1

Updated by Michael Kay over 10 years ago

I don't think there's a realistic prospect of handling in-memory documents where the string value of any node is greater than Java's limits on the capacity of a String. The 2Gb limit is too firmly entrenched in the Java infrastructure and in the APIs we use for accessing the tree.

The best we could do would be to make the failure softer.

For example we could try to ensure that LargeStringBuffer.append() allows the size to go beyond 2Gb, but that wouldn't be useful on its own, because the class implements Java's CharSequence which has methods such as charAt() and subSequence() that use int lengths and offsets. In addition there is simply the problem of testing it.
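
The CharSequence constraint can be sketched as follows. LongIndexedChars is a hypothetical class invented for this example, not part of Saxon; its content is synthetic so that no large allocation is needed. It shows that even if the underlying store counts characters with a long, any view of it as a CharSequence has to narrow to int, and the only choices beyond Integer.MAX_VALUE are to fail or to return a wrong answer.

```java
// Hypothetical long-indexed character store; not a Saxon class.
final class LongIndexedChars {
    private final long length;

    LongIndexedChars(long length) {
        this.length = length;
    }

    long longLength() {
        return length;
    }

    // Synthetic content (a..z repeating) so the demo needs no real memory.
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) ('a' + (int) (index % 26));
    }

    // CharSequence only offers int-based methods, so this view must narrow.
    CharSequence asCharSequence() {
        return new CharSequence() {
            @Override public int length() {
                // Throws ArithmeticException once the length exceeds Integer.MAX_VALUE.
                return Math.toIntExact(length);
            }
            @Override public char charAt(int index) {
                return LongIndexedChars.this.charAt(index);
            }
            @Override public CharSequence subSequence(int start, int end) {
                StringBuilder sb = new StringBuilder(end - start);
                for (int i = start; i < end; i++) {
                    sb.append(charAt(i));
                }
                return sb;
            }
            @Override public String toString() {
                return subSequence(0, length()).toString();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(new LongIndexedChars(5).asCharSequence()); // abcde
        try {
            new LongIndexedChars(3_000_000_000L).asCharSequence().length();
        } catch (ArithmeticException e) {
            System.out.println("length does not fit in an int");
        }
    }
}
```

Characters past index Integer.MAX_VALUE remain reachable through the long-based charAt, but not through anything that accepts a CharSequence.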

Moving from 32-bit offsets in the TinyTree arrays to 64-bit offsets would increase memory usage for everyone; we would need to introduce a new tree model for this.
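
As a rough illustration of that cost, the following sketch compares the size of a single per-node offset array with int and with long entries. The node count is a made-up figure, and the calculation ignores array headers and says nothing about how many such arrays the TinyTree actually holds.

```java
// Back-of-the-envelope estimate; the node count is hypothetical.
final class OffsetWidthCost {
    // Bytes needed for one array holding a single offset per node (headers ignored).
    static long offsetArrayBytes(long nodeCount, int bytesPerOffset) {
        return Math.multiplyExact(nodeCount, (long) bytesPerOffset);
    }

    public static void main(String[] args) {
        long nodes = 10_000_000L; // assumed document size in nodes
        long narrow = offsetArrayBytes(nodes, Integer.BYTES); // 4 bytes per entry
        long wide = offsetArrayBytes(nodes, Long.BYTES);      // 8 bytes per entry
        System.out.println("int offsets:  " + narrow + " bytes"); // 40000000
        System.out.println("long offsets: " + wide + " bytes");   // 80000000
    }
}
```

Every array of offsets doubles in size, whether or not the document comes anywhere near the limit.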

In addition, I'm not convinced that adding extra checks to get a cleaner error message is worth the trouble. The small number of people who hit this problem will know that they are stretching the boundaries, so the mode of failure doesn't really matter too much.

Actions #2

Updated by Michael Kay over 10 years ago

  • Status changed from New to Won't fix

Decided there's no realistic prospect of raising this limit without help from the Java infrastructure.
