Bug #5838
closed

ArrayIndexOutOfBounds with whitespace coming from StaxSource
100%
Description
See https://saxonica.plan.io/boards/3/topics/9228?pn=1
The stacktrace starts
java.lang.ArrayIndexOutOfBoundsException: Index 8192 out of bounds for length 8192
at net.sf.saxon.str.StringTool.compress(StringTool.java:267) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.CompressedWhitespace.compressWS(CompressedWhitespace.java:52) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.StringTool.compress(StringTool.java:277) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.pull.StaxBridge.getStringValue(StaxBridge.java:456) ~[Saxon-HE-11.4.jar:na]
Updated by Michael Kay 11 months ago
The cause looks pretty clear from the stack trace: StringTool#277 is calling CompressedWhitespace.compressWS(in, offset, end);
but the method declaration is public static UnicodeString compressWS(char[] in, int start, int len)
so we're clearly passing an "end" offset where a length is expected.
Which obviously raises questions like (a) why is it only happening on this path, and (b) how can we write a regression test to catch the failure and make sure it's fixed.
(This confusion between end offset and length is a real elephant trap. Both conventions are widely used in Java APIs).
Updated by Michael Kay 11 months ago
The reason the problem doesn't arise when a push parser is used is that on that path we only compress whitespace after assembling multiple text fragments in a buffer, and as a result it always calls compress()
with a start offset of 0, in which case end
and length
are interchangeable.
As for testing, we only run a rather small number of tests with a pull parser. I'll take a close look at some of those tests to see what path they are taking.
Updated by Michael Kay 11 months ago
The tests we are running use Woodstox, which does indeed supply text in a 4000-character buffer -- but like the SAX parser, it's always returning 0 as the result of reader.getTextStart()
, which means we're again setting start=0 and therefore not seeing any distinction between end
and length
.
I think the best way of tackling this is going to be to write a unit test that invokes the compression methods directly, rather than trying to trigger an XML parser to set the right input conditions.
Updated by Michael Kay 11 months ago
- Status changed from New to Resolved
- Fix Committed on Branch 11, 12, trunk added
- Platforms .NET added
Added unit tests in internaltests/CompressionTests
.
Bug fixed.
Updated by O'Neil Delpratt 10 months ago
- % Done changed from 0 to 100
- Fixed in Maintenance Release 11.5 added
Bug fix applied in the Saxon 11.5 maintenance release.
Updated by O'Neil Delpratt 9 months ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 12.1 added
Bug fix applied in the Saxon 12.1 maintenance release.
Please register to edit this issue