Bug #5838
closedArrayIndexOutOfBounds with whitespace coming from StaxSource
100%
Description
See https://saxonica.plan.io/boards/3/topics/9228?pn=1
The stacktrace starts
java.lang.ArrayIndexOutOfBoundsException: Index 8192 out of bounds for length 8192
at net.sf.saxon.str.StringTool.compress(StringTool.java:267) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.CompressedWhitespace.compressWS(CompressedWhitespace.java:52) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.StringTool.compress(StringTool.java:277) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.pull.StaxBridge.getStringValue(StaxBridge.java:456) ~[Saxon-HE-11.4.jar:na]
Updated by Michael Kay almost 2 years ago
The cause looks pretty clear from the stack trace: StringTool#277 is calling CompressedWhitespace.compressWS(in, offset, end);
but the method declaration is public static UnicodeString compressWS(char[] in, int start, int len)
so we're clearly passing an "end" offset where a length is expected.
Which obviously raises questions like (a) why is it only happening on this path, and (b) how can we write a regression test to catch the failure and make sure it's fixed.
(This confusion between end offset and length is a real elephant trap. Both conventions are widely used in Java APIs).
Updated by Michael Kay almost 2 years ago
The reason the problem doesn't arise when a push parser is used is that on that path we only compress whitespace after assembling multiple text fragments in a buffer, and as a result it always calls compress()
with a start offset of 0, in which case end
and length
are interchangeable.
As for testing, we only run a rather small number of tests with a pull parser. I'll take a close look at some of those tests to see what path they are taking.
Updated by Michael Kay almost 2 years ago
The tests we are running use Woodstox, which does indeed supply text in a 4000-character buffer -- but like the SAX parser, it's always returning 0 as the result of reader.getTextStart()
, which means we're again setting start=0 and therefore not seeing any distinction between end
and length
.
I think the best way of tackling this is going to be to write a unit test that invokes the compression methods directly, rather than trying to trigger an XML parser to set the right input conditions.
Updated by Michael Kay almost 2 years ago
- Status changed from New to Resolved
- Fix Committed on Branch 11, 12, trunk added
- Platforms .NET added
Added unit tests in internaltests/CompressionTests
.
Bug fixed.
Updated by O'Neil Delpratt almost 2 years ago
- % Done changed from 0 to 100
- Fixed in Maintenance Release 11.5 added
Bug fix applied in the Saxon 11.5 maintenance release.
Updated by O'Neil Delpratt almost 2 years ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 12.1 added
Bug fix applied in the Saxon 12.1 maintenance release.
Please register to edit this issue