Bug #5838
closed
ArrayIndexOutOfBounds with whitespace coming from StaxSource
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
11, 12, trunk
Fixed in Maintenance Release:
Description
See https://saxonica.plan.io/boards/3/topics/9228?pn=1
The stacktrace starts
java.lang.ArrayIndexOutOfBoundsException: Index 8192 out of bounds for length 8192
at net.sf.saxon.str.StringTool.compress(StringTool.java:267) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.CompressedWhitespace.compressWS(CompressedWhitespace.java:52) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.str.StringTool.compress(StringTool.java:277) ~[Saxon-HE-11.4.jar:na]
at net.sf.saxon.pull.StaxBridge.getStringValue(StaxBridge.java:456) ~[Saxon-HE-11.4.jar:na]
The cause looks pretty clear from the stack trace: StringTool#277 is calling CompressedWhitespace.compressWS(in, offset, end);
but the method declaration is public static UnicodeString compressWS(char[] in, int start, int len)
so we're clearly passing an "end" offset where a length is expected.
Which obviously raises questions like (a) why is it only happening on this path, and (b) how can we write a regression test to catch the failure and make sure it's fixed.
(This confusion between end offset and length is a real elephant trap. Both conventions are widely used in Java APIs).
The reason the problem doesn't arise when a push parser is used is that on that path we only compress whitespace after assembling multiple text fragments in a buffer, and as a result it always calls compress()
with a start offset of 0, in which case end
and length
are interchangeable.
As for testing, we only run a rather small number of tests with a pull parser. I'll take a close look at some of those tests to see what path they are taking.
The tests we are running use Woodstox, which does indeed supply text in a 4000-character buffer -- but like the SAX parser, it's always returning 0 as the result of reader.getTextStart()
, which means we're again setting start=0 and therefore not seeing any distinction between end
and length
.
I think the best way of tackling this is going to be to write a unit test that invokes the compression methods directly, rather than trying to trigger an XML parser to set the right input conditions.
- Status changed from New to Resolved
- Fix Committed on Branch 11, 12, trunk added
- Platforms .NET added
Added unit tests in internaltests/CompressionTests
.
Bug fixed.
- % Done changed from 0 to 100
- Fixed in Maintenance Release 11.5 added
Bug fix applied in the Saxon 11.5 maintenance release.
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 12.1 added
Bug fix applied in the Saxon 12.1 maintenance release.
Please register to edit this issue
Also available in: Atom
PDF