Project

Profile

Help

Bug #5838

closed

ArrayIndexOutOfBounds with whitespace coming from StaxSource

Added by Michael Kay about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Internals
Sprint/Milestone:
-
Start date:
2023-01-19
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
11, 12, trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

See https://saxonica.plan.io/boards/3/topics/9228?pn=1

The stacktrace starts

java.lang.ArrayIndexOutOfBoundsException: Index 8192 out of bounds for length 8192
    at net.sf.saxon.str.StringTool.compress(StringTool.java:267) ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.str.CompressedWhitespace.compressWS(CompressedWhitespace.java:52) ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.str.StringTool.compress(StringTool.java:277) ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.StaxBridge.getStringValue(StaxBridge.java:456) ~[Saxon-HE-11.4.jar:na]
Actions #1

Updated by Michael Kay about 1 year ago

The cause looks pretty clear from the stack trace: StringTool#277 is calling CompressedWhitespace.compressWS(in, offset, end); but the method declaration is public static UnicodeString compressWS(char[] in, int start, int len) so we're clearly passing an "end" offset where a length is expected.

Which obviously raises questions like (a) why is it only happening on this path, and (b) how can we write a regression test to catch the failure and make sure it's fixed.

(This confusion between end offset and length is a real elephant trap. Both conventions are widely used in Java APIs).

Actions #2

Updated by Michael Kay about 1 year ago

The reason the problem doesn't arise when a push parser is used is that on that path we only compress whitespace after assembling multiple text fragments in a buffer, and as a result it always calls compress() with a start offset of 0, in which case end and length are interchangeable.

As for testing, we only run a rather small number of tests with a pull parser. I'll take a close look at some of those tests to see what path they are taking.

Actions #3

Updated by Michael Kay about 1 year ago

The tests we are running use Woodstox, which does indeed supply text in a 4000-character buffer -- but like the SAX parser, it's always returning 0 as the result of reader.getTextStart(), which means we're again setting start=0 and therefore not seeing any distinction between end and length.

I think the best way of tackling this is going to be to write a unit test that invokes the compression methods directly, rather than trying to trigger an XML parser to set the right input conditions.

Actions #4

Updated by Michael Kay about 1 year ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 11, 12, trunk added
  • Platforms .NET added

Added unit tests in internaltests/CompressionTests.

Bug fixed.

Actions #5

Updated by O'Neil Delpratt about 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 11.5 added

Bug fix applied in the Saxon 11.5 maintenance release.

Actions #6

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 12.1 added

Bug fix applied in the Saxon 12.1 maintenance release.

Please register to edit this issue

Also available in: Atom PDF