Bug #1852
closed
ArrayIndexOutOfBounds doing Unicode normalization
100%
Description
Reported on the saxon-help list by Hugh Cayless:
Hi, I'm running into a sporadic exception when converting NFC strings to NFD, using Saxon 9.5.1.1 HE:
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at net.sf.saxon.tree.util.FastStringBuffer.insertWideChar(FastStringBuffer.java:407)
at net.sf.saxon.serialize.codenorm.Normalizer.internalDecompose(Normalizer.java:152)
at net.sf.saxon.serialize.codenorm.Normalizer.normalize(Normalizer.java:83)
at net.sf.saxon.functions.NormalizeUnicode.normalize(NormalizeUnicode.java:99)
at net.sf.saxon.functions.NormalizeUnicode.evaluateItem(NormalizeUnicode.java:35)
at net.sf.saxon.functions.NormalizeUnicode.evaluateItem(NormalizeUnicode.java:23)
I've played around a bit with varying strings, and it seems only to trigger in certain circumstances—maybe certain strings are sneaking by net.sf.saxon.tree.util.FastStringBuffer's ensureCapacity method?
Attached is a string that will consistently trigger the bug when passed to net.sf.saxon.serialize.codenorm.Normalizer's normalize method. Normalizer is set up to convert to NFD.
This seems like a regression, as I've been using Saxon on these files for a few years now, and I've only seen it since upgrading.
Thanks,
Hugh
The string in question is:
ἱερου ιβ´ ἔτουσ λγ Παῦνι κα μεμέτρηκεν εἰσ τὸν ἐν Διὸσ πόλει τῆι μεγάληι θησαυρὸν εἰσ τὴν ἐπιγραφὴν τοῦ τρίτου καὶ λ ἔτουσ ὑπὲρ τοῦ τόπου Σῶσασ Ἀλεξάνδρου κριθῆσ πέντε
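As a rough illustration of why NFC-to-NFD conversion stresses buffer capacity, the sketch below decomposes the first word of the string above and compares lengths. It uses the JDK's `java.text.Normalizer` as a stand-in, not Saxon's `net.sf.saxon.serialize.codenorm.Normalizer`, so it does not reproduce the exception; the class name `NfdGrowthDemo` and the method `lengths` are hypothetical names chosen for this example.

```java
import java.text.Normalizer;

// Hypothetical demo class: shows that NFD output can be longer than NFC input,
// so any buffer that inserts decomposed characters must grow before copying.
class NfdGrowthDemo {

    // Returns { NFC length, NFD length } in UTF-16 code units.
    static int[] lengths(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);
        return new int[] { nfc.length(), nfd.length() };
    }

    public static void main(String[] args) {
        // First word of the reported string, written with escapes:
        // U+1F31 (iota with dasia) followed by epsilon, rho, omicron, upsilon.
        String sample = "\u1F31\u03B5\u03C1\u03BF\u03C5";
        int[] n = lengths(sample);
        System.out.println("NFC length: " + n[0] + ", NFD length: " + n[1]);
    }
}
```

Here U+1F31 decomposes into U+03B9 followed by the combining mark U+0314, so the word grows by one code unit. A long passage of polytonic Greek such as the reported string contains many such characters, which is the kind of input where a missed capacity check before an insert would show up.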