Bug #3756
Closed
Stack overflow in base64_decode()
100%
Description
Issue raised by a user via email on 2018-02-05: a stack overflow occurs inside the base64_decode() function when a large data file is supplied.
One step of the workaround produced for Bug #3163 uses the internal SaxonJS.U.Atomic.base64Binary.fromString() function to decode a base-64 encoded version of an SEF. SaxonJS.U.Atomic.base64Binary.fromString() calls base64_decode(), which decodes a base-64 encoded string. The output data is represented as a string containing octets (codepoints in the range 0-255).
The user suggested using the native JavaScript base64 decode function atob() rather than our custom version, base64_decode(). However, the custom version is used because atob() raises an edge-case XPath conformance issue: it is more forgiving of invalid input than the XPath/XSD specification allows. (Issues were also caused by the non-ASCII character used for the SEF checksum.)
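To illustrate the conformance point, the sketch below compares atob() with the strict pattern quoted further down in this issue. It assumes a runtime where atob() is available as a global (browsers, or a recent Node.js). The helper name isStrictBase64 is hypothetical and is not part of Saxon-JS.

```javascript
// Strict lexical check, using the pattern quoted in this issue
// (the "/" is escaped here purely for readability).
// isStrictBase64 is a hypothetical helper, not a Saxon-JS function.
const STRICT_BASE64 = /^((([A-Za-z0-9+\/]){4})*(([A-Za-z0-9+\/]){3}[A-Za-z0-9+\/]|([A-Za-z0-9+\/]){2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+\/][AQgw]==))?$/;

function isStrictBase64(data) {
  return STRICT_BASE64.test(data);
}

// Missing padding: atob() decodes it, the strict pattern rejects it.
const lenientNoPadding = atob("QQ");
const strictNoPadding = isStrictBase64("QQ");

// Non-zero unused bits before "==": atob() discards them,
// the strict pattern rejects the value.
const lenientStrayBits = atob("QR==");
const strictStrayBits = isStrictBase64("QR==");

// Canonical input is accepted by both.
const lenientCanonical = atob("QQ==");
const strictCanonical = isStrictBase64("QQ==");
```

In all three cases atob() yields the single octet "A", while only the last input is a valid xs:base64Binary lexical value according to the pattern.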
It is suspected that the stack overflow is caused by the large regex at the start of the base64_decode() function.
Updated by Debbie Lockett over 6 years ago
I have so far failed to reproduce the reported stack overflow. I have tested the huge-sef.json file supplied by the user (using the app discussed in Bug #3163). I have also run direct tests, which call xs:base64Binary() on large base-64 encoded files (1MB).
The suggested issue was the large regex:
if (!/^((([A-Za-z0-9+/]){4})*(([A-Za-z0-9+/]){3}[A-Za-z0-9+/]|([A-Za-z0-9+/]){2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==))?$/.test(data)) {
  invalidValue(data, "base64Binary");
}
It still seems worthwhile to split this up, and produce better failure messages for invalid values, as Mike suggested in the original email discussion:
(a) the length must be a multiple of 4.
(b) the string as a whole must match ^[A-Za-z0-9+/]*=?=?$
(c) the last four characters must match (with spaces for clarity)
[^=]* | .[AQgw]== | ..[AEIMQUYcgkosw048]=
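The three checks above can be sketched as follows. This is only an illustration under stated assumptions, not the code committed for this bug: the function name checkBase64Lexical and the message texts are hypothetical, and the function returns a message instead of calling invalidValue() so that it stays self-contained.

```javascript
// Hypothetical sketch of checks (a), (b) and (c); not the committed Saxon-JS code.
// Returns null when the value is lexically valid, otherwise a short
// description of the first check that failed.
function checkBase64Lexical(data) {
  // (a) the length must be a multiple of 4
  if (data.length % 4 !== 0) {
    return "length " + data.length + " is not a multiple of 4";
  }
  // (b) only base64 characters, followed by at most two '=' signs
  if (!/^[A-Za-z0-9+\/]*=?=?$/.test(data)) {
    return "character outside the base64 alphabet, or misplaced '='";
  }
  // (c) the last four characters must be unpadded or canonically padded
  const tail = data.slice(-4);
  if (!/^(?:[^=]*|.[AQgw]==|..[AEIMQUYcgkosw048]=)$/.test(tail)) {
    return "final group '" + tail + "' has non-canonical padding";
  }
  return null;
}
```

Each pattern here is flat, with no nested repeated groups, and only check (b) scans the whole input. Checking in this order also means that each failure can be reported with its own message.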
Updated by Debbie Lockett over 6 years ago
Code changes to split up regex in base64_decode() function committed on 1.x and 2.0 branches.
Updated by Debbie Lockett over 6 years ago
- Status changed from New to Resolved
- Fix Committed on JS Branch 1.0, Trunk added
Marking this as resolved. Though we have not reproduced the original stack overflow, hopefully the code changes have fixed the problem. The bug can always be reopened later if not.
Updated by Debbie Lockett over 6 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in JS Release set to Saxon-JS 1.1.0
Bug fix applied in the Saxon-JS 1.1.0 maintenance release.