Bug #4743
closed
Performance: substring() and subsequence()
Fix Committed on JS Branch:
2
Description
substring() is implemented by creating an array of codepoints, and calling subsequence() on this array.
subsequence() is implemented using array.filter() - it looks at each item to see if it is within the specified range.
This is a rather inefficient way of selecting a small subsequence of a large string or array.
- Status changed from New to In Progress
- Applies to JS Branch Trunk added
- Fix Committed on JS Branch Trunk added
I've done a partial fix by changing CoreFn.subsequence()
to recognise common simple cases where Array.slice()
can be used in place of Array.filter(). The more complex cases, e.g. where start is negative or length is +INF, are still handled using Array.filter()
.
A further improvement would be (for both substring()
and subsequence()
) to avoid creating the entire array beyond the selected end point: for example subsequence(//p, 1, 2) should not read //p
in its entirety, and substring(para, 1, 1)
should not build an array of codepoints containing the entire string.
I have reimplemented substring() and subsequence() to use an iterator that iterates over the supplied sequence (of characters or items), skips the first N, returns the next M, and then stops (so the rest of the input is not processed). Getting this right was surprisingly tricky because of the complex edge cases involving infinity and NaN.
- Status changed from In Progress to Resolved
- Applies to JS Branch 2 added
- Applies to JS Branch deleted (
Trunk)
- Fix Committed on JS Branch 2 added
- Fix Committed on JS Branch deleted (
Trunk)
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in JS Release set to Saxon-JS 2.1
Bug fix applied in the Saxon-JS 2.1 maintenance release.
Please register to edit this issue
Also available in: Atom
PDF
Tracking page