Project

Profile

Help

Bug #4743

closed

Performance: substring() and subsequence()

Added by Michael Kay about 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
2020-09-21
Due date:
% Done:

100%

Estimated time:
Applies to JS Branch:
2
Fix Committed on JS Branch:
2
Fixed in JS Release:
SEF Generated with:
Platforms:
Company:
-
Contact person:
-
Additional contact persons:
-

Description

substring() is implemented by creating an array of codepoints, and calling subsequence() on this array.

subsequence() is implemented using array.filter() - it looks at each item to see if it is within the specified range.

This is a rather inefficient way of selecting a small subsequence of a large string or array.

Actions #1

Updated by Michael Kay about 4 years ago

  • Status changed from New to In Progress
  • Applies to JS Branch Trunk added
  • Fix Committed on JS Branch Trunk added

I've done a partial fix by changing CoreFn.subsequence() to recognise common simple cases where Array.slice() can be used in place of Array.filter(). The more complex cases, e.g. where start is negative or length is +INF, are still handled using Array.filter().

A further improvement would be (for both substring() and subsequence()) to avoid creating the entire array beyond the selected end point: for example subsequence(//p, 1, 2) should not read //p in its entirety, and substring(para, 1, 1) should not build an array of codepoints containing the entire string.

Actions #2

Updated by Michael Kay about 4 years ago

I have reimplemented substring() and subsequence() to use an iterator that iterates over the supplied sequence (of characters or items), skips the first N, returns the next M, and then stops (so the rest of the input is not processed). Getting this right was surprisingly tricky because of the complex edge cases involving infinity and NaN.

Actions #3

Updated by Michael Kay about 4 years ago

  • Status changed from In Progress to Resolved
Actions #4

Updated by Debbie Lockett over 3 years ago

  • Applies to JS Branch 2 added
  • Applies to JS Branch deleted (Trunk)
  • Fix Committed on JS Branch 2 added
  • Fix Committed on JS Branch deleted (Trunk)
Actions #5

Updated by Debbie Lockett over 3 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in JS Release set to Saxon-JS 2.1

Bug fix applied in the Saxon-JS 2.1 maintenance release.

Please register to edit this issue

Also available in: Atom PDF Tracking page