Bug #3387: Inverse character ranges in regular expressions - SaxonJS - Saxonica Developer Community

Bug #3387

closed

Inverse character ranges in regular expressions

Added by Michael Kay over 7 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: Michael Kay
Category: XPath Conformance
Sprint/Milestone:
Start date: 2017-08-11
Due date:
% Done: 100%
Estimated time:
Applies to JS Branch:
Fix Committed on JS Branch: Trunk
Fixed in JS Release: Saxon-JS 2.0
SEF Generated with:
Platforms:
Company:
Contact person:
Additional contact persons:

Description

I think the logic for handling inverse character ranges such as \P{L} may be incorrect.

For two-letter categories the logic is sound. The categories.json file gives a definition of Ll as

[["61","7A"],["B5","B5"],["DF","F6"],...

and to invert this we form the ranges corresponding to the gaps:

["0","60"],["7B","B4"],["B6","DE"],...
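The gap construction above can be sketched as follows. This is a minimal illustration, not the SaxonJS code: the function name `invert_ranges` and the `MAX_CODEPOINT` upper bound are assumptions, and only the three Ll ranges quoted above are used, so the final gap it produces is an artifact of the truncated input.

```python
MAX_CODEPOINT = 0x10FFFF  # assumed upper bound of the code point space


def invert_ranges(ranges, max_cp=MAX_CODEPOINT):
    """Return the gaps between [start, end] hex-string ranges.

    Assumes the input is in ascending order and non-overlapping;
    the result is not meaningful otherwise.
    """
    gaps = []
    next_start = 0  # first code point not yet covered by a range
    for lo_hex, hi_hex in ranges:
        lo, hi = int(lo_hex, 16), int(hi_hex, 16)
        if lo > next_start:
            gaps.append([format(next_start, "X"), format(lo - 1, "X")])
        next_start = hi + 1
    if next_start <= max_cp:
        gaps.append([format(next_start, "X"), format(max_cp, "X")])
    return gaps


# The first three Ll ranges quoted in the description (hypothetical variable name).
ll_prefix = [["61", "7A"], ["B5", "B5"], ["DF", "F6"]]
print(invert_ranges(ll_prefix)[:3])
# [['0', '60'], ['7B', 'B4'], ['B6', 'DE']]
```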

This only works if the ranges are in ascending order and do not overlap. For single-letter categories such as L, we concatenate the subcategories, so the result is the union of the gaps in the subcategories, when it should be the gaps in the union of the subcategories.
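To make the difference concrete, here is a toy sketch comparing the two results. The helper names (`gaps`, `merge`, `contains`) and the one-range stand-ins for Lu and Ll are assumptions made for illustration; the real category tables are much longer, and this is not necessarily how SaxonJS was fixed.

```python
MAX_CODEPOINT = 0x10FFFF  # assumed upper bound of the code point space


def gaps(ranges, max_cp=MAX_CODEPOINT):
    """Gaps between sorted, non-overlapping (lo, hi) integer ranges."""
    out, next_start = [], 0
    for lo, hi in ranges:
        if lo > next_start:
            out.append((next_start, lo - 1))
        next_start = hi + 1
    if next_start <= max_cp:
        out.append((next_start, max_cp))
    return out


def merge(ranges):
    """Sort ranges and coalesce overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def contains(ranges, cp):
    """True if code point cp falls inside any of the ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


# Toy stand-ins for two subcategories of L: A-Z (Lu) and a-z (Ll).
lu = [(0x41, 0x5A)]
ll = [(0x61, 0x7A)]

union_of_gaps = gaps(lu) + gaps(ll)    # the suspect result
gaps_of_union = gaps(merge(lu + ll))   # the intended result

print(contains(union_of_gaps, ord("a")))  # True: "a" wrongly counts as a non-letter
print(contains(gaps_of_union, ord("a")))  # False
print(contains(gaps_of_union, ord("1")))  # True
```

In the union of the gaps, every letter falls inside the gaps of some other subcategory, so the inverted set would cover almost everything. Merging first and then taking the gaps avoids this.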

Oddly, I can't point to any tests that are failing as a result.

Updated by Michael Kay over 7 years ago

Updated by Michael Kay about 7 years ago

Updated by Debbie Lockett over 4 years ago

Updated by Debbie Lockett over 4 years ago