Project

Profile

Help

Bug #3902

Regex non-greedy matching failure

Added by Michael Kay 3 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
XSLT conformance
Sprint/Milestone:
-
Start date:
2018-09-12
Due date:
% Done:

100%

Legacy ID:
Applies to branch:
9.8, trunk
Fix Committed on Branch:
9.8, trunk
Fixed in Maintenance Release:

Description

See https://saxonica.plan.io/boards/3/topics/7300

Made into XSLT3 test case analyze-string-099

History

#1 Updated by Michael Kay 3 months ago

The path taken by the non-greedy code for OpRepeat at Operation#884 is quite bewildering. Can't see the logic to it at all. Will need to study how this works in some examples where we actually get the right answer.

Note also in passing, the optimization that recognizes that a match must start with a particular character and fast-forwards to that character (at REMatcher#439) doesn't seem to be working as well as it should - although it recognizes that there's a fixed prefix it still seems to look for a match at every character position.

#2 Updated by Michael Kay 3 months ago

Not many tests exercise this path. In fact, only the following six tests do so:

In XSLT3: regex-syntax-0861, -0955

In QT3: fn-matches-50, re00974, -5, -6

And all of these are concerned with giving a boolean match/no-match result, not in extracting the matching substring.

#3 Updated by Michael Kay 3 months ago

Concerning the easier problem of the poorly-performing prefix comparison, the issue is that when there is a mismatch on the last character of the prefix (which in this case is the only character) we're taking the path as if the prefix matched, which negates the optimization in the case where the prefix has length 1.

Fixed this on the 9.9 branch.

#4 Updated by Michael Kay 3 months ago

Bug #3787 is in the same general area. In analyzing that bug, although I found an empirical resolution, I also remarked that the logic behind the existing code was impenetrable. It doesn't appear that I added a new test reflecting bug #3787 and it might be worth doing so.

#5 Updated by Michael Kay 3 months ago

I have replaced the impenetrable code (for non-greedy OpRepeat with a variable length match) with a greatly simplified version. This is working for the test case in question, but it doesn't yet handle min/max cardinality.

With this simple implementation, one of the tests in fn-matches-50 (the Perl regex test suite) fails with infinite backtracking: the regex in question is matches('abcde', '(?:r?)*?r|(.{2,4})') (a classic one that allows an infinite number of zero-length matches).

Solved this the traditional way of adding a ForceProgressIterator.

With this change all tests are passing. So given that we haven't implemented min and max, we clearly need more tests!

#6 Updated by Michael Kay 3 months ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 9.8, trunk added

I have added a number of tests to QT3 fn-matches to handle non-greedy matching with min/max constraints.

All XSLT3 / QT3 tests now passing.

#7 Updated by O'Neil Delpratt 3 months ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.0.1 added

Bug fix applied in the Saxon 9.9.0.1 major release.

Leave open until fix applied in the next Saxon 9.8 maintenance release.

#8 Updated by O'Neil Delpratt about 1 month ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.8.0.15 added

Bug fix applied in the Saxon 9.8.0.15 maintenance release.

Please register to edit this issue

Also available in: Atom PDF