Bug #2518
closed
Regex: back references matching empty captures
100%
Description
See QT3 test bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=29253
The WG agreed that these two tests had incorrect expected results. This means that under the new interpretation of the test results, Saxon is getting it wrong.
Looking at the first test case:
matches('babadad', '^((.)?a\2)+$')
which is now expected to return true, what happens is:
- The first time round, "bab" matches fine.
- The second time around, (.) matches "a" and sets regex-group(2) to "a". The literal a in the pattern then fails to match the "d" in the input, and the matching backtracks so that (.)? matches nothing. But at this point the captured group is not reset. The literal a successfully matches the input "a", and the back-reference \2 is then tested against the current value of the capture, which is still "a". The next input character is "d", so this match fails, and the regex as a whole fails.
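To make the trace above easier to follow, here is a minimal sketch in Python of a hand-written backtracking matcher for this one pattern only. It is not Saxon's regex engine: the function name and the reset_on_backtrack flag are made up for illustration. With the flag off, the capture set during the failed greedy attempt is left in place, as described above. With the flag on, the capture is treated as empty once (.)? backtracks to matching nothing.

```python
def matches_bug_pattern(text: str, reset_on_backtrack: bool) -> bool:
    # Hypothetical hand-rolled matcher for the single pattern ^((.)?a\2)+$
    # It illustrates the trace in this issue; it is not a general regex engine.
    n = len(text)

    def tail(pos: int, group2: str):
        # Match the literal 'a' followed by the back-reference \2.
        # Returns the position after the match, or None on failure.
        if pos < n and text[pos] == "a" and text.startswith(group2, pos + 1):
            return pos + 1 + len(group2)
        return None

    def iterate(pos: int, count: int) -> bool:
        # One pass through the outer group ( ... )+, then '$'.
        if pos == n:
            return count > 0  # '$' matches; '+' needs at least one iteration
        # Greedy attempt: (.) consumes one character and sets group 2.
        captured = text[pos]
        end = tail(pos + 1, captured)
        if end is not None and iterate(end, count + 1):
            return True
        # Backtrack: (.)? now matches nothing.
        # Stale mode keeps the value captured by the failed attempt.
        group2 = "" if reset_on_backtrack else captured
        end = tail(pos, group2)
        if end is not None and iterate(end, count + 1):
            return True
        return False

    return iterate(0, 0)


print(matches_bug_pattern("babadad", reset_on_backtrack=False))
print(matches_bug_pattern("babadad", reset_on_backtrack=True))
```

In the stale mode the second iteration compares \2 against "a" and fails on the "d"; in the reset mode \2 compares against the empty string, the second iteration matches just "a", and the third matches "dad".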
We're in good company here because it seems Perl and PCRE also return false for this one. But perhaps our spec is different: it's pretty clear that \2 should compare the current position against the empty string and therefore succeed.
I think there are actually two issues. Firstly, clearCapturedGroupsBeyond(n) isn't being called early enough - I think it's only trying to get the final contents of regex-group(x) right, not the values used to match against backreferences. Secondly, the method clearCapturedGroupsBeyond(n) is resetting the value of endn[i] but not the value of endBackref[i], which is what the backreference matcher actually uses.
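The second issue can be sketched roughly as follows, in Python rather than Saxon's Java. Only the names endn, endBackref and clearCapturedGroupsBeyond come from the description above. The class, the start arrays, the use of -1 for "unset", the reading of "beyond n" as "index greater than n", and the also_clear_backref flag are all assumptions made for illustration, not the actual Saxon code.

```python
class CaptureState:
    # Hypothetical model of the matcher's capture bookkeeping.
    def __init__(self, group_count: int):
        size = group_count + 1  # slot 0 is the whole match
        self.startn = [-1] * size         # assumed: start of each group
        self.endn = [-1] * size           # end of each group (final result)
        self.start_backref = [-1] * size  # assumed: start used by back-references
        self.end_backref = [-1] * size    # end used by back-references

    def set_group(self, i: int, start: int, end: int) -> None:
        # Record a capture in both sets of arrays.
        self.startn[i] = self.start_backref[i] = start
        self.endn[i] = self.end_backref[i] = end

    def clear_captured_groups_beyond(self, n: int, also_clear_backref: bool) -> None:
        # Reset every group with index greater than n (assumed meaning).
        for i in range(n + 1, len(self.endn)):
            self.endn[i] = -1
            if also_clear_backref:
                # The suggested fix: reset the value the back-reference
                # matcher actually reads.
                self.end_backref[i] = -1

    def backref_text(self, text: str, i: int) -> str:
        # An unset group compares as the empty string.
        if self.end_backref[i] < 0:
            return ""
        return text[self.start_backref[i]:self.end_backref[i]]


state = CaptureState(2)
state.set_group(2, 3, 4)  # group 2 captured the "a" at offset 3 of "babadad"
state.clear_captured_groups_beyond(1, also_clear_backref=False)
print(repr(state.backref_text("babadad", 2)))  # stale value still visible
state.clear_captured_groups_beyond(1, also_clear_backref=True)
print(repr(state.backref_text("babadad", 2)))  # now empty
```

This only covers the second point. The first point, calling the reset early enough during backtracking, depends on where the real matcher makes the call and is not modelled here.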