Bug #3211
closed
ArrayIndexOutOfBoundsException in ARegexIterator.computeNestingTable with complex regex
Category:
XSLT conformance
Fix Committed on Branch:
9.7, trunk
Fixed in Maintenance Release:
Description
Some combination of nesting and non-capturing groups is the cause. Test case attached -- just run it with empty input.
The 7th branch is the problem; simplifying it in a variety of ways makes the problem go away.
For testing purposes, you can pass in simplified patterns and/or formulae from the command line, so e.g. this, with one fewer non-capturing group, works:
saxon97 test.xsl pat='("[^"]*")|(\{[^}]+})|(,)|([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()|([)])|(^=|\()|((?:(?:'\''[^'\'']+'\'')))|(\$?[A-Z]+\$?[0-9]+)|([a-zA-Z_\\][a-zA-Z0-9._]*)|(.)'
- Category set to XSLT conformance
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Applies to branch 9.8 added
Thanks for reporting it.
It looks as if this code isn't handling non-capturing groups correctly: the start of such a group (that is, "(?") is properly detected, but the corresponding end of the group isn't.
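To illustrate the kind of bookkeeping involved, here is a minimal sketch of computing a nesting table from the regex text. It is not Saxon's actual code: the class name NestingTableSketch and the helper enclosing are hypothetical, the regex is assumed to be well-formed, and any "(?" is treated as the start of a non-capturing group (so Java named groups and nested character classes are not handled). The point is that every opening parenthesis, capturing or not, is pushed on the stack, so that each closing parenthesis pops the entry that belongs to it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class NestingTableSketch {

    /**
     * Returns an array in which entry g is the number of the capturing group
     * that immediately encloses capturing group g (0 = no enclosing group).
     * Index 0 is unused. Assumes a well-formed regex (hypothetical helper).
     */
    static int[] computeNestingTable(String regex) {
        int[] parent = new int[regex.length() + 1]; // over-allocated, trimmed below
        Deque<Integer> open = new ArrayDeque<>();   // group number, or -1 if non-capturing
        int groups = 0;
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++;                                // skip the escaped character
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;                // parentheses inside [...] are literals
                }
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(') {
                boolean capturing = !(i + 1 < regex.length() && regex.charAt(i + 1) == '?');
                if (capturing) {
                    groups++;
                    parent[groups] = enclosing(open);
                    open.push(groups);
                } else {
                    open.push(-1);                  // remember it so its ')' is matched up
                }
            } else if (c == ')') {
                open.pop();                         // closes the innermost open group of either kind
            }
        }
        return Arrays.copyOf(parent, groups + 1);
    }

    /** Nearest enclosing capturing group on the stack, or 0 if there is none. */
    private static int enclosing(Deque<Integer> open) {
        for (int g : open) {                        // iterates from most recently pushed
            if (g > 0) {
                return g;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Group 2 sits inside a non-capturing group that is itself inside group 1.
        System.out.println(Arrays.toString(computeNestingTable("(a(?:b(c))d)(e)")));
        // prints [0, 0, 1, 0]
    }
}
```

If the non-capturing group were not pushed, its closing parenthesis would pop a capturing group's entry instead, and later groups would be attributed to the wrong parent or the stack would underflow.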
For background, the nesting table is computed only in the case where there is a zero-length captured group. A comment in the code explains:
The problem here is that the information available from Java isn't sufficient to determine the nesting of groups: match("a", "(a(b?))") and match("a", "(a)(b?)") will both give the same result for group 2 (start=1, end=1). So we need to go back to the original regex to determine the group nesting.
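The ambiguity described in that comment can be reproduced directly with java.util.regex. The sketch below is only an illustration: the class name GroupAmbiguityDemo and the helper groupSpan are hypothetical, not part of Saxon.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupAmbiguityDemo {

    /** Returns {start, end} of the given group after matching the whole input. */
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalStateException("no match for " + regex);
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        // Group 2 is nested inside group 1 here...
        int[] nested = groupSpan("(a(b?))", "a", 2);
        // ...and is a sibling of group 1 here.
        int[] sibling = groupSpan("(a)(b?)", "a", 2);
        System.out.println("nested:  start=" + nested[0] + ", end=" + nested[1]);
        System.out.println("sibling: start=" + sibling[0] + ", end=" + sibling[1]);
        // Both lines report start=1, end=1.
    }
}
```

Since the zero-length match for group 2 is reported at the same position in both cases, the match result alone cannot say whether it falls inside group 1.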
- Status changed from In Progress to Resolved
- Fix Committed on Branch 9.7, 9.8 added
Patch committed. Test case analyze-string-097 added to W3C test suite.
Thanks for prompt action!
ht
Slight revision to the patch: (a) there was a regression test analyze-string-017 which failed; (b) the same incorrect code was present for .NET in DotNetRegexIterator and the two classes have been changed to use a common method.
- Fix Committed on Branch trunk added
- Fix Committed on Branch deleted (9.8)
- Applies to branch deleted (9.8)
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.7.0.19 added
Bug fix applied in the 9.7.0.19 maintenance release.
- Description updated (diff)
See also bug #4407, which reveals that this regular expression is actually invalid: it contains an unescaped right curly brace.