Bug #3211

closed

ArrayIndexOutOfBoundsException in ARegexIterator.computeNestingTable with complex regex

Added by Henry S Thompson over 7 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: XSLT conformance
Sprint/Milestone: -
Start date: 2017-04-26
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.7
Fix Committed on Branch: 9.7, trunk
Fixed in Maintenance Release:
Platforms:

Description

Some combination of nesting and non-capturing groups is the cause. Test case attached -- just run it with empty input.

The 7th branch is the problem; simplifying it in a variety of ways makes the problem go away.

For testing purposes, you can pass in simplified patterns and/or formulae from the command line, so e.g. this, with one fewer non-capturing group, works:

saxon97 test.xsl pat='("[^"]*")|(\{[^}]+})|(,)|([^=\-+*/();:,.$<>^!]+(?:\.[^=\-+*/();:,.$<>^!]+)*\()|([)])|(^=|\()|((?:(?:'\''[^'\'']+'\'')))|(\$?[A-Z]+\$?[0-9]+)|([a-zA-Z_\\][a-zA-Z0-9._]*)|(.)'

Files

test.xsl (730 Bytes), Test case, Henry S Thompson, 2017-04-26 11:43

Actions #1

Updated by Michael Kay over 7 years ago

  • Category set to XSLT conformance
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Applies to branch 9.8 added

Thanks for reporting it.

It looks as if this code isn't handling non-capturing groups correctly: the start of such a group (that is, "(?") is properly detected, but the corresponding end of the group isn't.
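
To make the failure mode concrete, here is a minimal sketch of a stack-based scan that records, for each capturing group, the number of the capturing group that encloses it. It is not the Saxon code: the class name NestingTableSketch and the method computeNesting are hypothetical, and the handling of character classes is deliberately simplified (no class subtraction). The point it illustrates is that a non-capturing group must be pushed on open and popped on close, just like a capturing one, otherwise the closing parentheses no longer line up with the groups they belong to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not taken from the Saxon source.
class NestingTableSketch {

    /**
     * Returns a list in which entry g (1-based) is the number of the capturing
     * group that immediately encloses group g, or 0 if there is none.
     * Entry 0 is an unused placeholder.
     */
    static List<Integer> computeNesting(String regex) {
        List<Integer> parent = new ArrayList<>();
        parent.add(0);
        // Stack of currently open groups; -1 marks a non-capturing group.
        Deque<Integer> open = new ArrayDeque<>();
        boolean inCharClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++;                              // skip the escaped character
            } else if (inCharClass) {
                if (c == ']') inCharClass = false; // simplification: no subtraction
            } else if (c == '[') {
                inCharClass = true;
            } else if (c == '(') {
                boolean nonCapturing =
                        i + 1 < regex.length() && regex.charAt(i + 1) == '?';
                if (nonCapturing) {
                    open.push(-1);                // still pushed, so its ")" pops it
                } else {
                    parent.add(enclosingCapture(open));
                    open.push(parent.size() - 1);
                }
            } else if (c == ')') {
                if (!open.isEmpty()) open.pop();  // closes capturing or non-capturing
            }
        }
        return parent;
    }

    /** Nearest open capturing group, scanning from the innermost outwards. */
    private static int enclosingCapture(Deque<Integer> open) {
        for (int g : open) {
            if (g > 0) return g;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Group 2 is nested in group 1 despite the intervening non-capturing groups.
        System.out.println(computeNesting("(a(?:(?:b))(c))|(d)")); // [0, 0, 1, 0]
    }
}
```

If the branch for non-capturing groups skipped the push while the ")" branch still popped, the stack would be emptied too early or indexed out of range, which is the kind of imbalance described above.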

For background, the nesting table is computed only in the case where there is a zero-length captured group. A comment in the code explains:

The problem here is that the information available from Java isn't sufficient to determine the nesting of groups: match("a", "(a(b?))") and match("a", "(a)(b?)") will both give the same result for group 2 (start=1, end=1). So we need to go back to the original regex to determine the group nesting.
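
The ambiguity described in that comment can be reproduced with a few lines of Java. This sketch uses java.util.regex purely for illustration (the Saxon class may obtain its group positions by other means), and the class name GroupAmbiguityDemo and helper groupSpan are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Saxon source.
class GroupAmbiguityDemo {

    /** Returns {start, end} of the given group after matching the whole input. */
    static int[] groupSpan(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match");
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        int[] nested = groupSpan("a", "(a(b?))", 2);   // group 2 inside group 1
        int[] sibling = groupSpan("a", "(a)(b?)", 2);  // group 2 after group 1
        System.out.println("nested:  " + nested[0] + ".." + nested[1]);   // nested:  1..1
        System.out.println("sibling: " + sibling[0] + ".." + sibling[1]); // sibling: 1..1
    }
}
```

Both patterns report the same zero-length span for group 2, so the positions alone cannot say whether group 2 lies inside group 1.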

Actions #2

Updated by Michael Kay over 7 years ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 9.7, 9.8 added

Patch committed. Test case analyze-string-097 added to W3C test suite.

Actions #3

Updated by Henry S Thompson over 7 years ago

Thanks for prompt action!

ht

Actions #4

Updated by Michael Kay over 7 years ago

Slight revision to the patch: (a) there was a regression test analyze-string-017 which failed; (b) the same incorrect code was present for .NET in DotNetRegexIterator and the two classes have been changed to use a common method.

Actions #5

Updated by O'Neil Delpratt over 7 years ago

  • Fix Committed on Branch trunk added
  • Fix Committed on Branch deleted (9.8)

Actions #6

Updated by O'Neil Delpratt over 7 years ago

  • Applies to branch deleted (9.8)

Actions #7

Updated by O'Neil Delpratt over 7 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.19 added

Bug fix applied in the 9.7.0.19 maintenance release.

Actions #8

Updated by Michael Kay almost 5 years ago

  • Description updated (diff)

Actions #9

Updated by Michael Kay almost 5 years ago

See also bug #4407, which reveals that this regular expression is actually invalid: it contains an unescaped right curly brace.