Bug #4527

Regex with grouping and escape triggers java.lang.IndexOutOfBoundsException

Added by Stefan Majewski over 1 year ago. Updated 11 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: XPath conformance
Sprint/Milestone: -
Start date: 2020-04-24
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10, 9.9
Fix Committed on Branch: 10, 9.9
Fixed in Maintenance Release:

Description

While working with Saxon-HE 10, we noticed that in some cases a regex used in either xsl:analyze-string or fn:matches causes Saxon to exit with a stack trace.

The conditions that trigger it are quite specific:

  • The sequence/node/value that is matched against the regex must evaluate to an empty string.
  • The regex must have:
    • two levels of grouping
    • a first (outer) level of grouping that covers the whole regex
    • a second (inner) level that is an alternation of patterns
    • the second level at the immediate beginning of the pattern
    • after the second group, a pattern of two or more characters, one of which is optional
    • no anchor at the start of the string with '^'

Minimal example:
<xsl:stylesheet version="3.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                >
  <xsl:template match="/">
      <xsl:value-of select="matches('','((a|b)cd?)')"/>
  </xsl:template>
</xsl:stylesheet>

I know that, boiled down to this minimal example, it looks hilarious. And sure, the outer grouping is not needed, and without it the error goes away. Still, the regex as such should be valid and should work. A colleague of mine ran into this while developing a rather complex transformation. As the regex appears to be correct, and Saxon has friendlier ways of reporting syntactically invalid regexes, I think this time it is a bug. Usually the mistake is ours, since Saxon is regarded as the reference at our shop.

When run with Saxon-HE 10:

echo '<ok/>' | java -classpath /home/sm/git/saxon/share/java/saxon-he-10.0.jar net.sf.saxon.Transform -s:- analyze_function_bug.xsl
java.lang.IndexOutOfBoundsException
        at net.sf.saxon.regex.EmptyString.uCharAt(EmptyString.java:30)
        at net.sf.saxon.regex.Operation$OpAtom.iterateMatches(Operation.java:637)
        at net.sf.saxon.regex.REMatcher.checkPreconditions(REMatcher.java:494)
        at net.sf.saxon.regex.REMatcher.match(REMatcher.java:426)
        at net.sf.saxon.regex.ARegularExpression.containsMatch(ARegularExpression.java:90)
        at net.sf.saxon.functions.Matches.call(Matches.java:78)
        at net.sf.saxon.functions.Matches.call(Matches.java:24)
        at net.sf.saxon.expr.FunctionCall.iterate(FunctionCall.java:543)
        at net.sf.saxon.expr.AtomicSequenceConverter.iterate(AtomicSequenceConverter.java:297)
        at net.sf.saxon.expr.Expression.process(Expression.java:948)
        at net.sf.saxon.expr.instruct.ValueOf.processLeavingTail(ValueOf.java:332)
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:374)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:555)
        at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:669)
        at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:360)
        at net.sf.saxon.s9api.Xslt30Transformer.applyTemplates(Xslt30Transformer.java:285)
        at net.sf.saxon.Transform.processFile(Transform.java:1300)
        at net.sf.saxon.Transform.doTransform(Transform.java:840)
        at net.sf.saxon.Transform.main(Transform.java:82)
java.lang.RuntimeException: Internal error evaluating template rule  at line 4 in module file:/home/sm/git/bruckneruni/xslt/bug_function_analyze/analyze_function_bug.xsl
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:393)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:555)
        at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:669)
        at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:360)
        at net.sf.saxon.s9api.Xslt30Transformer.applyTemplates(Xslt30Transformer.java:285)
        at net.sf.saxon.Transform.processFile(Transform.java:1300)
        at net.sf.saxon.Transform.doTransform(Transform.java:840)
        at net.sf.saxon.Transform.main(Transform.java:82)
Caused by: java.lang.IndexOutOfBoundsException
        at net.sf.saxon.regex.EmptyString.uCharAt(EmptyString.java:30)
        at net.sf.saxon.regex.Operation$OpAtom.iterateMatches(Operation.java:637)
        at net.sf.saxon.regex.REMatcher.checkPreconditions(REMatcher.java:494)
        at net.sf.saxon.regex.REMatcher.match(REMatcher.java:426)
        at net.sf.saxon.regex.ARegularExpression.containsMatch(ARegularExpression.java:90)
        at net.sf.saxon.functions.Matches.call(Matches.java:78)
        at net.sf.saxon.functions.Matches.call(Matches.java:24)
        at net.sf.saxon.expr.FunctionCall.iterate(FunctionCall.java:543)
        at net.sf.saxon.expr.AtomicSequenceConverter.iterate(AtomicSequenceConverter.java:297)
        at net.sf.saxon.expr.Expression.process(Expression.java:948)
        at net.sf.saxon.expr.instruct.ValueOf.processLeavingTail(ValueOf.java:332)
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:374)
        ... 7 more
Fatal error during transformation: java.lang.RuntimeException: Internal error evaluating template rule  at line 4 in module file:/home/sm/git/bruckneruni/xslt/bug_function_analyze/analyze_function_bug.xsl

History

#1 Updated by Stefan Majewski over 1 year ago

Forget the part about the escape in the title; the optional character is what matters. In our initial test it just happened to be an optional escaped character.

#2 Updated by Michael Kay over 1 year ago

Thanks for reporting it. Reproduced as test regex-072 in the XSLT 3.0 test suite. Test case fails in both 9.9 and 10.0.

#3 Updated by Michael Kay over 1 year ago

What's happening here is that the regex compiler has extracted a precondition for a substring to match, which is that there must be a "c" at position 1 in the substring, and it is checking for a "c" at that position (relative to the start of the string) without first checking that the position is in range.
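As a rough illustration of the failure mode described above (not Saxon's actual code), the sketch below checks a fixed-position precondition with and without a range check. The class and method names are hypothetical, and a plain java.lang.String stands in for Saxon's own string classes.

```java
// Hypothetical illustration only; this is NOT the code in net.sf.saxon.regex.
final class PreconditionSketch {

    // Checks that `expected` occurs at `offset` from the candidate match start,
    // without verifying that the position exists. On an empty input this throws
    // StringIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException).
    static boolean unsafeCheck(String input, int start, int offset, char expected) {
        return input.charAt(start + offset) == expected;
    }

    // Same check, but a position past the end of the input simply means
    // "precondition not satisfied" instead of raising an exception.
    static boolean safeCheck(String input, int start, int offset, char expected) {
        int pos = start + offset;
        return pos < input.length() && input.charAt(pos) == expected;
    }
}
```

With the regex from the report, the precondition is a "c" at offset 1; on the empty string the guarded variant returns false, while the unguarded one fails in the same way as the stack trace.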

#4 Updated by Michael Kay over 1 year ago

Before checking the preconditions, there is a check that the substring isn't shorter than the minimum length of string that the regex can match. The minimum length isn't being computed (is always 0) for a capturing group. This ought to be safe, but we can do better. If we return a proper value for the minimum match length here, the crash doesn't happen because the preconditions are no longer checked. However, I'm not convinced this is a sufficient fix; there could be other circumstances that still trigger the problem.
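To make the minimum-length idea concrete, here is a minimal sketch of how a minimum match length could be computed over a toy regex tree. Everything in it (the Node interface, the factory methods, groupBeforeFix) is an assumption made for illustration and does not mirror Saxon's Operation classes.

```java
import java.util.Arrays;

// Hypothetical toy model of a regex tree; not Saxon's implementation.
final class MinLengthSketch {

    interface Node {
        int minLength();
    }

    // A literal matches exactly its own length.
    static Node atom(String text) {
        return () -> text.length();
    }

    // An optional item may match nothing at all.
    static Node optional(Node child) {
        return () -> 0;
    }

    // A sequence needs at least the sum of its parts.
    static Node sequence(Node... parts) {
        return () -> Arrays.stream(parts).mapToInt(Node::minLength).sum();
    }

    // An alternation needs at least its shortest branch.
    static Node choice(Node... branches) {
        return () -> Arrays.stream(branches).mapToInt(Node::minLength).min().orElse(0);
    }

    // A capturing group is as long as its content.
    static Node group(Node child) {
        return () -> child.minLength();
    }

    // The behaviour described in the comment: always 0 for a group (safe but imprecise).
    static Node groupBeforeFix(Node child) {
        return () -> 0;
    }

    // Tree for the regex from the report: ((a|b)cd?)
    static Node example() {
        return group(sequence(
                group(choice(atom("a"), atom("b"))),
                atom("c"),
                optional(atom("d"))));
    }

    // Early exit: an input shorter than the minimum length cannot match,
    // so no preconditions need to be checked for it.
    static boolean canPossiblyMatch(Node regex, String input) {
        return input.length() >= regex.minLength();
    }
}
```

In this model the example regex has a minimum length of 2, so the empty string is rejected before any precondition is looked at. With groupBeforeFix as the outer node the minimum is 0 and the early exit never fires.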

#5 Updated by Michael Kay over 1 year ago

The other part of the problem seems to be that UnicodeString.isEnd(x) is specified to return true if x is >= the length of the string; but the implementation of this method for subclass EmptyString returns true only if x==0. Fixing this prevents the preconditions being checked beyond the end of an empty string.

(So there are several conditions for this bug to be triggered: the regex must include a capturing group, it must contain fixed-position preconditions, and the input string must be empty. Not surprising that the bug has been undetected for a long while...)
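The isEnd contract mentioned in this comment can be sketched as follows. UnicodeStringLike is a hypothetical stand-in interface written for this example, not Saxon's UnicodeString, and the two constants only model the "before" and "after" behaviour as described above.

```java
// Hypothetical stand-in for the contract described in the comment; not Saxon's code.
final class IsEndSketch {

    interface UnicodeStringLike {
        int length();

        // Contract: must return true whenever pos >= length().
        boolean isEnd(int pos);
    }

    // Behaviour as described for the empty-string subclass before the fix:
    // only position 0 is reported as the end.
    static final UnicodeStringLike BUGGY_EMPTY = new UnicodeStringLike() {
        public int length() { return 0; }
        public boolean isEnd(int pos) { return pos == 0; }
    };

    // Behaviour that honours the contract: every position at or past the length is the end.
    static final UnicodeStringLike FIXED_EMPTY = new UnicodeStringLike() {
        public int length() { return 0; }
        public boolean isEnd(int pos) { return pos >= length(); }
    };
}
```

A caller that asks isEnd(1) before reading position 1 is told "not at the end" by the first variant and goes on to read past the end of the empty string; the second variant stops it.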

#6 Updated by Michael Kay over 1 year ago

  • Category changed from Internals to XPath conformance
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Fix Committed on Branch 10, 9.9 added

#7 Updated by O'Neil Delpratt over 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

#8 Updated by O'Neil Delpratt 11 months ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 10.2, 9.9.1.8 added
  • Fixed in Maintenance Release deleted (10.1)

Bug fix applied on the Saxon 9.9.1.8 maintenance release.
