Project

Profile

Help

Bug #2951

closed

Incorrect results from analyze-string()

Added by Michael Kay over 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
XPath conformance
Sprint/Milestone:
-
Start date:
2016-09-21
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.7
Fix Committed on Branch:
9.7, trunk
Fixed in Maintenance Release:
Platforms:

Description

Reported by Gerrit Imsieke on Saxon help list.

Test case analyze-string-096 added to XSLT 3.0 test suite.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">

  <xsl:variable name="regex" as="xs:string"
    select="'^(\p{Lu}{2}_\p{Lu}{3}_)?((\d{5})_)?(\d{5}).*$'" />
  <xsl:variable name="basename" as="xs:string"
    select="'UV_STD_00000_39000_Test'" />

  <xsl:template name="main">
    <xsl:analyze-string select="$basename" regex="{$regex}">
      <xsl:matching-substring>
        <xsl:message
          select="for $i in (1 to 4)
                  return concat($i, ':', regex-group($i))"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>

Results are correct up to 9.6.0.7 and incorrect from 9.6.0.8.

Actions #1

Updated by Michael Kay over 7 years ago

  • Project changed from SaxonJS to Saxon
  • Category set to XPath conformance
  • Applies to branch 9.6, 9.7, 9.8 added
Actions #2

Updated by Michael Kay over 7 years ago

  • Description updated (diff)
Actions #3

Updated by Michael Kay over 7 years ago

The change that introduced this problem is the fix for bug #2487.

Bug #2518 is also relevant.

Both are concerned with clearing the values of regex groups when the regex engine backtracks. On this occasion the groups are initially set up correctly, and then the method clearCapturedGroupsBeyond(0) is called (after processing groups 2 and 3 but before processing group 4), which clears the value of groups 1, 2, and 3 to an empty string.

Actions #4

Updated by Michael Kay over 7 years ago

The patches for bug #2487 and #2518 both introduced calls to clearCapturedGroupsBeyond(p) into OpSequence.advance(). Removing the call introduced by #2487 appears to work; the call introduced by #2518 appears to be sufficient and correct: the original test case for #2487 (analyze-string-905) works with that patch removed, and this appears to cause no regression to other test cases.

So I have an empirical solution, but I'm going to try and study it a bit more carefully because at the moment I don't fully understand why it works.

Actions #5

Updated by Michael Kay over 7 years ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 9.7, 9.8 added

The call on matcher.clearCapturedGroupsBeyond(position) at Operation#328 which appears to be causing the problem should only be called when a sequence match fails. In this case we have a sequence of 4 operations

\p{Lu}{2}     _     \p{Lu}{3}     _

and it's not obvious why that should fail; but it does. It's contained within the construct (...)?, which maps to an OpGreedyFixed operation, and it seems that the OpGreedyFixed code is attempting to match the subexpression twice; it counts the number of successful matches (1), establishes that this is within the range (0,1) , and reports success. But meanwhile the unsuccessful second match has caused the captured groups to be reset.

Changed OpGreedyFixed to avoid reading ahead beyond the max number of occurrences. Confirmed that with this change, all regex tests pass; moreover they pass whether or not the call to matcher.clearCapturedGroupsBeyond(position) at Operation#328 is present. So I will conclude that the call is redundant, and remove it, while also applying the fix to OpGreedyFixed.

(OpGreedyFixed is used for any greedy repeat operator of the form F{min,max} where {min,max} is the occurrence quantifier (here {0,1} represented as ?) and F matches a fixed number of characters in the input.)

Actions #6

Updated by O'Neil Delpratt over 7 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.7.0.10 added

Bug fix applied in the Saxon 9.7.0.10 maintenance release

Actions #7

Updated by O'Neil Delpratt almost 7 years ago

  • Applies to branch deleted (9.6, 9.8)
  • Fix Committed on Branch trunk added
  • Fix Committed on Branch deleted (9.8)

Please register to edit this issue

Also available in: Atom PDF