Project

Profile

Help

Bug #3833

Problems using fn:replace() with special characters in replacement string

Added by Debbie Lockett 9 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Sprint/Milestone:
-
Start date:
2018-07-04
Due date:
% Done:

100%

Applies to JS Branch:
1.0, Trunk
Fix Committed on JS Branch:
1.0, Trunk
Fixed in JS Release:
SEF Generated with:
Company:
-
Contact person:
-
Additional contact persons:
-

Description

Bug discovered while working on Issue #3790.

Attempted to convert a supplied string to a regex string (e.g. suitable for use in xsl:analyze-string) by escaping characters explicitly:


fn:replace($search, "[\^\.\\\?\*\+\{\}\(\)\|\$\[\]]", "\\$0" )

However this did not work as expected. Apparently there is some problem handling special characters in the replacement string (the 3rd arg of fn:replace) in the Saxon-JS implementation.


Related issues

Related to Saxon-JS - Bug #3910: Error using fn:replace() with "q" flag when search string contains "-" character Closed 2018-09-19 Actions

History

#1 Updated by Debbie Lockett 9 months ago

  • Description updated (diff)

#2 Updated by Debbie Lockett 9 months ago

Internally, the implementation uses a function convertReplacement (in regex.js) to convert the replacement string (as supplied to the XPath fn:replace function) for use with the JavaScript String.replace() function. The special characters used in replacement strings are (unsurprisingly) different for the XPath and JS replace functions.

XPath special characters in fn:replace replacement string (without q flag) (https://www.w3.org/TR/xpath-functions-31/#func-replace, https://www.w3.org/TR/xpath-functions-31/#flags):

  • $N refers to the substring captured by the Nth parenthesized sub-expression in the regular expression

  • $0 refers to the substring captured by the regular expression as a whole

  • a literal $ character must be written as \$, and a literal \ character must be written as \\

JS special characters in String.replace() replacement string (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#Specifying_a_string_as_a_parameter):

  • $$ Inserts $

  • $& Inserts the matched substring.

  • $` Inserts the portion of the string that precedes the matched substring.

  • $' Inserts the portion of the string that follows the matched substring.

  • $n Inserts nth parenthesized submatch string

So the convertReplacement function should:

  • replace \\ by \

  • replace \$ by $$

  • replace $0 by $&

But must take care in the way that these are done, and the order they happen.

Additionally, the implementation does not look right for the case that the "q" flag is used. This includes the following code: str.replace(/\$/g, "$$") But this actually just says replace $ by $; when it should say replace $ by $$. We apparently handle the special cases for $' and $` separately, but that is all (e.g. not $&).

#3 Updated by Debbie Lockett 9 months ago

Tests added to XSLT 3.0 test suite in the regex test set (tests regex-070a to regex-070l): testing various combinations of special characters in replacement strings, with and without the q flag.

#4 Updated by Debbie Lockett 9 months ago

  • Status changed from New to Resolved
  • Fix Committed on JS Branch 1.0, Trunk added

For the "q" flag case, the conversion should just be replacing all $ by $$, i.e. str.replace(/\$/g, "$$$$").

In the other case, the new code splits up the conversion steps, and adds details to ensure conversions happen correctly (i.e. that the regex string is parsed correctly e.g. the string \\$0 contains the special characters \\ and $0, not the character \$).

Step 1: $n leave as is except replace $0 by $& str = str.replace(/(^|[^\\])(\\\\)*\$0/g, "$1$2$$\&");

Step 2: for $ not followed by digit, replace \$ by $$ str = str.replace(/(^|[^\\])(\\\\)*\\\$([^0-9]|[^&]|$)/g, "$1$2$$$$$3");

Step 3: replace \\ by \ str = str.replace(/\\\\/g, "\\");

Code changes committed on 1.x and trunk branches.

#5 Updated by Debbie Lockett 6 months ago

  • Related to Bug #3910: Error using fn:replace() with "q" flag when search string contains "-" character added

#6 Updated by Debbie Lockett 6 months ago

  • Status changed from Resolved to In Progress

Reopening because the patch is insufficient. As demonstrated by the failure of QT3 test fn-replace-45, which tests the following:

replace("Now, let's SEND OUT for QUICHE!!", "[A-Z][A-Z]+", "$0$0")

The problem here is that the replacement string has consecutive occurrences of $0, which are not handled correctly by step 1. The convertReplacement function returns $&$0 rather than $&$&.

Similarly, consecutive occurrences of \$ will not be handled correctly by step 2.

#7 Updated by Debbie Lockett 6 months ago

Test regex-071 added to regex test set in XSLT 3.0 test suite: to test consecutive $0 and \$ in replacement strings.

#8 Updated by Debbie Lockett 6 months ago

  • Status changed from In Progress to Resolved

Using the Javascript String.replace() function to convert the replacement string was becoming too involved and confusing. Instead, the code in convertReplacement has been changed to parse and convert the replacement string directly using a switch on characters in the string.

The regex check for invalid replacement strings /(^|[^\\])\$([^0-9]|$)/.test(str) || /(^|[^\\])\\([^\\\$]|$)/.test(str) is retained, to filter out invalid strings. This means that within the switch it can be assumed that the replacement string is valid. So certain conditions can be assumed - e.g. an unescaped dollar must be followed by a digit; and an unescaped backslash must be followed by a backslash or dollar.

Patch committed on Saxon-JS 1 and trunk branches.

#9 Updated by Debbie Lockett 6 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in JS Release set to Saxon-JS 1.2.0

Bug fix applied in the Saxon-JS 1.2.0 maintenance release.

Please register to edit this issue

Also available in: Atom PDF Tracking page