XPath parsing edge cases
The WG believes that the XPath grammar allows expressions such as
12!(12 div.)
12 div-3
3!(12 div-.)
which Saxon cannot parse. Saxon reads the "." or "-" as a continuation of the token starting "div". It should not do so, because the tokenization rule is to accept the longest token consistent with the grammar, and in these contexts an NCName containing "." or "-" is not consistent with the grammar.
#1 Updated by Michael Kay almost 4 years ago
- Status changed from New to Resolved
- Applies to branch 9.7, 9.8 added
- Fix Committed on Branch 9.7, 9.8 added
I've tweaked the tokenizer to handle these cases, though the solution might not be completely general. Essentially the fix is as follows:
If we encounter "." or "-" while reading what appears to be a name, we check whether the following conditions are all true:
(a) the content of the token so far is a valid operator/keyword
(b) the preceding token is not classified as an operator symbol (that is, its token number is > LAST_OPERATOR) and is not "?" or "*:"
(c) the preceding token is not a name that is equal to a valid operator/keyword
If all these conditions are true, we stop tokenizing at the "." or "-" and return the operator token.
Condition (c) was found to be necessary because when we are parsing a FLWOR expression, keywords such as "in" and "return" are returned as NAME tokens rather than as specific operator tokens.
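The three conditions can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the actual Saxon tokenizer code: the class name NameTokenSketch, the KEYWORDS and OPERATOR_SYMBOLS sets, and the string-based token model are all hypothetical. The real tokenizer works with integer token codes and compares them against LAST_OPERATOR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "stop at '.' or '-'" rule; not Saxon's real code.
class NameTokenSketch {
    // Assumed subset of operator/keyword names; the real list is longer.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "div", "idiv", "mod", "and", "or", "in", "return", "to"));

    // Assumed stand-in for "token number <= LAST_OPERATOR": symbols after
    // which an operand (and therefore a name) is expected.
    static final Set<String> OPERATOR_SYMBOLS = new HashSet<>(Arrays.asList(
            "(", "[", ",", "+", "-", "*", "/", "!", "=", "<", ">", "|"));

    // True if the name being read should end before the "." or "-".
    static boolean stopBeforeDotOrMinus(String soFar, String prev) {
        if (prev == null) {
            return false; // start of expression: a name is expected here
        }
        boolean a = KEYWORDS.contains(soFar);
        boolean b = !OPERATOR_SYMBOLS.contains(prev)
                && !prev.equals("?") && !prev.equals("*:");
        // Tokens are plain strings here, so this also covers keywords that
        // the real tokenizer would have returned as NAME tokens.
        boolean c = !KEYWORDS.contains(prev);
        return a && b && c;
    }

    // Very small tokenizer: digits, names, and single-character symbols only.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        String prev = null;
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isDigit(ch)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
            } else if (Character.isLetter(ch) || ch == '_') {
                i++;
                while (i < input.length()) {
                    char c = input.charAt(i);
                    if (c == '.' || c == '-') {
                        if (stopBeforeDotOrMinus(input.substring(start, i), prev)) {
                            break; // return the keyword as an operator token
                        }
                        i++; // otherwise the character is part of the name
                    } else if (Character.isLetterOrDigit(c) || c == '_') {
                        i++;
                    } else {
                        break;
                    }
                }
            } else {
                i++; // any other character is a one-character symbol
            }
            prev = input.substring(start, i);
            tokens.add(prev);
        }
        return tokens;
    }
}
```

With this sketch, tokenize("12 div-3") yields the tokens 12, div, -, 3, whereas tokenize("div-3") yields the single name div-3 because there is no preceding operand. The input "$x in in-scope" yields $, x, in, in-scope: the second name is not split because the preceding token is itself a keyword, which is the situation condition (c) guards against.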
I'm committing a patch on the 9.7 and 9.8 branches.