Bug #6211

closed

Saxon returns wrong error position in XPath expression

Added by Gerben Abbink 7 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Diagnostics
Sprint/Milestone: -
Start date: 2023-10-02
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch: 12, trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

I have found two XPath expressions with an error where Saxon returns a wrong error position (there's a small offset).

1 instance of array(xs:NMTOKENS) == Saxon reports 20, I think 21 is better.
[1, 2, 3] instance of function(xs:IDREFS) as xs:IDREFS == Saxon reports 31, I think 32 is better.

Actions #2

Updated by Michael Kay 7 months ago

This is all much more complicated than anyone could imagine.

Internally, the XPath/XQuery parser associates a zero-based character offset with each token, and the offset associated with xs:NMTOKENS is (correctly) 20. The parser also records the offsets of newlines, so the character offset 20 can be translated to a zero-based position (line=0, column=20). The question is then how this should be presented to users.
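
To illustrate the translation described above, here is a minimal sketch of mapping a zero-based character offset to a zero-based (line, column) pair using recorded newline positions. The class name OffsetMapper and its method locate are hypothetical and are not taken from the Saxon source; the sketch also assumes lines are separated by '\n' only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Saxon's actual implementation.
final class OffsetMapper {
    // Zero-based character offsets at which each line starts.
    private final int[] lineStarts;

    OffsetMapper(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1); // the next line starts after the newline
            }
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    // Returns {line, column}, both zero-based, for a zero-based offset.
    int[] locate(int offset) {
        int idx = Arrays.binarySearch(lineStarts, offset);
        // If the offset is not itself a line start, binarySearch returns
        // -(insertionPoint) - 1; the containing line is insertionPoint - 1.
        int line = idx >= 0 ? idx : -idx - 2;
        return new int[] { line, offset - lineStarts[line] };
    }
}
```

For the first expression in the report, locate(20) on "1 instance of array(xs:NMTOKENS)" gives line 0, column 20, matching the values quoted above.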

Because XPath is often embedded in a host language such as XSLT, this raw information is paired with information about the location of the XPath expression within a containing document (that's why it makes sense to maintain zero-based offsets internally). In the general case it's complicated by the fact that (a) a SAX parser doesn't give us accurate location information for each attribute (only for the element), and (b) before we get to parse an XPath expression held in an XML attribute, the XML parser performs attribute value normalization. However, an editor such as Oxygen typically DOES have accurate location information for an attribute, and also has access to the unnormalized value, so it is able (with a lot of effort) to combine our zero-based location information with its own knowledge of the attribute position to do accurate redlining of the error.

That explains the complexity, but it doesn't explain why we are outputting zero-based line and column offsets in the simple case where a query is read directly from an input file or a string on the command line. I think it would be appropriate in that case to convert the values for human consumption to a 1-based offset, which is probably what most users would expect.
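
As a rough sketch of the conversion suggested here, the zero-based values could be shifted by one at the point where a message is formatted for a human reader, leaving the internal representation untouched. The class HumanLocation and its method describe are hypothetical names chosen for illustration; this is not the code in StandardErrorReporter.

```java
// Hypothetical formatter, not Saxon's actual implementation.
final class HumanLocation {
    // Converts zero-based line and column values to the 1-based form
    // most users expect, only at the point of display.
    static String describe(int zeroBasedLine, int zeroBasedColumn) {
        int line = zeroBasedLine + 1;
        int column = zeroBasedColumn + 1;
        return "line " + line + ", column " + column;
    }
}
```

With this adjustment, the two expressions in the description would be reported at columns 21 and 32 rather than 20 and 31.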

Actions #3

Updated by Michael Kay 6 months ago

  • Category set to Diagnostics
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal
  • Applies to branch 12, trunk added
  • Fix Committed on Branch 12, trunk added
  • Platforms .NET, Java added

I fixed this by adjusting for the zero-based offsets in the StandardErrorReporter.

Unfortunately I got distracted by other things and failed to update the issue at the time; also, the commit went in alongside other changes and the commit message references bug #6221.

Actions #4

Updated by O'Neil Delpratt 5 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.4 added

Bug fix applied in the Saxon 12.4 maintenance release.
