Bug #6211: Saxon return wrong error position in XPath expression - Saxon - Saxonica Developer Community

Added by Gerben Abbink over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: Michael Kay
Category: Diagnostics
Sprint/Milestone:
Start date: 2023-10-02
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch: 12, trunk
Fixed in Maintenance Release: 12.4
Platforms: .NET, Java

Description

I have found two XPath expressions with an error where Saxon returns a wrong error position (there's a small offset).

1 instance of array(xs:NMTOKENS) == Saxon reports 20, I think 21 is better.
[1, 2, 3] instance of function(xs:IDREFS) as xs:IDREFS == Saxon reports 31, I think 32 is better.

Updated by Michael Kay over 1 year ago

This is all much more complicated than anyone could imagine.

Internally, the XPath/XQuery parser associates a zero-based character offset with each token, and the offset associated with xs:NMTOKENS is (correctly) 20. The parser also records the offsets of newlines, so the character offset 20 can be translated to a zero-based position (line=0, column=20). The question is then how this should be presented to users.
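
To illustrate the offset-to-position translation described above, here is a minimal sketch in Java. It is not Saxon's actual code: the class OffsetLocator and its method locate are hypothetical names, and the sketch assumes that only '\n' ends a line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Saxon): records where each line starts
 * and maps a zero-based character offset to a zero-based line and column.
 */
final class OffsetLocator {
    // Zero-based offset of the first character of each line.
    private final List<Integer> lineStarts = new ArrayList<>();

    OffsetLocator(String text) {
        lineStarts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            // Assumption: only '\n' terminates a line.
            if (text.charAt(i) == '\n') {
                lineStarts.add(i + 1);
            }
        }
    }

    /** Returns {line, column}, both zero-based, for a zero-based offset. */
    int[] locate(int offset) {
        int line = 0;
        // Pick the last line whose start is at or before the offset.
        for (int i = 0; i < lineStarts.size(); i++) {
            if (lineStarts.get(i) <= offset) {
                line = i;
            } else {
                break;
            }
        }
        return new int[] {line, offset - lineStarts.get(line)};
    }
}
```

For the first expression in the report, new OffsetLocator("1 instance of array(xs:NMTOKENS)").locate(20) yields line 0, column 20, which matches the zero-based values quoted above.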

Because XPath is often embedded in a host language such as XSLT, this raw information is paired with information about the location of the XPath expression within a containing document (that's why it makes sense to maintain zero-based offsets internally). In the general case it's complicated by the fact that (a) a SAX parser doesn't give us accurate location information for each attribute (only for the element), and (b) before we get to parse an XPath expression held in an XML attribute, the XML parser performs attribute value normalization. However, an editor such as Oxygen typically DOES have accurate location information for an attribute, and also has access to the unnormalized value, so it is able (with a lot of effort) to combine our zero-based location information with its own knowledge of the attribute position to do accurate redlining of the error.
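
As a rough sketch of why zero-based offsets combine conveniently with a host location, the following Java fragment adds an expression-relative position to the position where the attribute value starts. Everything here is an assumption for illustration: HostLocation and toAbsolute are hypothetical names, all four inputs are taken to be zero-based, and the relative position is assumed to refer to the unnormalized attribute value (which, as noted above, an XML parser does not normally expose).

```java
/**
 * Hypothetical helper (not part of Saxon or any editor API): combines the
 * zero-based start position of an attribute value in the host document with
 * a zero-based position inside the XPath expression held in that attribute.
 */
final class HostLocation {
    /**
     * @param attrLine   zero-based line where the attribute value starts
     * @param attrColumn zero-based column where the attribute value starts
     * @param relLine    zero-based line within the expression
     * @param relColumn  zero-based column within that expression line
     * @return {line, column}, both zero-based, in the host document
     */
    static int[] toAbsolute(int attrLine, int attrColumn, int relLine, int relColumn) {
        int line = attrLine + relLine;
        // On the first line of the expression the column is shifted by the
        // attribute's start column; on later lines it already counts from
        // the start of the host document's line.
        int column = (relLine == 0) ? attrColumn + relColumn : relColumn;
        return new int[] {line, column};
    }
}
```

Because both inputs count from zero, the positions add up directly with no correction term. With 1-based values, each addition would need a subtraction of 1.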

That explains the complexity, but it doesn't explain why we are outputting zero-based line and column offsets in the simple case where a query is read directly from an input file or a string on the command line. I think it would be appropriate in that case to convert the values for human consumption to a 1-based offset, which is probably what most users would expect.
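
The conversion for human consumption amounts to adding 1 to each value at the point where the message is produced. A minimal sketch in Java, assuming a hypothetical DisplayPosition class and a made-up message format (neither is Saxon's actual reporter code or output):

```java
/**
 * Hypothetical helper (not part of Saxon): formats a zero-based internal
 * position as the 1-based line and column most users expect to see.
 */
final class DisplayPosition {
    static String format(int zeroBasedLine, int zeroBasedColumn) {
        // Internal values stay zero-based; only the displayed text is shifted.
        return "line " + (zeroBasedLine + 1) + ", column " + (zeroBasedColumn + 1);
    }
}
```

Applied to the two expressions in the report, internal columns 20 and 31 on line 0 would be displayed as columns 21 and 32 on line 1, which are the values the reporter suggested.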

Updated by Michael Kay about 1 year ago

Category set to Diagnostics
Status changed from New to Resolved
Assignee set to Michael Kay
Priority changed from Low to Normal
Applies to branch 12, trunk added
Fix Committed on Branch 12, trunk added
Platforms .NET, Java added

I fixed this by adjusting for the zero-based offsets in the StandardErrorReporter.

Unfortunately I got distracted by other things and failed to update the issue at the time; also, the commit went in alongside other changes, and the commit message references bug #6221.
