Project

Profile

Help

Bug #2445

closed

Incorrectly reported error column

Added by Radu Coravu over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Diagnostics
Sprint/Milestone:
-
Start date:
2015-08-25
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

For this invalid XSLT Stylesheet:

<xsl:stylesheet version="2.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 

xmlns:p="http://www.oxygenxml.com/ns/samples/personal">

<xsl:template match="//map//*[

    contains(@class,' bookmap/chapter ') |

    //map//*[contains(@class,' map/topicref ') 

    and count(ancestor::*[contains(@class,' map/topicref ')]]">

    
</xsl:template>

</xsl:stylesheet>

The transformer fails and reports a problem:

XTSE0340 XSLT Pattern syntax error at char 173 on line 8 near {...ins(@class,' map/topicref '...}: expected ")", found "]"

but there is no column 173, the xpath is broken on multiple lines.


Files

test.xsl (432 Bytes) test.xsl Radu Coravu, 2015-08-25 10:53
Actions #1

Updated by Radu Coravu over 8 years ago

I'm attaching the XSLT contents, I did not properly format my original post.

Actions #2

Updated by Michael Kay over 8 years ago

It's "char 173" rather than "column 173". It's the best we can do, because we don't have full information from the XML parser. The basic location information provided by SAX only gives us the line and column corresponding to the ">" at the end of the element's start tag; and the value that we see for attributes has been subjected to XML's attribute whitespace normalization. So the best we can do is report a character offset within the pattern.

Actions #3

Updated by Radu Coravu over 8 years ago

Thanks MK.

I understand, we have quite a lot of patches in Xerces in order to compute the precise location for each attribute name and value.

Unfortunately as XPath expressions increase in complexity, splitting them on multiple lines becomes quite common and locating precisely on what line the XPath is broken would have been valuable to the end user.

Actions #4

Updated by Michael Kay over 8 years ago

Yes, it would be great to have more precise line/column information, but without a customized XML parser I can't see how to achieve that.

Perhaps if oXygen has better information, we could define some kind of interface that allows the information to be made available to Saxon?

Or we could try and structure the location information in an XPathException for XPath static errors more precisely, so that it includes (a) the line/column of the containing element, as defined by the SAX SourceLocator, (b) the attribute name, and (c) the line/column relative to the attribute value as supplied to Saxon - this would only be useful if XML attribute value normalization is suppressed.

Actions #5

Updated by Radu Coravu over 8 years ago

Second choice looks quite interesting. So you could have a custom XPathException implementation which besides line and column reporting (which would remain unchanged) would report the attribute name and a relative offset in the attribute value where the problem is encountered.

Indeed the main problem remains the fact that Xerces will normalize attribute values when the SAX callback is received.

The XNI level in Xerces still has the non-normalized value: org.apache.xerces.xni.XMLAttributes.getNonNormalizedValue(int).

When the SAX content handler receives the startElement() callback the "Attributes" interface you receive is actually the concrete object "org.apache.xerces.parsers.AbstractSAXParser.AttributesProxy" which contains inside it the "protected XMLAttributes fAttributes;" field.

So possibly you could use Java reflection to get this field and ask it for the unaltered attribute value. And if it fails, fallback to the normalized value. But I'm not sure that you want to tie Saxon to a specific parser implementation.

Actions #6

Updated by Michael Kay over 8 years ago

OK, I've got the denormalization part working. If the stylesheet parser is an instance of Apache Xerces, I stick an adapter into the pipeline which modifies the Attributes object to substitute the pre-normalization value for the normalized value. The adapter uses reflection as you suggested.

Actions #7

Updated by Radu Coravu over 8 years ago

And we could do the same thing when validating XSLT stylesheets, provide a source for the XSL with a custom XMLReader which actually provides non-normalized values. As long as Saxon does not rely on certain attribute values being already normalized by the Xerces processor. Also this would only apply to the parser used for the XSLT stylesheet, the parser for the XML would behave as it previously did.

Actions #8

Updated by Radu Coravu over 8 years ago

There might be a problem with entity references. The not normalized value might have the entity references not expanded which would be a problem.

Actions #9

Updated by Michael Kay over 8 years ago

  • Category set to Diagnostics
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

Thanks for the advice about entity references. I've added to my adapter the ability to expand built-in entity references and character references.

But there are still other problems.

  • Expanding character references will mean the line/column positions computed by the XPath parser are still wrong.

  • There is also the edge case of newlines and tabs occurring within XPath string literals. If we fail to normalize these to spaces, we get a different behaviour (it's probably a behaviour closer to what the user might have intended, but it's still different). I don't think it's non-conformant (the XSLT spec says you can construct a stylesheet tree any way you like), and I don't think it will break many stylesheets, but I wouldn't want to do it without a configuration switch.

More thought needed.

Actions #10

Updated by Radu Coravu over 8 years ago

Also there may be very rare cases in which the XPath expression has custom entity references other than the simple amp, lt, gt and character reference ones (maybe somebody added a DOCTYPE to the XLST stylesheet defining them). So in cases in which the non-normalized values contains unknown entity references, there would be need to fallback to the normalized value.

That remark about newlines and tabs in xpath literals is quite interesting. We would need to know/test exactly how Xerces normalizes spaces in attribute values. For example if the value has two consecutive simple spaces (%20), does it report only one after the normalization? Or does it just convert tabs and new lines to simple spaces? Probably Xerces also compacts consecutive spaces. If it would not do that, there would be no problem with the reported location relative to the beginning of the attribute value.

Actions #11

Updated by Michael Kay over 8 years ago

For a stylesheet I think we can usually assume that there is no DTD, or if there is one, it does not contain element/attribute declarations. Therefore the attributes will be CDATA attributes, which means that attribute-value normalization will convert newline and tab characters to x20, but will not collapse multiple space characters into single space characters.

Actions #12

Updated by Radu Coravu over 8 years ago

Then the offset relative to the start of the attribute value should be quite accurate.

Actions #13

Updated by Michael Kay over 8 years ago

Test boolean-082 fails with these changes. It contains the LRE

           @<td style="color: #336699; font-weight:

bold">HCFA Jcodes:

           <td>@

(lines wrapped as shown, almost certainly unintentionally)

and the effect of the change is that the generated HTML replicates the newline instead of substituting a space, causing a failure to be reported by the test suite.

Actions #14

Updated by Michael Kay over 8 years ago

  • Status changed from New to In Progress
  • Found in version set to 9.6

An update on this: I've been doing a lot of work in this area over the last week. It was needed anyway because the introduction of separately-compiled packages in 9.7 requires changes in the way we handle location information, especially for dynamic errors.

I've decided against using the pre-normalization attribute value. Although we've shown it's possible, I'm concerned about edge cases where we produce different results or non-conformant results depending on the XML parser used.

What I am trying to do is to provide finer-grained error information. The plan is that the exception passed to the ErrorListener will contain a structured Location object that contains both "container" information about the element and attribute in which the XPath expression appears (element name, attribute name, line and column number as reported by SAX) and where appropriate fine-grained information about the position within the XPath expression: typically the offset within the normalized attribute value. I'm hoping that a debugger with access to the pre-normalized value should be able to convert this to a position in the source by simulating the process of normalization and counting characters until it reaches the supplied offset. Actually, the counting is quite simple: every character and every built-in or numeric character reference in the pre-normalized value represents one character in the post-normalized value.

These changes will all be for 9.7, of course, they are much too disruptive to implement as a 9.6 patch.

Actions #15

Updated by Radu Coravu over 8 years ago

Hi Michael,

I think you chose the right approach. After you release 9.7 we'll try to see if we can benefit of these changes in Oxygen XML Editor for showing more precise validation errors.

Regards,

Radu

Actions #16

Updated by Michael Kay over 8 years ago

  • Status changed from In Progress to Resolved
  • Fixed in version set to 9.7

Closing this as fixed in 9.7. In 9.7 the Location object that we supply with error information can contain additional information about the location of the error; in particular for an error occurring in an XPath expression within an attribute of an element in a stylesheet we supply (a) the line/column of the element, (b) the name of the attribute, and (c) the position within the (normalized) attribute value. We think that supplying the attribute name should be a big step forward for IDEs in positioning the cursor correctly on the error; supplying the offset within the normalized attribute should usually enable correct position of the cursor within the attribute with the help of a bit of character counting to reproduce the effect of normalization.

Actions #17

Updated by Radu Coravu over 8 years ago

Thanks Michael.

Actions #18

Updated by O'Neil Delpratt over 8 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in version changed from 9.7 to 9.6.0.8
Actions #19

Updated by O'Neil Delpratt over 8 years ago

  • Fixed in version deleted (9.6.0.8)

This bug fix went out in Saxon 9.7 but not in 9.6

Please register to edit this issue

Also available in: Atom PDF