Bug #3543

closed

XdmNode.getColumnNumber() is always -1 for text and comment nodes.

Added by Gerben Abbink over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Diagnostics
Sprint/Milestone: -
Start date: 2017-11-23
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.8, trunk
Fix Committed on Branch: 9.8, trunk
Fixed in Maintenance Release:
Platforms:

Description

I'm building an XML dom using S9API's DocumentBuilder. Configuration.setLineNumbering is set to true.

The resulting dom contains all kinds of nodes. Calling XdmNode.getColumnNumber() on elements, attributes and processing instructions returns an actual value. However, calling it on text and comment nodes always returns -1.

I did some digging in Saxon's source code, and in TinyBuilder.java I see that the method processingInstruction() calls tt.setLineNumber(), while the method comment() does not. Is that the reason why getColumnNumber() always returns -1 for comments but not for processing instructions?


Related issues

Has duplicate: Saxon - Feature #1739: Add line/column information for nodes other than elements. (Resolved, Michael Kay, 2013-04-23)

Actions #1

Updated by Michael Kay over 6 years ago

I guess when you say you are building a "dom" you are using the term generically and what you are actually building is a TinyTree.

I think the history here is probably that we only maintained location information on elements and processing instructions because the InfoSet only maintains Base URI for those kinds of nodes.

The line and column information does appear to be available for all kinds of nodes (well, not attributes of course). For text nodes there is the complication of buffering: by the time we know that we have seen all the pieces of the text node, the Locator has been updated to the following (usually) startElement or endElement event. That means we have to make a copy of the location information whenever characters() is called. That's a bit of an overhead - and of course our main reason for keeping location information is for diagnostics on stylesheet and schema errors, where we only use element location anyway.
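The buffering problem described above can be reproduced with the JDK's own SAX parser, without Saxon. The following is a minimal sketch, not Saxon's ReceivingContentHandler: the class name TextLocationDemo and its fields are hypothetical, and it assumes the parser supplies a Locator (the JDK's built-in parser does). It copies the Locator position in characters() and compares it with what the Locator reports when the buffered text is finally flushed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration; not part of Saxon.
class TextLocationDemo extends DefaultHandler {
    private Locator locator;                       // assumed to be supplied by the parser
    private final StringBuilder buffer = new StringBuilder();
    private int savedLine = -1;
    private int savedColumn = -1;

    // One entry per text node: {saved line, saved column, Locator line at flush time}
    final List<int[]> textNodes = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
        // Copy the position now; the Locator will have moved on by the time we flush.
        savedLine = locator.getLineNumber();
        savedColumn = locator.getColumnNumber();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flushText();
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flushText();
    }

    // Called once the text node is known to be complete.
    private void flushText() {
        if (buffer.length() == 0) {
            return;
        }
        textNodes.add(new int[] {savedLine, savedColumn, locator.getLineNumber()});
        buffer.setLength(0);
    }

    static List<int[]> parse(String xml) throws Exception {
        TextLocationDemo handler = new TextLocationDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.textNodes;
    }

    public static void main(String[] args) throws Exception {
        // The start tag of <b> spans several lines, so the Locator has advanced
        // past the text node's line by the time startElement() is delivered.
        for (int[] t : parse("<a>hello<b\n\n/></a>")) {
            System.out.println("saved " + t[0] + ":" + t[1]
                    + ", Locator line at flush time " + t[2]);
        }
    }
}
```

Note that a SAX Locator reports the position at the end of the event, so the saved values mark where the latest chunk of text ended rather than where the text node began.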

Actions #2

Updated by Michael Kay over 6 years ago

  • Category set to Diagnostics
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

On the development branch I have made this work so that line and column information is retained for elements, comments, PIs, and text nodes in the TinyTree, and I have added a unit test to this effect.

The changes to make it work for text nodes feel a bit risky to include in the 9.8 branch. It involves changing ReceivingContentHandler to save the location of the most recent text node; for performance reasons we should do that only if line numbering is enabled. To make RCH aware that line numbering is enabled it has to look in the ParseOptions of the PipelineConfiguration; but for some reason Configuration.buildDocument() is not setting the parseOptions in the PipelineConfiguration. I have changed it so that it now does so, but this is the kind of change that could easily have unexpected side-effects on other paths. So I'm inclined to leave location information on text nodes to the next release.

Actions #3

Updated by Michael Kay over 6 years ago

For the Linked tree, the data structure used makes it difficult to retain line/column information for any nodes other than elements, so I propose to make no change in this case.

Later, on further reflection: for the Linked tree, the Processing Instruction node currently has dedicated slots for system ID and line number, and it makes sense to add a slot for column number as well. And then it makes sense to add the same three slots for Comment nodes. Text nodes however are much more space-critical, and we can't afford to allocate fixed slots here in cases where they aren't used. So for text nodes, we will continue to return the location information of the parent element (actually, in XSLT stylesheet trees, I think it tends to use the preceding-sibling element if there is one, but that's done above the level of the tree model).
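The space trade-off for the Linked tree can be pictured roughly as follows. This is only an illustrative sketch: the classes below (LinkedTreeSketch, ElementNode, CommentNode, TextNode) are hypothetical and are not Saxon's actual Linked tree classes. Comment nodes carry their own system ID, line and column slots, while text nodes carry none and delegate to the parent element.

```java
// Hypothetical classes for illustration only; these are not Saxon's Linked tree classes.
class LinkedTreeSketch {

    abstract static class Node {
        ElementNode parent;
        abstract int getLineNumber();
        abstract int getColumnNumber();
    }

    static class ElementNode extends Node {
        private final int line;
        private final int column;

        ElementNode(int line, int column) {
            this.line = line;
            this.column = column;
        }

        @Override int getLineNumber() { return line; }
        @Override int getColumnNumber() { return column; }

        // Attach a child and return it, so calls can be chained in the demo.
        <T extends Node> T append(T child) {
            child.parent = this;
            return child;
        }
    }

    // Comments (and, analogously, processing instructions) are comparatively rare,
    // so three dedicated slots per node are affordable.
    static class CommentNode extends Node {
        final String systemId;
        private final int line;
        private final int column;

        CommentNode(String systemId, int line, int column) {
            this.systemId = systemId;
            this.line = line;
            this.column = column;
        }

        @Override int getLineNumber() { return line; }
        @Override int getColumnNumber() { return column; }
    }

    // Text nodes are numerous, so they have no location slots of their own
    // and report the location of the parent element instead.
    static class TextNode extends Node {
        final String value;

        TextNode(String value) {
            this.value = value;
        }

        @Override int getLineNumber() { return parent == null ? -1 : parent.getLineNumber(); }
        @Override int getColumnNumber() { return parent == null ? -1 : parent.getColumnNumber(); }
    }

    public static void main(String[] args) {
        ElementNode para = new ElementNode(3, 5);
        TextNode text = para.append(new TextNode("hello"));
        CommentNode comment = para.append(new CommentNode("doc.xml", 4, 2));
        System.out.println("text:    " + text.getLineNumber() + ":" + text.getColumnNumber());
        System.out.println("comment: " + comment.getLineNumber() + ":" + comment.getColumnNumber());
    }
}
```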

Actions #4

Updated by Michael Kay over 6 years ago

  • Has duplicate Feature #1739: Add line/column information for nodes other than elements. added

Actions #5

Updated by Michael Kay over 6 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.8, trunk added
  • Fix Committed on Branch 9.8, trunk added

For 9.8 I have made changes whose effect is that line and column information is maintained for elements, comments, and processing instructions on both the TinyTree and LinkedTree. For the TinyTree, the location information returned for a text node is that of the most recent node that has location information (usually the parent element); for the LinkedTree, it is always the parent element.

For the development branch I have made additional changes so that in the case of the TinyTree, line and column information is maintained for text nodes. For the LinkedTree, the situation is the same as 9.8.
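The 9.8 fallback for TinyTree text nodes ("the most recent node that has location information") can be pictured as a backward search over nodes in document order. The sketch below is purely illustrative: TinyLocationSketch and its methods are hypothetical, and the storage layout is an assumption rather than the TinyTree's real one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout for illustration; not Saxon's TinyTree implementation.
class TinyLocationSketch {
    // One entry per node in document order: {line, column}, or null if none was recorded.
    private final List<int[]> locations = new ArrayList<>();

    // Adds a node with its own location (element, comment, PI); returns its node number.
    int addNode(int line, int column) {
        locations.add(new int[] {line, column});
        return locations.size() - 1;
    }

    // Adds a node with no recorded location (a text node in 9.8); returns its node number.
    int addNodeWithoutLocation() {
        locations.add(null);
        return locations.size() - 1;
    }

    int lineOf(int node) {
        return nearestLocation(node)[0];
    }

    int columnOf(int node) {
        return nearestLocation(node)[1];
    }

    // Walks backwards from the node to the most recent node that has a location.
    private int[] nearestLocation(int node) {
        for (int i = node; i >= 0; i--) {
            int[] location = locations.get(i);
            if (location != null) {
                return location;
            }
        }
        return new int[] {-1, -1};
    }

    public static void main(String[] args) {
        TinyLocationSketch tree = new TinyLocationSketch();
        tree.addNode(2, 4);                              // element
        int firstText = tree.addNodeWithoutLocation();   // text inside the element
        tree.addNode(3, 1);                              // comment
        int secondText = tree.addNodeWithoutLocation();  // text after the comment
        System.out.println("first text:  " + tree.lineOf(firstText) + ":" + tree.columnOf(firstText));
        System.out.println("second text: " + tree.lineOf(secondText) + ":" + tree.columnOf(secondText));
    }
}
```

In this sketch the second text node picks up the comment's location rather than the parent element's, which is why the fallback is only "usually" the parent element.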

Actions #6

Updated by O'Neil Delpratt about 6 years ago

  • % Done changed from 0 to 100

Bug fix applied in the Saxon 9.8.0.7 maintenance release.

Actions #7

Updated by O'Neil Delpratt about 6 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.8.0.7 added
