Bug #3543
XdmNode.getColumnNumber() is always -1 for text and comment nodes.
Description
I'm building an XML dom using S9API's DocumentBuilder. Configuration.setLineNumbering is set to true.
The resulting dom contains all kinds of nodes. Calling XdmNode.getColumnNumber() on elements, attributes and processing instructions returns an actual value. However, calling it on text and comment nodes always returns -1.
I did some digging in Saxon's source code, and in TinyBuilder.java I see that the method processingInstruction() calls tt.setLineNumber(), while the method comment() does not. Is that the reason why getColumnNumber() always returns -1 for comments but not for processing instructions?
Updated by Michael Kay over 6 years ago
I guess when you say you are building a "dom" you are using the term generically and what you are actually building is a TinyTree.
I think the history here is probably that we only maintained location information on elements and processing instructions because the Infoset only maintains base URIs for those kinds of nodes.
The line and column information does appear to be available for all kinds of nodes (well, not attributes, of course). For text nodes there is the complication of buffering: by the time we know that we have seen all the pieces of the text node, the Locator has already moved on to the following event (usually startElement or endElement). That means we have to make a copy of the location information whenever characters() is called. That's a bit of an overhead; and of course our main reason for keeping location information is for diagnostics on stylesheet and schema errors, where we only use element locations anyway.
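To illustrate the buffering problem, here is a minimal sketch using only the JDK's SAX API; it is not Saxon's ReceivingContentHandler, and BufferingHandler, TextNode and TextLocationDemo are hypothetical names. The handler copies the Locator's line and column when the first characters() chunk of a text node arrives, and attaches that copy to the text node when a later event (startElement, endElement, processingInstruction or endDocument) flushes the buffer. Two caveats, assuming standard SAX semantics: the Locator position is only an approximation of where the current event's text ends, and comments are reported through LexicalHandler, which this sketch does not register, so a comment would not split the buffered text here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class TextLocationDemo {

    /** A coalesced text node plus the location copied when its first chunk arrived. */
    record TextNode(String text, int line, int column) {}

    /** Hypothetical handler that buffers characters() chunks into whole text nodes. */
    static class BufferingHandler extends DefaultHandler {
        private Locator locator;
        private final StringBuilder buffer = new StringBuilder();
        private int line = -1;
        private int column = -1;
        final List<TextNode> textNodes = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buffer.length() == 0 && locator != null) {
                // The Locator is live: copy its values now, because by the time
                // the text node is complete it describes the following event.
                line = locator.getLineNumber();
                column = locator.getColumnNumber();
            }
            buffer.append(ch, start, length);
        }

        /** Emit the buffered text node, if any, with the copied location. */
        private void flush() {
            if (buffer.length() > 0) {
                textNodes.add(new TextNode(buffer.toString(), line, column));
                buffer.setLength(0);
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
        }

        @Override
        public void processingInstruction(String target, String data) {
            flush();
        }

        @Override
        public void endDocument() {
            flush();
        }
    }

    /** Parse an XML string and return its text nodes with their copied locations. */
    static List<TextNode> parse(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.textNodes;
    }

    public static void main(String[] args) throws Exception {
        for (TextNode t : parse("<a>hello<b/>world</a>")) {
            System.out.println(t);
        }
    }
}
```

The copy in characters() is exactly the per-chunk overhead mentioned above: it happens for every text node whether or not anyone later asks for its location.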
Updated by Michael Kay over 6 years ago
- Category set to Diagnostics
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Priority changed from Low to Normal
On the development branch I have made this work so that line and column information is retained for elements, comments, PIs, and text nodes in the TinyTree, and I have added a unit test to this effect.
The changes to make it work for text nodes feel a bit risky to include in the 9.8 branch. It involves changing ReceivingContentHandler to save the location of the most recent text node; for performance reasons we should do that only if line numbering is enabled. To make RCH aware that line numbering is enabled it has to look in the ParseOptions of the PipelineConfiguration; but for some reason Configuration.buildDocument() is not setting the parseOptions in the PipelineConfiguration. I have changed it so that it now does so, but this is the kind of change that could easily have unexpected side-effects on other paths. So I'm inclined to leave location information on text nodes to the next release.
Updated by Michael Kay over 6 years ago
For the Linked tree, the data structure used makes it difficult to retain line/column information for any nodes other than elements, so I propose to make no change in this case.
Later, on further reflection: for the Linked tree, the Processing Instruction node currently has dedicated slots for system ID and line number, and it makes sense to add a slot for column number as well. And then it makes sense to add the same three slots for Comment nodes. Text nodes however are much more space-critical, and we can't afford to allocate fixed slots here in cases where they aren't used. So for text nodes, we will continue to return the location information of the parent element (actually, in XSLT stylesheet trees, I think it tends to use the preceding-sibling element if there is one, but that's done above the level of the tree model).
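The Linked-tree trade-off can be sketched with a few hypothetical classes (Node, LocatedNode, Element, Comment, ProcessingInstruction, Text); these illustrate the idea and are not Saxon's LinkedTree classes. Elements, comments and processing instructions pay for three fixed slots (system ID, line, column), while text nodes carry none and fall back to their parent element's location.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkedTreeSketch {

    /** Every node can report a location; by default it borrows its parent's. */
    static abstract class Node {
        Element parent;

        String getSystemId() { return parent == null ? null : parent.getSystemId(); }
        int getLineNumber() { return parent == null ? -1 : parent.getLineNumber(); }
        int getColumnNumber() { return parent == null ? -1 : parent.getColumnNumber(); }
    }

    /** Node kinds that pay for three dedicated location slots. */
    static abstract class LocatedNode extends Node {
        private final String systemId;
        private final int line;
        private final int column;

        LocatedNode(String systemId, int line, int column) {
            this.systemId = systemId;
            this.line = line;
            this.column = column;
        }

        @Override String getSystemId() { return systemId; }
        @Override int getLineNumber() { return line; }
        @Override int getColumnNumber() { return column; }
    }

    static class Element extends LocatedNode {
        final String name;
        final List<Node> children = new ArrayList<>();

        Element(String name, String systemId, int line, int column) {
            super(systemId, line, column);
            this.name = name;
        }

        /** Append a child and link it back to this element. */
        <T extends Node> T add(T child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    static class Comment extends LocatedNode {
        final String content;

        Comment(String content, String systemId, int line, int column) {
            super(systemId, line, column);
            this.content = content;
        }
    }

    static class ProcessingInstruction extends LocatedNode {
        final String target;
        final String data;

        ProcessingInstruction(String target, String data, String systemId, int line, int column) {
            super(systemId, line, column);
            this.target = target;
            this.data = data;
        }
    }

    /** Text nodes are space-critical: no location slots, so the parent's location is used. */
    static class Text extends Node {
        final String content;

        Text(String content) { this.content = content; }
    }

    public static void main(String[] args) {
        Element root = new Element("doc", "file:///in.xml", 1, 5);
        Comment c = root.add(new Comment(" note ", "file:///in.xml", 2, 3));
        Text t = root.add(new Text("hello"));
        System.out.println(c.getLineNumber() + ":" + c.getColumnNumber()); // prints 2:3
        System.out.println(t.getLineNumber() + ":" + t.getColumnNumber()); // prints 1:5 (parent element)
    }
}
```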
Updated by Michael Kay over 6 years ago
- Has duplicate Feature #1739: Add line/column information for nodes other than elements. added
Updated by Michael Kay over 6 years ago
- Status changed from In Progress to Resolved
- Applies to branch 9.8, trunk added
- Fix Committed on Branch 9.8, trunk added
For 9.8 I have made changes whose effect is that line and column information is maintained for elements, comments, and processing instructions on both the TinyTree and LinkedTree. For the TinyTree, the location information returned for a text node is that of the most recent node that has location information (usually the parent element); for the LinkedTree, it is always the parent element.
For the development branch I have made additional changes so that in the case of the TinyTree, line and column information is maintained for text nodes. For the LinkedTree, the situation is the same as 9.8.
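The two TinyTree behaviours can be sketched with parallel arrays indexed by node number in document order. TinyTreeLocationSketch, addNode() and the recordTextLocations flag are hypothetical and do not reflect Saxon's actual TinyTree layout. With the flag off (as in 9.8), text nodes store no location, and a lookup scans back to the most recent node that has one, which is usually but not always the parent element; with the flag on (as on the development branch), text nodes keep their own.

```java
import java.util.Arrays;

public class TinyTreeLocationSketch {
    // Node-kind codes, borrowed from the DOM node-type numbering for illustration.
    static final byte ELEMENT = 1, TEXT = 3, PI = 7, COMMENT = 8;

    private int[] line = new int[8];
    private int[] column = new int[8];
    private int count = 0;
    private final boolean recordTextLocations;

    TinyTreeLocationSketch(boolean recordTextLocations) {
        this.recordTextLocations = recordTextLocations;
    }

    /** Append a node in document order and return its node number. */
    int addNode(byte nodeKind, int lineNumber, int columnNumber) {
        if (count == line.length) {
            line = Arrays.copyOf(line, count * 2);
            column = Arrays.copyOf(column, count * 2);
        }
        boolean keep = nodeKind != TEXT || recordTextLocations;
        line[count] = keep ? lineNumber : -1;
        column[count] = keep ? columnNumber : -1;
        return count++;
    }

    /** Most recent node (this one included) that has a recorded location, or -1. */
    private int locatedNode(int nodeNr) {
        for (int n = nodeNr; n >= 0; n--) {
            if (line[n] != -1) {
                return n;
            }
        }
        return -1;
    }

    int getLineNumber(int nodeNr) {
        int n = locatedNode(nodeNr);
        return n < 0 ? -1 : line[n];
    }

    int getColumnNumber(int nodeNr) {
        int n = locatedNode(nodeNr);
        return n < 0 ? -1 : column[n];
    }

    public static void main(String[] args) {
        TinyTreeLocationSketch v98 = new TinyTreeLocationSketch(false);
        v98.addNode(ELEMENT, 1, 5);
        v98.addNode(COMMENT, 2, 9);
        int text = v98.addNode(TEXT, 3, 12);
        System.out.println(v98.getLineNumber(text)); // prints 2 (the preceding comment)

        TinyTreeLocationSketch trunk = new TinyTreeLocationSketch(true);
        trunk.addNode(ELEMENT, 1, 5);
        int text2 = trunk.addNode(TEXT, 3, 12);
        System.out.println(trunk.getLineNumber(text2)); // prints 3 (its own location)
    }
}
```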
Updated by O'Neil Delpratt about 6 years ago
- % Done changed from 0 to 100
Bug fix applied in the Saxon 9.8.0.7 maintenance release.
Updated by O'Neil Delpratt about 6 years ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 9.8.0.7 added