Bug #5691


Copying a processing instruction drops base URI information

Added by Norm Tovey-Walsh 5 months ago. Updated 21 days ago.

Start date:
Due date:
% Done:


Estimated time:
Legacy ID:
Applies to branch:
10, 11, trunk
Fix Committed on Branch:
10, 11, trunk
Fixed in Maintenance Release:
.NET, Java


Outputter.append() is:

    public void append(Item item) throws XPathException {
        append(item, Loc.NONE, ReceiverOption.ALL_NAMESPACES);

That means any item you append loses its location. I have some code that copies a tree, selectively modifying parts of it. I assumed that if I reached a node that I wasn't going to modify, I could simply call append() with node.getUnderlyingNode(). But if I do that, then the location is lost and consequently the base URI of the copied node(s) is lost.

I can work around this by doing the recursive descent and output the nodes encountered. But that seems like unnecessary work.

Files (82.3 KB) Norm Tovey-Walsh, 2022-09-20 10:57
Actions #1

Updated by Norm Tovey-Walsh 5 months ago

  • Applies to branch 10, 11, trunk added
Actions #2

Updated by Michael Kay 5 months ago

This is a complicated area. The location passed to an Outputter is generally the location of the instruction that constructs the node, not the location of the node itself. The copy-of instruction takes special (somewhat devious) measures to ensure that the location of the node being copied is passed to the DocumentBuilder that constructs the copy of the node; in some cases (not all) the new node retains the location of the old node.

Also complicating this is that the location of a node isn't in all cases the same thing as its base URI.

Need more detail on this one before making any changes.

Actions #3

Updated by Norm Tovey-Walsh 5 months ago

That's fair. Here's some context from a Slack chat:

Here's a weird one. I'm processing an XInclude. I construct a tree for the included content by calling various methods on the Receiver. Let's say the content I construct is . The baseURI (which comes from the underlying nodes getSystemId()) for foo is fine, for ?pi? is "", and for bar is "". The presence of the PI makes the following element(s) not have a base URI. I thought this was caused by my failure to supply a base URI for the PI. But I've changed the code so that I do this:

    public void addPI(XdmNode node) {
        Location location = VoidLocation.instance();
        if (node.getBaseURI() != null) {
            location = new SysIdLocation(node.getBaseURI().toString());
        addPI(node.getNodeName().getLocalName(), node.getStringValue(), location);

where the 3 argument form of addPI is:

    public void addPI(String target, String data, Location location) {
        try {
            receiver.processingInstruction(target, StringView.of(data), location, 0);
        } catch (XPathException e) {
            throw new XProcException(e);

The original PI that I'm effectively copying does have a base URI so the SysIdLocation is correct and is being passed to the receiver.

It looks like decompose() in ComplexContentOutputter is copying the PI again and passing a null location, because append in Outputter passes Loc.NONE.

I confess, I don't see where the PI comes into it. It would make more sense if all the elements lost their base URIs.

I'll see if I can extract out a test case.

Actions #4

Updated by Norm Tovey-Walsh 5 months ago

The attached test case demonstrates the problem. (I've done no more to investigate it beyond reproducing it.)

If you run the main class, it will print:

module: file:/Volumes/Saxonica/src/saxonica/test-apps/iss5691/main.xml: file:/Volumes/Saxonica/src/saxonica/test-apps/iss5691/main.xml
  foo: file:/Volumes/Saxonica/src/saxonica/test-apps/iss5691/main.xml: file:/Volumes/Saxonica/src/saxonica/test-apps/iss5691/main.xml
  pi: : 
  bar: : 

For some reason, appending the inner module document loses all base URIs starting at the first PI.

Actions #5

Updated by Michael Kay about 2 months ago

The bug appears to be something like this: when a tree with no known base URI is copied to a tree builder for which a base URI (system ID) has been set, copying element nodes to the tree causes them to acquire the base URI that was set for the tree builder. However, copying processing instruction nodes causes the corresponding nodes in the result tree to have no base URI; moreover, this change persists until the next node with a known base URI is encountered.

The problem is in TinyBuilder line 534, when handling processing instructions:

        String localLocation = locationId.getSystemId();
        tt.setSystemId(nodeNr, localLocation);

The corresponding code for element nodes is at line 321:

        String localSystemId = location.getSystemId();
        if (isUseEventLocation() && localSystemId != null) {
            tt.setSystemId(nodeNr, localSystemId);
        } else if (currentDepth == 1) {
            tt.setSystemId(nodeNr, systemId);

where systemId is the systemId property of the TinyBuilder.

I think the code for processing instructions should be the same as for elements.

Actions #6

Updated by Michael Kay about 2 months ago

  • Subject changed from Appending a node discards any location that it might have had to Copying a processing instruction drops base URI information
  • Category set to Internals
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
Actions #7

Updated by Michael Kay about 2 months ago

The fix seems to cause no obvious regression, but I need to create a specific unit test before I'm 100% comfortable with it.

Actions #8

Updated by Michael Kay about 2 months ago

While attempting to create a unit test for this (so far unsuccessfully), I hit an apparently unrelated problem with the TinyBuilder. My test does the following sequence of calls on a TinyBuilder


After this sequence of calls, getCurrentRoot() returns the BOTTOM element, not the document node.

What happens here is that the documentNode.copy() operation starts with a startDocument() event, and ends with an endDocument() event. The inner startDocument() event is correctly ignored, but the inner endDocument() is not - it decrements the level number.

After exploring this a little, I concluded that it's not a problem because in practice the events are always sent to the TinyBuilder via a ComplexContentOutputter, which filters out the unwanted startDocument() and endDocument() events.

Actions #9

Updated by Michael Kay about 2 months ago

  • Status changed from In Progress to Resolved
  • Priority changed from Low to Normal
  • Fix Committed on Branch 10, 11, trunk added
  • Platforms .NET, Java added

Created a unit test. Fixed and tested on the 10, 11, and 12 branches.

Actions #10

Updated by Community Admin 21 days ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.0 added

Bug issue fix applied in the Saxon 12.0 Major Release. Leaving this bug marked as Resolved until fix applied

Please register to edit this issue

Also available in: Atom PDF