Bug #4618

More struggles with base URIs and receivers

Added by Norman Tovey-Walsh 16 days ago. Updated 15 days ago.

Status: New
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2020-06-27
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch:
Fixed in Maintenance Release:

Description

Hi Mike,

I think we've had a couple of conversations on the mailing list around this topic and it's come up again.

I've been trying to clean up my XInclude implementation a bit. If I get an XInclude element that points to a document with a fragment identifier, I end up reaching into the target document and pulling a few nodes out. These nodes get inserted into the source document where the xi:include element had been. (Not telling you anything you don't already know!)

Where they're inserted, I need to do xml:base fixup, adding explicit xml:base attributes so that the base URIs are preserved even as they're inserted into a new document.
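
To make the fixup step concrete, here is a minimal sketch that uses the JDK's DOM API rather than Saxon's Receiver, so it does not reproduce the problem reported here. The class and method names (XmlBaseFixupSketch, effectiveBase, includeWithFixup, demo) and the target document URI http://bar.com/target.xml are assumptions made for illustration. It only handles a single included element and does not evaluate the xpointer.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlBaseFixupSketch {

    // Parse an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Base URI a node has in its source document: start from the document URI
    // and resolve each xml:base attribute found on the way down to the node.
    static String effectiveBase(Node node, String documentUri) {
        if (!(node instanceof Element)) {
            return documentUri; // document node (or no parent): use the document URI
        }
        String parentBase = effectiveBase(node.getParentNode(), documentUri);
        String own = ((Element) node).getAttributeNS(XMLConstants.XML_NS_URI, "base");
        return own.isEmpty() ? parentBase : URI.create(parentBase).resolve(own).toString();
    }

    // Replace the xi:include element with a copy of the target element and
    // record the target's original base URI in an explicit xml:base attribute.
    static Element includeWithFixup(Element include, Element target, String targetDocUri) {
        Document host = include.getOwnerDocument();
        Element copy = (Element) host.importNode(target, true);
        copy.setAttributeNS(XMLConstants.XML_NS_URI, "xml:base",
                effectiveBase(target, targetDocUri));
        include.getParentNode().replaceChild(copy, include);
        return copy;
    }

    // Hypothetical demo: include p1 and p2 from a target document into a host document.
    static String demo() throws Exception {
        Document host = parse("<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
                + "<xi:include href='http://bar.com/target.xml' xpointer='p1'/>"
                + "<xi:include href='http://bar.com/target.xml' xpointer='p2'/>"
                + "</doc>");
        Document target = parse("<root><p1>text</p1>"
                + "<p2 xml:base='http://base.com/'><p4/></p2></root>");
        Element firstInclude = (Element) host.getDocumentElement().getFirstChild();
        Element secondInclude = (Element) host.getDocumentElement().getLastChild();
        Element p1 = (Element) target.getDocumentElement().getFirstChild();
        Element p2 = (Element) target.getDocumentElement().getLastChild();
        Element in1 = includeWithFixup(firstInclude, p1, "http://bar.com/target.xml");
        Element in2 = includeWithFixup(secondInclude, p2, "http://bar.com/target.xml");
        return in1.getAttributeNS(XMLConstants.XML_NS_URI, "base") + " "
                + in2.getAttributeNS(XMLConstants.XML_NS_URI, "base");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The point of the sketch is that each inserted element carries its own base URI, which is why a single base URI on the destination is not enough.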

Trouble is, the base URI keeps falling on the floor. Maybe I'm just abusing the receiver in some unsupported manner. I extracted it all out into a single test case that I think reproduces the core issue.

The base URIs are present in the original document, but not on the inserted nodes. At one point, I thought maybe the fact that I was appending nodes was causing the base URI to get lost. I added an attempt to set it on the wrapper element, and that doesn't get a base URI either.

A base URI set on the destination propagates through, but that's not sufficient because the base URI may need to vary between elements.

Am I doing something wrong? Is there any hope for making this work?

Main.java (3.86 KB) Norman Tovey-Walsh, 2020-06-27 16:52
Main.java (3.7 KB) Norman Tovey-Walsh, 2020-06-28 19:06
Main.java (4.22 KB) Norman Tovey-Walsh, 2020-06-28 19:54

History

#1 Updated by Michael Kay 15 days ago

This whole area is certainly delicate and fragile.

I think that in general, Receiver.setSystemId() must be called before startDocument(), and it applies to the document as a whole. To set different base URIs on different elements, the mechanism is to supply this in the location property on the startElement call. But it seems that setSystemId() is needed at the document level as well. The following seems to work after a fashion:

        // destination, pipe, doc and iter come from the attached Main.java
        Receiver receiver = destination.getReceiver(pipe, new SerializationProperties());
        try {
            // Document-level base URI: must be set before startDocument()
            receiver.setSystemId("http://example.com/");
            receiver.open();
            receiver.startDocument(0);
            FingerprintedQName wrapper = new FingerprintedQName("", "", "wrapper");
            // Element-level base URI is supplied through the location argument
            Location loc = new Loc("http://example.com/", -1, -1);
            receiver.startElement(wrapper, doc.getUnderlyingNode().getSchemaType(), EmptyAttributeMap.getInstance(),
                                  doc.getUnderlyingNode().getAllNamespaces(), loc, 0);
            while (iter.hasNext()) {
                XdmNode node = iter.next();
                if (node.getNodeKind() == XdmNodeKind.ELEMENT) {
                    // Give each appended element its own base URI
                    loc = new Loc("http://bar.com/", -1, -1);
                    receiver.append(node.getUnderlyingNode(), loc, 0);
                } else {
                    receiver.append(node.getUnderlyingNode());
                }
            }
            receiver.endElement();
            receiver.endDocument();
            receiver.close();
        } catch (XPathException e) {
            throw new RuntimeException(e);
        }

#2 Updated by Norman Tovey-Walsh 15 days ago

Almost. Well enough, I think, to let me finish the XInclude implementation, but the "loc" supplied to receiver.startElement() doesn't seem to be having any effect. In the example you posted above, the wrapper element is getting the base URI from receiver.setSystemId(), I think. New sample program attached. On my system, it prints:

==============
document:
E:doc: http://foo.com/
E:p1: http://foo.com/
T:text: http://foo.com/
E:p2: http://base.com/
E:p4: http://base.com/
E:p3: http://foo.com/
E:p4: http://foo.com/
==============
newdoc:
E:wrapper2: 
E:p1: http://bar.com/
T:text: 
E:p2: http://base.com/
E:p4: http://base.com/
E:p3: http://bar.com/
E:p4: http://bar.com/

#3 Updated by Norman Tovey-Walsh 15 days ago

It also seems to be the case that setting the location in receiver.append() only works if you also call receiver.setSystemId() before you open the receiver. In this version of Main.java, I get:

==============
document:
D: http://foo.com/
E: doc: http://foo.com/
E: p1: http://foo.com/
T: text: http://foo.com/
E: p2: http://base.com/
E: p4: http://base.com/
E: p3: http://foo.com/
E: p4: http://foo.com/
==============
newdoc:
D: 
E: p1: 
T: text: 
E: p2: http://base.com/
E: p4: http://base.com/
E: p3: 
E: p4: 

But if I uncomment the call to set the system ID on line 47, I get:

==============
newdoc:
D: http://receiver.com/
E: p1: http://bar.com/
T: text: http://receiver.com/
E: p2: http://base.com/
E: p4: http://base.com/
E: p3: http://bar.com/
E: p4: http://bar.com/

#4 Updated by Norman Tovey-Walsh 15 days ago

Note also that setting the base URI didn't work on the text node. I'm not sure that matters for text nodes, but it would matter if the same were true of processing instructions.
