Bug #3530

closed

Base URI of an element introduced using xi:include

Added by Michael Kay over 6 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: XPath conformance
Sprint/Milestone: -
Start date: 2017-11-15
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.8, trunk
Fix Committed on Branch: trunk
Fixed in Maintenance Release:
Platforms:

Description

Reported by Patrik Stellmann today on the Saxon help list (SourceForge):

I’ve created the following test scenario and run it with Saxon EE 9.6.0.7 (from within oXygen 18.1) on Windows:

root.xml:

<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="subfolder/child.xml"/>
</root>

subfolder/child.xml:

<child/>

xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" expand-text="yes">
    <xsl:template match="*">
        <xsl:message>{name(.)}: {base-uri(.)}</xsl:message>
        <xsl:next-match/>
    </xsl:template>
</xsl:stylesheet>

output:

root: file:/[…]/base-uri-test/root.xml
child: file:/[…]/base-uri-test/subfolder/subfolder/child.xml

So “subfolder” is duplicated in the URI. I get the same result when calling the Java method NodeInfo.getBaseURI().

Actions #1

Updated by Michael Kay over 6 years ago

It seems that the XML parser (or, to be more precise, the XInclude processor) is reporting the location/systemId of the child element as file:/x/test/subfolder/child.xml, and is also giving it an xml:base attribute of subfolder/child.xml.

The XML Base spec says:

The base URI for a URI reference appearing in an xml:base attribute is the base URI of the parent element of the element bearing the xml:base attribute, if one exists within the document entity or external entity, otherwise the base URI of the document entity or external entity containing the element.

But I think Saxon is resolving against "what the base URI of the element bearing the xml:base attribute would be in the absence of the xml:base attribute".
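To make the difference concrete, here is a minimal sketch using java.net.URI. It is not Saxon code: the class name is hypothetical, and the parent base URI file:/x/test/root.xml is an assumption chosen to match the systemId quoted above.

```java
import java.net.URI;

// Illustration only: hypothetical class, not part of Saxon.
class XIncludeBaseUriDemo {

    // Resolve an xml:base value against a chosen base URI.
    static String resolve(String base, String xmlBase) {
        return URI.create(base).resolve(xmlBase).toString();
    }

    public static void main(String[] args) {
        String parentBase = "file:/x/test/root.xml";               // assumed base URI of the parent element
        String childSystemId = "file:/x/test/subfolder/child.xml"; // systemId reported for the included element
        String xmlBase = "subfolder/child.xml";                    // xml:base added by the XInclude processor

        // Resolving against the parent's base URI, as the XML Base rule describes.
        System.out.println(resolve(parentBase, xmlBase));
        // Resolving against the element's own systemId, which repeats "subfolder".
        System.out.println(resolve(childSystemId, xmlBase));
    }
}
```

The second call reproduces the duplicated path from the report.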

Actions #2

Updated by Michael Kay over 6 years ago

  • Description updated (diff)
Actions #3

Updated by Michael Kay over 6 years ago

  • Status changed from New to In Progress

I can make this work correctly by changing Navigator.getBaseURI(), replacing the line

URI base = new URI(startSystemId.equals(parentSystemId) ? parent.getBaseURI() : startSystemId);

by

URI base = new URI(parent.getBaseURI());

However, it seems highly likely this will cause something else to break...

Actions #4

Updated by Michael Kay over 6 years ago

The code that is replaced by this change is designed to implement the rule: "the base URI of the parent element of the element bearing the xml:base attribute, if one exists within the document entity or external entity".

If an external entity reference in a document is expanded to produce a new element, the only way we know that the element had no parent within the external entity is that the systemId of the element is different from the systemId of the parent element. In this case we are supposed to resolve against the systemId of the external entity.

The problem is that when the XInclude processor puts an xml:base attribute on the generated element, it expects it to be resolved against the base URI of the new parent element, not against the systemId of the included entity. And Saxon can't easily distinguish the two cases.
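The conflict between the two cases can be sketched as follows. This is an illustration under stated assumptions, not the actual Navigator code: the class and method names are hypothetical, and the entity file file:/x/test/entities/part.xml with xml:base="images/" is an invented example.

```java
import java.net.URI;

// Illustration only: hypothetical names, simplified from the rule discussed above.
class XmlBaseRules {

    // Rule in the replaced line: if the element's systemId differs from its parent's,
    // resolve xml:base against the element's own systemId.
    static String systemIdRule(String startSystemId, String parentSystemId,
                               String parentBaseUri, String xmlBase) {
        String base = startSystemId.equals(parentSystemId) ? parentBaseUri : startSystemId;
        return URI.create(base).resolve(xmlBase).toString();
    }

    // Rule in the replacement line: always resolve against the parent's base URI.
    static String parentRule(String parentBaseUri, String xmlBase) {
        return URI.create(parentBaseUri).resolve(xmlBase).toString();
    }

    public static void main(String[] args) {
        String root = "file:/x/test/root.xml"; // assumed location of the including document

        // XInclude case: xml:base was written relative to the new parent.
        String included = "file:/x/test/subfolder/child.xml";
        System.out.println(systemIdRule(included, root, root, "subfolder/child.xml")); // duplicates "subfolder"
        System.out.println(parentRule(root, "subfolder/child.xml"));                   // what XInclude intends

        // External entity case (invented example): xml:base was written relative to the entity file.
        String entity = "file:/x/test/entities/part.xml";
        System.out.println(systemIdRule(entity, root, root, "images/")); // relative to the entity, as XML Base requires
        System.out.println(parentRule(root, "images/"));                 // loses the "entities" directory
    }
}
```

Each rule gets one case right and the other wrong, which is why comparing systemIds alone cannot settle it.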

I'm going to raise a question on xml-dev about this to see if any of the XML gurus have a view on it.

Here's the message I have posted:

Patrik Stellmann raised a problem on the saxon-help list for which I would appreciate advice.

When an external entity is expanded, and the entity in question contains an element with an xml:base attribute, the value of the @xml:base attribute is supposed to be resolved against the base URI of the external entity itself (not against the base URI of the element into which the entity's expansion is grafted).

But when xi:include is processed, the xi:include processor injects an @xml:base attribute which is intended to be resolved against the base URI of the "include parent" (that is, the parent of the xi:include element).

Saxon, as receiver of events notified by the XML parser, is interpreting the two situations in the same way. If the systemId of an element is different from the systemId of its parent, it assumes that the child element was produced by entity expansion, and that the @xml:base attribute should therefore be resolved relative to the systemId of the child element. But this ignores the possibility that the child element was actually produced by XInclude expansion: this will also cause the child element and parent element to have different systemIds (as notified by a SAX parser), but this time the @xml:base attribute should be resolved against the base URI of the (new) parent element.

Can anyone suggest a way in which Saxon, as receiver of SAX events, can distinguish the two cases and interpret them both correctly?

Actions #5

Updated by Michael Kay over 6 years ago

On the development branch I have implemented a solution (currently for the TinyTree only) that successfully distinguishes the external-entity and XInclude cases, by taking account of the startEntity() and endEntity() calls from the SAX Parser to the LexicalHandler.

This is somewhat disruptive as doing it properly involves extensions to the NodeInfo interface, so my current plan is not to do it on the 9.8 branch.
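As a rough sketch of the general idea (not the TinyTree implementation, whose details are not shown in this thread), a SAX handler can count startEntity()/endEntity() calls and note whether each element starts inside an entity expansion. The class name and the "entity"/"document" labels are hypothetical, and the example uses an internal entity only so that it is self-contained; the assumption is that an external entity would trigger the same callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustration only: hypothetical handler, not Saxon's implementation.
class EntityTrackingHandler extends DefaultHandler2 {
    private int entityDepth = 0;                  // number of entities currently being expanded
    final List<String> seen = new ArrayList<>();  // one "name:origin" entry per element

    @Override
    public void startEntity(String name) { entityDepth++; }

    @Override
    public void endEntity(String name) { entityDepth--; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // An element that starts while an entity is open came from entity expansion.
        seen.add(qName + ":" + (entityDepth > 0 ? "entity" : "document"));
    }

    // Parse the given XML text and report where each element originated.
    static List<String> classify(String xml) {
        try {
            EntityTrackingHandler handler = new EntityTrackingHandler();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE root [<!ENTITY e \"<child/>\">]><root>&e;</root>";
        System.out.println(classify(xml));
    }
}
```

A real implementation would also need to know which element is the outermost one within the entity, and to carry that information on the node, which is presumably where the NodeInfo extensions come in.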

Actions #6

Updated by Michael Kay about 6 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.8, trunk added
  • Fix Committed on Branch trunk added

Closing this with no further action. The fix for the development branch was complex and I think it is too risky to include on the 9.8 branch, which is now being managed for stability.

The relevant test case, base-uri-052, has been added to the 9.8 exceptions list as a "known failure".

Actions #7

Updated by O'Neil Delpratt over 5 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.0.1 added

Bug fix applied in the Saxon 9.9.0.1 major release.

Actions #8

Updated by Nico Kutscherauer over 4 years ago

Hi,

I now have a similar issue with nested XIncludes using Saxon-HE-9.9.1-4.jar (from the command line).

root.xml:

<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="subfolder/child.xml"/>
</root>

subfolder/child.xml:

<child xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="otherchild.xml"/>
</child>

subfolder/otherchild.xml:

<otherchild/>

xsl: same as Patrik's.

Output:

root: file:/[...]/root.xml
child: file:/[...]/subfolder/child.xml
otherchild: file:/[...]/subfolder/subfolder/otherchild.xml

I also tested with Saxon-HE-9.8.0-12.jar on the command line, with the same result.

Calling HE-9.8.0-12 from inside Oxygen 20.1, I get:

root: file:/[...]/root.xml
child: file:/[...]/subfolder/subfolder/child.xml
otherchild: file:/[...]/subfolder/otherchild.xml

Strange...

Thanks & Best Regards,

Nico

Actions #9

Updated by Michael Kay over 4 years ago

  • Description updated (diff)

I've moved the new issue to a new thread: see bug #4281. Please follow progress there.
