Bug #4837

closed

Two documents can have the same URI

Added by Michael Kay almost 4 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: JAXP Java API
Sprint/Milestone: -
Start date: 2020-11-25
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10, trunk
Fix Committed on Branch: trunk
Fixed in Maintenance Release:
Platforms:

Description

When a document comes into Saxon from outside, typically as a Source object, we treat its SystemId property as representing its base URI. This enables relative references within the document to be correctly resolved, for example by the doc() and document() functions. But documents read using doc() and document() are required to have the property that two requests using the same absolute URI resolve to the same document. This can't work if the external input to a transformation supplies two different documents with the same SystemId property.
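As a minimal sketch of how this situation can arise through plain JAXP (the class name and the example.com URI below are hypothetical, chosen only for illustration), nothing in the Source interface stops a caller from supplying two different documents under one SystemId:

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

class SameSystemIdSketch {
    // Hypothetical URI, used only for illustration.
    static final String SHARED_URI = "http://example.com/input.xml";

    // First document: root element <a/>.
    static Source first() {
        return new StreamSource(new StringReader("<a/>"), SHARED_URI);
    }

    // Second document: different content, same SystemId.
    static Source second() {
        return new StreamSource(new StringReader("<b/>"), SHARED_URI);
    }

    public static void main(String[] args) {
        Source s1 = first();
        Source s2 = second();
        // JAXP accepts both; the SystemId alone cannot tell them apart.
        System.out.println(s1.getSystemId().equals(s2.getSystemId()));
    }
}
```

Any processor that keys documents by SystemId would have to choose between these two inputs, or reject one of them.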

In a bug raised by Ihe Onwuka on the saxon-help list, this is happening on a call to fn:transform(), and it is happening as a result of rules in the XSLT specification. The XSLT spec says that xsl:copy (and various other instructions) create a document whose base URI is the same as that of the stylesheet. So we immediately have two documents with the same base URI, and either of them (using fn:transform() or otherwise) can easily be used as input to another transformation. In this example the stylesheet is using doc("") (a "same document reference") to access the document identified by its base URI, and if two documents have the same base URI there is no way this can work. (It runs directly counter to the rule that two calls on doc() supplying the same URI must return the same document.)

There are of course many complications here. For example, what do we do when doc(X) supplies one particular URI, and the returned document has a systemId property that is different? We addressed that question in bug #4795, and it's possible that the changes we made there exacerbated the problem we're seeing here. But both bugs have the same underlying cause: the whole architecture is making an assumption that there's a one-to-one correspondence between URIs and documents, and when the going gets rough that clearly isn't the case.

I think one immediate action we can take is to detect, when we put a document in the document pool, that the pool already contains a different document with the same URI. That will at least give earlier detection of the problem, and clearer diagnostics.
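The kind of check meant here could look roughly like the following. This is an illustrative sketch only, not Saxon's actual document pool: the class CheckedDocumentPool and its methods are hypothetical, and the choice of IllegalStateException is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical pool keyed by absolute URI; D stands in for the document type.
class CheckedDocumentPool<D> {
    private final Map<String, D> byUri = new HashMap<>();

    // Registers doc under uri. Re-adding the same object is harmless;
    // adding a different object under an existing URI fails at once.
    void add(String uri, D doc) {
        Objects.requireNonNull(uri, "uri");
        Objects.requireNonNull(doc, "doc");
        D existing = byUri.putIfAbsent(uri, doc);
        if (existing != null && existing != doc) {
            throw new IllegalStateException(
                "A different document is already registered under URI " + uri);
        }
    }

    // Returns the document registered under uri, or null if there is none.
    D find(String uri) {
        return byUri.get(uri);
    }
}
```

Comparing by object identity rather than equals() is deliberate in this sketch: the concern is two distinct document instances claiming the same URI, whatever their content.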

Perhaps we should also be clearer about the distinction between document URI and base URI. (The 3.1 specs have both properties, in recognition of the fact that here be dragons.) When we create a document using xsl:copy or similar instructions, perhaps we should carefully mark it to indicate that the base URI we allocate can be used for resolving relative URIs appearing in the document, but must not be used as a unique identifier for the document (that is, as a document URI, or as a same-document reference). That's not easy because we use the JAXP interfaces such as Source and URIResolver so heavily, and these interfaces make no such distinctions.
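One way to picture such a marking is to carry the two properties separately, so that a constructed document has a base URI but no document URI. The sketch below is purely hypothetical (the class DocumentIdentitySketch and its factory methods exist nowhere in Saxon or JAXP) and assumes that a constructed document may simply have no document URI:

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical holder that keeps base URI and document URI apart.
final class DocumentIdentitySketch {
    private final URI baseUri;      // used only to resolve relative references
    private final URI documentUri;  // null when the document has no unique identifier

    private DocumentIdentitySketch(URI baseUri, URI documentUri) {
        this.baseUri = baseUri;
        this.documentUri = documentUri;
    }

    // A document retrieved by URI: both properties coincide.
    static DocumentIdentitySketch retrieved(URI uri) {
        return new DocumentIdentitySketch(uri, uri);
    }

    // A document built by xsl:copy or similar: base URI only.
    static DocumentIdentitySketch constructed(URI baseUri) {
        return new DocumentIdentitySketch(baseUri, null);
    }

    // Relative references still resolve against the base URI.
    URI resolve(String relative) {
        return baseUri.resolve(relative);
    }

    // Empty for constructed documents, so they cannot be looked up by URI.
    Optional<URI> documentUri() {
        return Optional.ofNullable(documentUri);
    }
}
```

With this split, only documents created through retrieved() would be candidates for a URI-keyed pool. The sketch does not address the harder problem noted above, that Source and URIResolver expose a single SystemId.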
