Maintenance: Planio will be observing a scheduled maintenance window this Tuesday, November 5, 2024 from 03:00 UTC until 06:30 UTC to perform urgent network maintenance in our primary data center. Your Planio account will be unavailable during this maintenance window.
Bug #6406
closedA collection cannot contain the same document more than once
100%
Description
The (infernal) constraint that document-uri()
s must be unique shouldn't apply when it's the same document. See src/test/java/s9apitest/TestCollections.java#testCollectionDupDoc
.
The problem appears to arise from the fact that CollectionFn
creates a new TreeInfo
for each member of the collection and DocumentPool
objects to two different TreeInfo
objects with the same URI.
Given that QT4 is planning to remove the document-uri()
restriction, it's tempting to add a configuration property that instructs Saxon to ignore this test, but it's unclear what other consequences that might have. And that's not exactly the same problem as the same node appearing in the collection more than once.
Another obvious solution is to maintian a mapping of items to TreeInfo
items when the collection is being constructed:
Map<Item,TreeInfo> infoMapping = new HashMap<>();
for (Item item; (item = iter.next()) != null; ) {
if (item instanceof NodeInfo && ((NodeInfo) item).getNodeKind() == Type.DOCUMENT) {
String docUri = ((NodeInfo) item).getSystemId();
DocumentKey docKey;
if (packageData instanceof StylesheetPackage) {
StylesheetPackage pack = (StylesheetPackage) packageData;
docKey = new DocumentKey(docUri, pack.getPackageName(), pack.getPackageVersion());
} else {
docKey = new DocumentKey(docUri);
}
final TreeInfo info;
if (item instanceof TreeInfo) {
info = (TreeInfo) item;
} else if (infoMapping.containsKey(item)) {
info = infoMapping.get(item);
} else {
info = new GenericTreeInfo(controller.getConfiguration(), (NodeInfo) item);
infoMapping.put(item, info);
}
//TreeInfo info = item instanceof TreeInfo ? (TreeInfo) item : new GenericTreeInfo(controller.getConfiguration(), (NodeInfo) item);
docPool.add(info, docKey);
}
}
But is that the best choice?
Updated by Norm Tovey-Walsh 6 months ago
Review of the specification implies very strongly ("says", but the prose is a bit thick), that duplicates are not allowed in a collection. I think it's fair to say that if a user writes a CollectionFinder
, it's their responsibility to assure that duplicates are not added to the collection.
Updated by Michael Kay 6 months ago
I just found the sentence "There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates." (Present in both 3.1 and 4.0)
Updated by Norm Tovey-Walsh 6 months ago
Interesting. I was drawn away from the description of the function itself into the XPath specification searching for the definition of "available collections" where we find:
For every document node D that is in the target of a mapping in available collections, or that is the root of a tree containing such a node, the document-uri property of D must either be absent, or must be a URI U such that available documents contains a mapping from U to D.
As I remarked, that's not especially clear prose. My first interpretation of it was that there had to be a 1:1 mapping from U
to D
which would seem to rule out duplicates. But perhaps that's too narrow a reading. Perhaps it says no more than if a URI U
is in a collection then there must be a corresponding document D
in the collection.
Given that fn:collection
explicitly allows the possibility of duplicates, I'm back to thinking that a solution like the one I outline above in CollectionFn.java
is necessary.
Updated by Michael Kay 6 months ago
I think this is almost right, but when we process a node in a collection, we should not only consider whether it has already been processed as part of the collection, but also whether it has already been processed for a different reason (for example, as part of a different collection).
So I think the line
} else if (infoMapping.containsKey(item)) {
info = infoMapping.get(item);
should probably be testing whether docKey
is already in the docPool
, and if it is, get the TreeInfo
from there. This of course makes the infoMapping table superfluous.
Updated by Norm Tovey-Walsh 6 months ago
- Status changed from New to Resolved
- Applies to branch 12, trunk added
Indeed. Fixed by actually looking in the document pool.
Updated by O'Neil Delpratt 4 months ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 12.5 added
Bug fix applied in the Saxon 12.5 Maintenance release.
Updated by Debbie Lockett 4 months ago
- Assignee set to Norm Tovey-Walsh
- Fix Committed on Branch 12, trunk added
Please register to edit this issue