Project

Profile

Help

Bug #6406

closed

A collection cannot contain the same document more than once

Added by Norm Tovey-Walsh 8 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Low
Category:
-
Sprint/Milestone:
-
Start date:
2024-04-29
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
12, trunk
Fix Committed on Branch:
12, trunk
Fixed in Maintenance Release:
Platforms:

Description

The (infernal) constraint that document-uri()s must be unique shouldn't apply when it's the same document. See src/test/java/s9apitest/TestCollections.java#testCollectionDupDoc.

The problem appears to arise from the fact that CollectionFn creates a new TreeInfo for each member of the collection and DocumentPool objects to two different TreeInfo objects with the same URI.

Given that QT4 is planning to remove the document-uri() restriction, it's tempting to add a configuration property that instructs Saxon to ignore this test, but it's unclear what other consequences that might have. And that's not exactly the same problem as the same node appearing in the collection more than once.

Another obvious solution is to maintian a mapping of items to TreeInfo items when the collection is being constructed:

            Map<Item,TreeInfo> infoMapping = new HashMap<>();
            for (Item item; (item = iter.next()) != null; ) {
                if (item instanceof NodeInfo && ((NodeInfo) item).getNodeKind() == Type.DOCUMENT) {
                    String docUri = ((NodeInfo) item).getSystemId();
                    DocumentKey docKey;
                    if (packageData instanceof StylesheetPackage) {
                        StylesheetPackage pack = (StylesheetPackage) packageData;
                        docKey = new DocumentKey(docUri, pack.getPackageName(), pack.getPackageVersion());
                    } else {
                        docKey = new DocumentKey(docUri);
                    }
                    final TreeInfo info;
                    if (item instanceof TreeInfo) {
                        info = (TreeInfo) item;
                    } else if (infoMapping.containsKey(item)) {
                        info = infoMapping.get(item);
                    } else {
                        info = new GenericTreeInfo(controller.getConfiguration(), (NodeInfo) item);
                        infoMapping.put(item, info);
                    }
                    //TreeInfo info = item instanceof TreeInfo ? (TreeInfo) item : new GenericTreeInfo(controller.getConfiguration(), (NodeInfo) item);
                    docPool.add(info, docKey);
                }
            }

But is that the best choice?

Actions #1

Updated by Norm Tovey-Walsh 8 months ago

Review of the specification implies very strongly ("says", but the prose is a bit thick), that duplicates are not allowed in a collection. I think it's fair to say that if a user writes a CollectionFinder, it's their responsibility to assure that duplicates are not added to the collection.

Actions #2

Updated by Michael Kay 8 months ago

I just found the sentence "There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates." (Present in both 3.1 and 4.0)

Actions #3

Updated by Norm Tovey-Walsh 8 months ago

Interesting. I was drawn away from the description of the function itself into the XPath specification searching for the definition of "available collections" where we find:

For every document node D that is in the target of a mapping in available collections, or that is the root of a tree containing such a node, the document-uri property of D must either be absent, or must be a URI U such that available documents contains a mapping from U to D.

As I remarked, that's not especially clear prose. My first interpretation of it was that there had to be a 1:1 mapping from U to D which would seem to rule out duplicates. But perhaps that's too narrow a reading. Perhaps it says no more than if a URI U is in a collection then there must be a corresponding document D in the collection.

Given that fn:collection explicitly allows the possibility of duplicates, I'm back to thinking that a solution like the one I outline above in CollectionFn.java is necessary.

Actions #4

Updated by Michael Kay 8 months ago

I think this is almost right, but when we process a node in a collection, we should not only consider whether it has already been processed as part of the collection, but also whether it has already been processed for a different reason (for example, as part of a different collection).

So I think the line

               } else if (infoMapping.containsKey(item)) {
                        info = infoMapping.get(item);

should probably be testing whether docKey is already in the docPool, and if it is, get the TreeInfo from there. This of course makes the infoMapping table superfluous.

Actions #5

Updated by Norm Tovey-Walsh 8 months ago

  • Status changed from New to Resolved
  • Applies to branch 12, trunk added

Indeed. Fixed by actually looking in the document pool.

Actions #6

Updated by O'Neil Delpratt 6 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.5 added

Bug fix applied in the Saxon 12.5 Maintenance release.

Actions #7

Updated by Debbie Lockett 6 months ago

  • Assignee set to Norm Tovey-Walsh
  • Fix Committed on Branch 12, trunk added

Please register to edit this issue

Also available in: Atom PDF