Bug #5615

open

Saxon 11 - Cannot have two different documents with the same document-uri

Added by Radu Coravu 4 months ago. Updated 4 months ago.

Status: New
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2022-07-26
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

I'm getting this unhandled error, which probably comes from reusing a transformer; the code we use in those parts is quite complex:

7065  ERROR [ main ] ro.sync.quickfix.QuickFixExecutor - Cannot generate late quick fixes: net.sf.saxon.trans.XPathException: Cannot have two different documents with the same document-uri file:/D:/projects/../SchematronQF/add/element/selectAttrValue/topic.dita
net.sf.saxon.trans.XPathException: Cannot have two different documents with the same document-uri file:/D:/projects/../SchematronQF/add/element/selectAttrValue/topic.dita
	at net.sf.saxon.om.DocumentPool.add(DocumentPool.java:69)
	at net.sf.saxon.Controller.registerDocument(Controller.java:1004)
	at net.sf.saxon.Controller.makeSourceTree(Controller.java:1359)
	at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:343)
	at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:75)

Would it be a good idea for Controller.registerDocument to check whether the document is already in the pool before adding it?

          if (getDocumentPool().find(uri) == null) {
            sourceDocumentPool.add(doc, uri);
          }

Files

asserts.zip (1.16 KB) Radu Coravu, 2022-08-16 11:01
Actions #1

Updated by Michael Kay 4 months ago

Controller.registerDocument() is called from two places: from DocumentFn (implementing doc() and document()) and from Controller.makeSourceTree() which handles the "primary input" to a transformation (to the extent that's still a meaningful concept). The main reason for putting the primary input in the document pool is so that it isn't parsed again if the transformation then uses doc() to access it. Yes, I guess it wouldn't do any harm if we find the document URI is already in use to just skip that: it just means anyone who tries to use document-uri() on the document is going to wonder why it hasn't got one.

Actions #2

Updated by Radu Coravu 4 months ago

Ok, I do not understand this part of what you are saying:

it just means anyone who tries to use document-uri() on the document is going to wonder why it hasn't got one.

So is there a side effect to checking whether the document is already in the pool when Controller.registerDocument is called?

Actions #3

Updated by Michael Kay 4 months ago

document-uri() gives a result only for a document that's in the pool. That's because returning a document-uri() UUU for a document provides a guarantee that doc('UUU') will return that document; and that's why two documents aren't allowed to have the same document URI.
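To make the invariant concrete, here is a minimal sketch of a URI-keyed pool. It is a simplified, hypothetical model (the class `SimpleDocumentPool` and its method `addIfAbsent` are invented for illustration) and is not Saxon's actual `DocumentPool`. It only shows why a second, different document under the same URI has to be either rejected or left outside the pool.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a URI-keyed document pool; not Saxon's DocumentPool. */
public class SimpleDocumentPool {
    private final Map<String, Object> byUri = new HashMap<>();

    /** Returns the document registered under the URI, or null if there is none. */
    public Object find(String uri) {
        return byUri.get(uri);
    }

    /** Registers a document; rejects a different document under a URI already in use. */
    public void add(Object doc, String uri) {
        Object existing = byUri.get(uri);
        if (existing != null && existing != doc) {
            throw new IllegalStateException(
                "Cannot have two different documents with the same document-uri " + uri);
        }
        byUri.put(uri, doc);
    }

    /** The guard proposed in this issue: register only if the URI is free. */
    public boolean addIfAbsent(Object doc, String uri) {
        if (find(uri) != null) {
            // The new document stays outside the pool, so in this model
            // it would have no document-uri.
            return false;
        }
        add(doc, uri);
        return true;
    }
}
```

In this model, a lookup by URI always returns the first document registered under it, which mirrors the guarantee that doc('UUU') returns the document whose document-uri() is UUU.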

Actions #4

Updated by Radu Coravu 4 months ago

So if the XSLT uses document-uri() on a node from the current XML source, it would not return anything? That might be a problem. How about automatically discarding a document that is already in the pool when Controller.registerDocument is called?


          TreeInfo cachedDoc = getDocumentPool().find(uri);
          if(cachedDoc != null) {
            getDocumentPool().discard(cachedDoc);
          }
          sourceDocumentPool.add(doc, uri);
        
Actions #5

Updated by Michael Kay 4 months ago

We can't do anything that would make the data mutable - the result of document-uri() applied to a node can't change over time.

The problems with this started with fn:transform(), which muddies the scope rules for things like the immutability of the result of doc().

Actions #6

Updated by Radu Coravu 4 months ago

I'm not seeing the entire picture here, so you can choose to do what's best. That might even mean not fixing this issue, if it looks like something few people would run into or that we could work around by discarding all documents in the pool when reusing the transformer. Alternatively, in the method "net.sf.saxon.Controller.makeSourceTree(Source, int)", if there is already a DocumentKey for that source in the pool, how about no longer creating a new document from it and just using the document in the pool?

Actions #7

Updated by Radu Coravu 4 months ago

Looking more at what we are doing on our side to cause this problem: we are reusing a Transformer and giving it each time a javax.xml.transform.Source which has the same system ID but a reader with different contents. So on our side we can clear the pool after every transformation. In general, in the method "net.sf.saxon.Controller.makeSourceTree(Source, int)", creating the key as "new DocumentKey(source.getSystemId())" does not capture the fact that the source contents (reader, input stream or parser) may differ from what's currently in the pool, so in my opinion it would make sense to remove the key from the pool, if it exists, before adding it.

Actions #8

Updated by Michael Kay 4 months ago

Is there any good reason to reuse the Transformer rather than creating a new one? I usually recommend creating a new Transformer for every transformation.

Actions #9

Updated by Radu Coravu 4 months ago

I think we usually reuse the transformer because it has a pre-compiled stylesheet. Other projects like the DITA Open Toolkit may reuse the transformer because it has a document pool: there are lots of DITA topics, each with links to various targets which may be the same file. Once the DITA OT loads a target file using document(), it's useful that, for another processed topic which refers to the same target, the target file is not loaded again.
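For the pre-compiled stylesheet case only, standard JAXP allows the stylesheet to be compiled once into a `Templates` object, with a fresh `Transformer` created from it for each run. The sketch below uses only `javax.xml.transform` classes; the class name `PerRunTransformer`, the inline stylesheet and the system ID are made up for illustration, and it does not address the DITA OT wish to share already loaded documents between runs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerRunTransformer {
    // Illustrative stylesheet: outputs the string value of the root element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='/*'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the resulting Templates can be reused. */
    static Templates compile() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    /** Runs one transformation with a new Transformer, so no per-run state carries over. */
    static String run(Templates templates, String xml, String systemId)
            throws TransformerException {
        Transformer transformer = templates.newTransformer();
        // Same system ID on every call, but a reader with different contents.
        StreamSource source = new StreamSource(new StringReader(xml), systemId);
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

Calling `run` twice with the same system ID and different XML content goes through two separate `Transformer` instances, so the second run does not start from whatever the first run registered.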

Actions #10

Updated by Radu Coravu 4 months ago

One more problem related to this: if we enable the feature net.sf.saxon.lib.FeatureKeys.ASSERTIONS_CAN_SEE_COMMENTS and publish a simple XML document with an XSLT stylesheet, I will always get this error reported by Saxon:

net.sf.saxon.trans.XPathException: Cannot have two different documents with the same document-uri file:/D:/projects/eXml_Saxon11.3/test/EXM-24141/assert.xml
	at net.sf.saxon.om.DocumentPool.add(DocumentPool.java:71)
	at net.sf.saxon.Controller.registerDocument(Controller.java:972)
	at net.sf.saxon.Controller.makeSourceTree(Controller.java:1327)
	at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:345)
	at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:75)

because the XML document is first parsed and added to the pool in the assertion-related packages:

Add file:/D:/projects/.../assert.xml
java.lang.Exception
	at net.sf.saxon.om.DocumentPool.add(DocumentPool.java:68)
	at net.sf.saxon.sxpath.XPathDynamicContext.setContextItem(XPathDynamicContext.java:79)
	at net.sf.saxon.sxpath.XPathExpression.createDynamicContext(XPathExpression.java:145)
	at com.saxonica.ee.schema.Assertion.testComplex(Assertion.java:252)
	at com.saxonica.ee.validate.ValidationStack.testAssertion(ValidationStack.java:595)
	at com.saxonica.ee.validate.ValidationStack.testAssertions(ValidationStack.java:589)
	at com.saxonica.ee.validate.ValidationStack.endElement(ValidationStack.java:532)
	at net.sf.saxon.event.ProxyReceiver.endElement(ProxyReceiver.java:149)
	at net.sf.saxon.event.ProxyReceiver.endElement(ProxyReceiver.java:149)
	at com.saxonica.ee.validate.AttributeInheritor.endElement(AttributeInheritor.java:63)
	at net.sf.saxon.event.PathMaintainer.endElement(PathMaintainer.java:59)
	at net.sf.saxon.event.DocumentValidator.endElement(DocumentValidator.java:79)
	at net.sf.saxon.event.ReceivingContentHandler.endElement(ReceivingContentHandler.java:609)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
