Bug #4672

Problem with XML catalogue behaviour

Added by Ken Holman about 2 months ago. Updated about 1 month ago.

Status: Closed
Priority: Low
Start date: 2020-08-06
% Done: 0%

Description

Hi, folks! I've distilled a bunch of my environment processing into the attached abbreviated test files.

In the transcript below I'm demonstrating how Saxon does not resolve an XML catalog entry that Jing does resolve.

There are two test files, one with a DOCTYPE and one without. I invoke my processes first for the one without a DOCTYPE to prove to myself that my invocations are sound.

Next I run Jing on the one with a DOCTYPE but without an XML catalog, and I get an error, as expected.

Next I run Jing on the one with a DOCTYPE, using an XML catalog, and I get no error, as expected.

Finally I run Saxon on the one with a DOCTYPE, using an XML catalog, and I get an error I did not expect.

The ZIP file contains all of the files for this test.

Am I missing something obvious? My intuition is that if the XML catalog satisfies Jing, then it should also satisfy Saxon. I'm even using the same resolver jar. I looked for guidance in https://www.saxonica.com/html/documentation/sourcedocs/xml-catalogs.html and found nothing to help me. Saxon won't accept the -catalog option unless the resolver jar is on the classpath.

Thanks for your help with this, and for your patience with me if it is something obvious that I'm missing.

. . . . . . Ken

~/t/testcatalogue $ sh test.sh
+ echo jing XML with no DOCTYPE '(expect' no 'error)'
jing XML with no DOCTYPE (expect no error)
+ java -cp jar/jing/bin/resolver.jar:jar/jing/bin/jing.jar com.thaiopensource.relaxng.util.Driver -C dtd/catalog.xml rng-NISOSTS/RNG-NISO-STS-interchange-1-mathml3/NISO-STS-interchange-1-mathml3.rng SD-1048-catalog/NEN_2eOntw_nl-no-doctype.xml
+ echo Returns 0
Returns 0
+ echo saxon XML with no DOCTYPE '(expect' no 'error)'
saxon XML with no DOCTYPE (expect no error)
+ java -cp jar/jing/bin/resolver.jar:jar/saxon9he/saxon9he.jar net.sf.saxon.Transform -catalog:dtd/catalog.xml -xsl:setare/Setare-STS-validation.xsl -s:SD-1048-catalog/NEN_2eOntw_nl-no-doctype.xml
+ echo Returns 0
Returns 0
+ echo jing XML with no catalog '(expect' 'error)'
jing XML with no catalog (expect error)
+ java -cp jar/jing/bin/resolver.jar:jar/jing/bin/jing.jar com.thaiopensource.relaxng.util.Driver rng-NISOSTS/RNG-NISO-STS-interchange-1-mathml3/NISO-STS-interchange-1-mathml3.rng SD-1048-catalog/NEN_2eOntw_nl.xml
fatal: file not found: /Users/admin/t/testcatalogue/SD-1048-catalog/NISO-STS-interchange-1-mathml3.dtd (No such file or directory)
+ echo Returns 1
Returns 1
+ echo jing XML with no catalog '(expect' no 'error)'
jing XML with no catalog (expect no error)
+ java -cp jar/jing/bin/resolver.jar:jar/jing/bin/jing.jar com.thaiopensource.relaxng.util.Driver -C dtd/catalog.xml rng-NISOSTS/RNG-NISO-STS-interchange-1-mathml3/NISO-STS-interchange-1-mathml3.rng SD-1048-catalog/NEN_2eOntw_nl.xml
+ echo Returns 0
Returns 0
+ echo saxon XML with no catalog '(expect' no 'error)'
saxon XML with no catalog (expect no error)
+ java -cp jar/jing/bin/resolver.jar:jar/saxon9he/saxon9he.jar net.sf.saxon.Transform -catalog:dtd/catalog.xml -xsl:setare/Setare-STS-validation.xsl -s:SD-1048-catalog/NEN_2eOntw_nl.xml
I/O error reported by XML parser processing file:/Users/admin/t/testcatalogue/SD-1048-catalog/NEN_2eOntw_nl.xml: /Users/admin/t/testcatalogue/SD-1048-catalog/NISO-STS-interchange-1-mathml3.dtd (No such file or directory)
+ echo Returns 2
Returns 2
~/t/testcatalogue $ 

testcatalogue.zip (8.47 MB), added by Ken Holman, 2020-08-06 02:02

History

#1 Updated by Ken Holman about 2 months ago

I see that my copy/paste got out of hand: in the last test I say "saxon XML with no catalog" when it should be "saxon XML with catalog". This is the case where I expect it to work and it does not. When I try without resolver.jar, I'm told that the resolver library is missing. Here is a transcript with the corrected label, plus a run without the resolver jar:

+ echo saxon XML with catalog '(expect' no 'error)'
saxon XML with catalog (expect no error)
+ java -cp jar/jing/bin/resolver.jar:jar/saxon9he/saxon9he.jar net.sf.saxon.Transform -catalog:dtd/catalog.xml -xsl:setare/Setare-STS-validation.xsl -s:SD-1048-catalog/NEN_2eOntw_nl.xml
I/O error reported by XML parser processing file:/Users/admin/t/testcatalogue/SD-1048-catalog/NEN_2eOntw_nl.xml: /Users/admin/t/testcatalogue/SD-1048-catalog/NISO-STS-interchange-1-mathml3.dtd (No such file or directory)
+ echo Returns 2
Returns 2
+ echo saxon without resolver
saxon without resolver
+ java -cp jar/saxon9he/saxon9he.jar net.sf.saxon.Transform -catalog:dtd/catalog.xml -xsl:setare/Setare-STS-validation.xsl -s:SD-1048-catalog/NEN_2eOntw_nl.xml
Transformation failed: Failed to load Apache catalog resolver library
+ echo Returns 2
Returns 2

#2 Updated by Michael Kay about 2 months ago

I tried this first on 10.1 and even the first case fails. Looking more carefully at your commands, I see you're using 9.x, so I'll try it now on 9.9 and come back to 10.1 later.

#3 Updated by Michael Kay about 2 months ago

It's failing on 9.9 too. With -t I get:

Loading catalog: file:/Users/mike/bugs/2020/4672-Holman/test/dtd/catalog.xml
Saxon-EE 9.9.1.7J from Saxonica
Java version 1.8.0_121
Using license serial number V008779
Stylesheet compilation time: 537.171647ms
Processing file:/Users/mike/bugs/2020/4672-Holman/test/SD-1048-catalog/NEN_2eOntw_nl.xml
Using parser org.apache.xml.resolver.tools.ResolvingXMLReader
Building tree for file:/Users/mike/bugs/2020/4672-Holman/test/SD-1048-catalog/NEN_2eOntw_nl.xml using class net.sf.saxon.tree.tiny.TinyBuilder
I/O error reported by XML parser processing file:/Users/mike/bugs/2020/4672-Holman/test/SD-1048-catalog/NEN_2eOntw_nl.xml: /Users/mike/bugs/2020/4672-Holman/test/SD-1048-catalog/NISO-STS-interchange-1-mathml3.dtd (No such file or directory)

It looks like there has been no attempt to resolve the reference. Unfortunately here I'm debugging into the XML parser and the catalog resolver, i.e. non-Saxon code which I don't claim to understand.

#4 Updated by Michael Kay about 2 months ago

As far as I can see, running this in the debugger, the XML parser is supplying the absolute URI of the required DTD to the catalog resolver, and the catalog resolver is expecting to see the relative URI as it appears in the DOCTYPE declaration (and in the catalog), and it therefore isn't finding a match. At this point I'm rather at a loss, because I have no idea how this is supposed to work...
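The absolutization described here can be reproduced with plain `java.net.URI`. A minimal sketch, assuming the DOCTYPE in NEN_2eOntw_nl.xml declares the relative system identifier `NISO-STS-interchange-1-mathml3.dtd` (inferred from the error message, not checked against the attached file): resolving that string against the document URI from the transcript gives exactly the path the parser then failed to open. The class and method names are illustrative only.

```java
import java.net.URI;

public class SystemIdResolution {
    /** Resolves a (possibly relative) system ID against the document's URI. */
    static String absolutize(String documentUri, String systemId) {
        return URI.create(documentUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Document URI as it appears in the transcript's error message.
        String doc = "file:/Users/admin/t/testcatalogue/SD-1048-catalog/NEN_2eOntw_nl.xml";
        // Assumed literal system ID from the DOCTYPE (inferred, not verified).
        String literal = "NISO-STS-interchange-1-mathml3.dtd";
        // Prints file:/Users/admin/t/testcatalogue/SD-1048-catalog/NISO-STS-interchange-1-mathml3.dtd
        System.out.println(absolutize(doc, literal));
    }
}
```

A catalog keyed on the literal string cannot match this absolute form, which is the mismatch observed in the debugger.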

The SAX specification does say (for resolveEntity()) "If the system identifier is a URL, the SAX parser must resolve it fully before reporting it to the application." - which does suggest that an absolute URI is expected here.
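For comparison, the SAX2 extensions define a second callback, `org.xml.sax.ext.EntityResolver2.resolveEntity(name, publicId, baseURI, systemId)`, which receives the system ID as written together with the base URI, while the classic `EntityResolver.resolveEntity(publicId, systemId)` receives the resolved form. The sketch below uses only the JDK's bundled SAX parser (not Saxon, Jing, or the Apache resolver) and a made-up document and base URI. It parses the same DOCTYPE twice and records what each callback sees. With the JDK parser we would expect the classic callback to get a `file:` URI ending in `/docs/missing.dtd` and the `EntityResolver2` callback to get `missing.dtd`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityResolverComparison {
    // Hypothetical document with a relative system ID in its DOCTYPE.
    static final String DOC = "<!DOCTYPE doc SYSTEM \"missing.dtd\"><doc/>";
    // Hypothetical base URI so the parser has something to resolve against.
    static final String BASE = "file:/tmp/docs/test.xml";

    static void parse(EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        InputSource src = new InputSource(new StringReader(DOC));
        src.setSystemId(BASE);
        reader.parse(src);
    }

    /** System ID seen through the classic EntityResolver callback. */
    static String classicSystemId() throws Exception {
        String[] seen = new String[1];
        parse((publicId, systemId) -> {
            seen[0] = systemId;
            return new InputSource(new StringReader("")); // substitute an empty DTD
        });
        return seen[0];
    }

    /** System ID seen through the SAX2 extension EntityResolver2 callback. */
    static String resolver2SystemId() throws Exception {
        String[] seen = new String[1];
        parse(new DefaultHandler2() {
            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                seen[0] = systemId; // baseURI is supplied separately
                return new InputSource(new StringReader(""));
            }
        });
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("EntityResolver:  " + classicSystemId());
        System.out.println("EntityResolver2: " + resolver2SystemId());
    }
}
```

Whether the resolver in resolver.jar implements `EntityResolver2`, and so could match the system ID as authored, is not established in this thread.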

I wonder if there are different versions of catalog resolvers that we need to be aware of here?

#5 Updated by Ken Holman about 2 months ago

As a user of a catalogue, I would never know the absolute URI of all of the client XML documents. A single entry in the catalogue would not be able to translate the SYSTEM ID found in XML documents from multiple directories. When I create the catalogue in the "system area" of my application, I can point to it reliably and find in it SYSTEM ID strings that are matched to the SYSTEM ID strings found in the "client area" under client control.

That Jing is successfully getting the URI translated indicates to me that a catalogue is expecting the SYSTEM ID as authored, not as resolved. Jing gives the resolver just the string and my catalogue, never knowing the directories, indicates the string that is to be translated.

https://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.system: "A system entry matches a system identifier if the normalized value (Section 6.3) of the system identifier is lexically identical to the normalized value of the systemId attribute of the entry." ... I don't believe that makes reference to the resolved URI of the source document.
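The matching rule quoted above amounts to a string-equality lookup. A simplified sketch, assuming one hypothetical `system` entry whose systemId is the literal DOCTYPE string (as described in #4) and whose target URI is invented for illustration; the Section 6.3 normalization step is omitted because neither identifier contains characters it would change.

```java
import java.util.Collections;
import java.util.Map;

public class SystemEntryMatch {
    // Hypothetical catalog with one <system> entry: systemId -> uri.
    // The target URI is illustrative, not taken from dtd/catalog.xml.
    static final Map<String, String> SYSTEM_ENTRIES = Collections.singletonMap(
        "NISO-STS-interchange-1-mathml3.dtd",
        "file:/opt/app/dtd/NISO-STS-interchange-1-mathml3.dtd");

    /** Returns the mapped URI, or null if no entry is lexically identical. */
    static String lookup(String systemId) {
        // Simplified: the spec compares normalized values; normalization omitted.
        return SYSTEM_ENTRIES.get(systemId);
    }

    public static void main(String[] args) {
        String literal = "NISO-STS-interchange-1-mathml3.dtd";
        String absolute = "file:/Users/admin/t/testcatalogue/SD-1048-catalog/"
            + "NISO-STS-interchange-1-mathml3.dtd";
        System.out.println(lookup(literal));  // prints the mapped URI
        System.out.println(lookup(absolute)); // prints null: no lexical match
    }
}
```

Under this rule, a resolver that is handed the authored string finds the entry, and one that is handed the absolutized string does not.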

And it would explain why it works in Jing and not in Saxon.

#6 Updated by Michael Kay about 2 months ago

The problem is, I don't think Saxon is absolutizing the URI, I think the XML parser is doing it.

#7 Updated by Michael Kay about 2 months ago

I've been stepping through what Xerces and the Apache catalog resolver are doing here, and I'm afraid I don't really understand how it's supposed to work. The system ID that Xerces passes to the catalog resolver's resolveEntity() method is an absolute URI (as the spec for resolveEntity() suggests that it should be) and the catalog resolver is comparing this absolute URI against the URI entries in the catalog literally, without absolutisation, so it doesn't find a match. It's clear why it isn't working, but I don't really understand how it's supposed to work. Asking for advice on the XML slack channel. Note, there's no Saxon code involved here.

#8 Updated by Ken Holman about 2 months ago

Gerrit's suggested workaround appears to give the same results for both Jing and Saxon, so I can live with the workaround. It even suggests that, of the two, Jing may be the one at fault.

You may close this ticket. I don't want to take any more of your time and will live with the workaround. I appreciate the effort you've expended on this, thank you.

#9 Updated by Michael Kay about 1 month ago

  • Status changed from New to Closed
  • Assignee set to Michael Kay

Closing this. To use Liam's phrase, it appears to be "broken by design". The fact that Xerces absolutizes the URI before passing it to the EntityResolver is unfortunate, but it's not something we have the ability to change.
