collection() does not use built-in copies of W3C DTDs
From Gunther Rademacher on saxon-help:
I am trying to run fn:collection on a set of XHTML documents in the file system, each of them having a DOCTYPE declaration as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ...
My first approach was to do it like this
This works, but I am having issues with the parser's DTD resolution. However, when doing it like this,
it works much better, because the DTD is resolved against a Saxon copy (as indicated when using command line option "-t"). Should not this be done from within the context of fn:collection, too?
BTW, an empty command line for net.sf.saxon.Query prints out
Usage: see http://www.saxonica.com/documentation/html/using-xquery/commandline.html
but the content of that page apparently has moved to
Thanks and best regards,
#1 Updated by Michael Kay over 6 years ago
The documentation issue has been transferred to bug #2274.
On the main issue, I have confirmed that the problem exists. What seems to be happening is that Saxon sets the StandardEntityResolver (which is where the DTD redirection happens) to be the entityResolver in the defaultParseOptions for the Configuration, but the collection() path for some reason is not using the defaultParseOptions, but is creating its own. That suggests two possibilities for a fix: either change collection() to use the defaultParseOptions, or set the StandardEntityResolver in the parseOptions that collection() constructs for itself.
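For readers unfamiliar with the mechanism, the kind of DTD redirection described above can be sketched with the standard SAX `EntityResolver` interface. This is only an illustration and not Saxon's StandardEntityResolver: the class name `LocalDtdResolver`, its `register` method, and the in-memory DTD text are all hypothetical. A resolver like this only takes effect if it is attached to the parse options actually used, which is exactly what the collection() path was missing.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical illustration; NOT Saxon's StandardEntityResolver.
public class LocalDtdResolver implements EntityResolver {

    // Maps a DTD public identifier to the text of a locally held copy.
    private final Map<String, String> localCopies = new HashMap<>();

    // Registers a local copy (held in memory here, for simplicity).
    public void register(String publicId, String dtdText) {
        localCopies.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = (publicId == null) ? null : localCopies.get(publicId);
        if (text == null) {
            // Returning null tells the parser to resolve the system ID itself,
            // which typically means fetching the DTD over the network.
            return null;
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

With a resolver like this installed on the parser, a document carrying the XHTML 1.0 Transitional public identifier would be served the registered copy, and any unregistered identifier would fall through to the parser's default behaviour.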
#2 Updated by Michael Kay over 6 years ago
I decided to change the StandardCollectionURIResolver (line 454) to get the default parsing options from the configuration. This may have side-effects, but on the whole I think they will be beneficial: it means that options not explicitly set in the collection URI parameters will be taken from the default parsing options set (for example) on the command line.
The option that gives me most headaches here is schema validation (-val on the command line). The command line documentation is ambiguous: it says it applies to documents read "using document() and similar functions". The documentation for collection URIs (http://www.saxonica.com/documentation/#!sourcedocs/collections) explicitly says that the default for the validation parameter depends on Configuration settings, which is not currently the case. So I'm inclined to be bold here and make the change: incompatible changes can be justified if they bring the code into line with the documentation.
#3 Updated by Michael Kay over 6 years ago
- Status changed from New to Resolved
I have patched the StandardCollectionURIResolver on the 9.6 and 9.7 branches so it takes its default parsing options from the configuration settings. More precisely, it uses the default options from the PipelineConfiguration, which default to the options in the Configuration, but which may be varied on a per-query or per-transformation basis, though it is unusual to change anything other than the URIResolver.
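The resulting precedence can be pictured as three layers, with later layers overriding earlier ones. The sketch below is purely illustrative and does not use Saxon's ParseOptions or PipelineConfiguration classes: the class `LayeredParseDefaults`, the method `effectiveOptions`, and the option names and values in the maps are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the defaulting order; not Saxon code.
public class LayeredParseDefaults {

    // Later layers win: Configuration defaults are overridden by any
    // per-query or per-transformation settings, which are in turn overridden
    // by parameters given explicitly in the collection URI.
    static Map<String, String> effectiveOptions(
            Map<String, String> configurationDefaults,
            Map<String, String> pipelineOverrides,
            Map<String, String> collectionUriParams) {
        Map<String, String> result = new HashMap<>(configurationDefaults);
        result.putAll(pipelineOverrides);
        result.putAll(collectionUriParams);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> configuration = new HashMap<>();
        configuration.put("validation", "strict");       // e.g. set on the command line
        configuration.put("entityResolver", "standard"); // redirects W3C DTDs

        Map<String, String> pipeline = new HashMap<>();  // usually left empty

        Map<String, String> uriParams = new HashMap<>();
        uriParams.put("validation", "strip");            // explicit in the collection URI

        System.out.println(effectiveOptions(configuration, pipeline, uriParams));
    }
}
```

In this model, an option that the collection URI does not mention (here the entity resolver) is inherited from the configuration-level defaults, which is the behaviour the patch is meant to provide.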
(The Controller has methods getValidationMode() and setValidationMode(), but the validation mode set there doesn't find its way into the PipelineConfiguration, and therefore won't affect the collection() function. I'll leave that problem for another day.)