Bug #2273

collection() does not use built-in copies of W3C DTDs

Added by Michael Kay over 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Sprint/Milestone:
Start date: 2014-12-22
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.6
Fix Committed on Branch: 9.6
Fixed in Maintenance Release:

Description

From Gunther Rademacher on saxon-help:

I am trying to run fn:collection on a set of XHTML documents in the file system, each of them having a DOCTYPE declaration as follows:

<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
...

My first approach was to do it like this

collection(".?select=*.xhtml")

This works, but I am having issues with the parser's DTD resolution. When however doing it like this,

collection(".?select=*.xhtml;unparsed=yes")/doc(base-uri(.))

it works much better, because the DTD is resolved against a Saxon copy (as indicated when using command line option "-t"). Should not this be done from within the context of fn:collection, too?

BTW, an empty command line for net.sf.saxon.Query prints out

Usage: see http://www.saxonica.com/documentation/html/using-xquery/commandline.html

but the content of that page apparently has moved to

http://www.saxonica.com/documentation/#!using-xquery/commandline

Thanks and best regards,

Gunther

History

#1 Updated by Michael Kay over 6 years ago

The documentation issue has been transferred to bug #2274.

On the main issue, I have confirmed that the problem exists. What seems to be happening is that Saxon sets the StandardEntityResolver (which is where the DTD redirection happens) to be the entityResolver in the defaultParseOptions for the Configuration, but the collection() path for some reason is not using the defaultParseOptions, but is creating its own. That suggests two possibilities for a fix: either change collection() to use the defaultParseOptions, or set the StandardEntityResolver in the parseOptions that collection() constructs for itself.
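For readers unfamiliar with the mechanism, the DTD redirection described above can be sketched with a plain SAX EntityResolver from the JDK. This is not Saxon's StandardEntityResolver; the class name LocalDtdDemo, the in-memory stub DTD, and the lookup table are hypothetical stand-ins chosen so the example runs without network access or the Saxon jar.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class LocalDtdDemo {

    // Hypothetical stand-in for a bundled DTD; the real XHTML DTD is far larger.
    static final String STUB_DTD =
        "<!ELEMENT html ANY>\n<!ENTITY nbsp \"&#160;\">\n";

    // Maps a public identifier to the locally held copy of the DTD.
    static final Map<String, String> BUNDLED =
        Map.of("-//W3C//DTD XHTML 1.0 Transitional//EN", STUB_DTD);

    // Returns the local copy for a known public ID; returning null tells the
    // parser to fall back to fetching the system ID itself.
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        String local = (publicId == null) ? null : BUNDLED.get(publicId);
        if (local == null) {
            return null;
        }
        InputSource source = new InputSource(new StringReader(local));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    };

    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">a&nbsp;b</html>";

    // Parses the document with the resolver installed and returns the text of
    // the root element; checked exceptions are wrapped to keep callers short.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(RESOLVER);
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &nbsp; is only defined in the stub DTD, so a successful parse shows
        // that the local copy was used.
        System.out.println("a\u00A0b".equals(rootText(SAMPLE)));
    }
}
```

The point relevant to this bug is that the redirection only happens when the resolver is installed on the parser actually used; a code path that builds its own parse options without it will fetch the DTD from the system identifier instead.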

#2 Updated by Michael Kay over 6 years ago

I decided to change the StandardCollectionURIResolver (line 454) to get the default parsing options from the configuration. This may have side-effects, but on the whole I think they will be beneficial: it means that options not explicitly set in the collection URI parameters will be taken from the default parsing options set (for example) on the command line.

The option that gives me the most headaches here is schema validation (-val on the command line). The command line documentation is ambiguous: it says it applies to documents read "using document() and similar functions". The documentation for collection URIs (http://www.saxonica.com/documentation/#!sourcedocs/collections) explicitly says that the default for the validation parameter depends on Configuration settings, which is not currently the case. So I'm inclined to be bold here and make the change: incompatible changes can be justified if they bring the code into line with the documentation.
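The intended precedence (parameters given in the collection URI win, anything else falls back to the configured defaults) can be illustrated with a small sketch. It is not the StandardCollectionURIResolver code; the class CollectionOptionsDemo and its method are hypothetical, options are modelled as plain strings, and the values "strict" and "strip" are used purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollectionOptionsDemo {

    // Starts from the configuration-level defaults and overlays every
    // name=value pair found after '?' in the collection URI. Pairs are
    // assumed to be separated by ';', as in the URIs quoted in this report.
    static Map<String, String> effectiveOptions(String collectionUri,
                                                Map<String, String> defaults) {
        Map<String, String> result = new LinkedHashMap<>(defaults);
        int query = collectionUri.indexOf('?');
        if (query >= 0) {
            for (String pair : collectionUri.substring(query + 1).split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    result.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical default, standing in for what -val would set.
        Map<String, String> defaults = Map.of("validation", "strict");

        // No validation parameter in the URI: the default applies.
        System.out.println(effectiveOptions(".?select=*.xhtml", defaults));

        // Explicit parameter in the URI: it overrides the default.
        System.out.println(
            effectiveOptions(".?select=*.xhtml;validation=strip", defaults));
    }
}
```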

#3 Updated by Michael Kay over 6 years ago

  • Status changed from New to Resolved

I have patched the StandardCollectionURIResolver on the 9.6 and 9.7 branches so it takes its default parsing options from the configuration settings. More precisely, it uses the default options from the PipelineConfiguration, which default to the options in the Configuration, but which may be varied on a per-query or per-transformation basis, though it is unusual to change anything other than the URIResolver.

(The Controller has methods getValidationMode() and setValidationMode(), but the validation mode set there doesn't find its way into the PipelineConfiguration, and therefore won't affect the collection() function. I'll leave that problem for another day.)

#4 Updated by O'Neil Delpratt over 6 years ago

  • % Done changed from 0 to 100
  • Fixed in version set to 9.6.0.4

Bug fix applied in the Saxon 9.6.0.4 maintenance release.

#5 Updated by O'Neil Delpratt over 6 years ago

  • Status changed from Resolved to Closed

#6 Updated by O'Neil Delpratt over 5 years ago

  • Sprint/Milestone set to 9.6.0.4
  • Applies to branch 9.6 added
  • Fix Committed on Branch 9.6 added
  • Fixed in Maintenance Release 9.6.0.4 added

#7 Updated by O'Neil Delpratt over 5 years ago

  • Sprint/Milestone changed from 9.6.0.4 to 9.6.0.3
  • Fixed in Maintenance Release 9.6.0.3 added
  • Fixed in Maintenance Release deleted (9.6.0.4)

#8 Updated by O'Neil Delpratt over 5 years ago

  • Sprint/Milestone changed from 9.6.0.3 to 9.6.0.4
  • Fixed in Maintenance Release 9.6.0.4 added
  • Fixed in Maintenance Release deleted (9.6.0.3)
