Bug #6152
Closed
XML Reader seems to be reused instead of re-created
Description
With Saxon 12.3, I call the document() function in my XSLT like this:
<xsl:message><xsl:value-of select="document('doctypeDitaTopic5.dita')"/></xsl:message>
In the Java code I set:
configuration.setParseOptions(configuration.getParseOptions().withEntityResolver(..).withXMLReaderMaker(new Maker<XMLReader>() {...}));
Somehow, though, it seems that when doc-available() or document() is used in the XSLT, my XMLReaderMaker is no longer called. Instead, the Saxon code seems to reuse an XML reader that was used when parsing the XSLT. This did not happen in older Saxon versions such as version 11, which properly used my configured XMLReaderMaker.
Updated by Radu Coravu over 1 year ago
Probably this "net.sf.saxon.Configuration.reuseStyleParser(XMLReader)" is the cause for my problems and it is not something I can control by setting an option.
Updated by Radu Coravu over 1 year ago
Also here:
net.sf.saxon.lib.DirectResourceResolver.resolve(ResourceRequest) the style parser is used:
ss = new SAXSource(config.getStyleParser(), is);
although the resolver may be called from a document() or doc-available function.
Updated by Radu Coravu over 1 year ago
Also in this place: net.sf.saxon.Configuration.loadParser(). Maybe it should ask the XML reader maker to create the XML reader, if one is available.
Updated by Radu Coravu over 1 year ago
For now I made some patches on our side. One is in the DirectResourceResolver, to avoid using the style parser for XML nature calls :)
ss = new SAXSource(! ResourceRequest.XML_NATURE.equals(request.nature) ? config.getStyleParser() : null, is);
There are also two patches in the Configuration class: one uses the XML reader maker in the net.sf.saxon.Configuration.getSourceParser() call, and the other completely disables the sourceParserPool.
In general, I think that once the user sets an XML reader maker, the sourceParserPool could be disabled and the XML reader maker used instead.
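A minimal sketch of that suggestion, using only JDK classes. It is not Saxon's Configuration code: the class name ReaderSource, the use of java.util.function.Supplier in place of Saxon's Maker, and the pool implementation are all assumptions made for illustration. It shows one way a reader source could bypass its pool whenever a user-supplied maker is present.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical class, not part of Saxon: hands out XMLReaders, preferring
// the user's maker over the internal pool.
class ReaderSource {
    private final ConcurrentLinkedQueue<XMLReader> pool = new ConcurrentLinkedQueue<>();
    private final Supplier<XMLReader> userMaker; // null when no maker is configured

    ReaderSource(Supplier<XMLReader> userMaker) {
        this.userMaker = userMaker;
    }

    XMLReader acquire() {
        if (userMaker != null) {
            // A configured maker always wins, so the pool is never consulted.
            return userMaker.get();
        }
        XMLReader pooled = pool.poll();
        return pooled != null ? pooled : newDefaultReader();
    }

    void release(XMLReader reader) {
        if (userMaker == null) {
            // Only default readers go back into the pool.
            pool.offer(reader);
        }
    }

    static XMLReader newDefaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create default XMLReader", e);
        }
    }
}
```

With a maker set, every acquire() call produces a fresh reader from the maker. Without one, a released reader is handed out again on the next acquire().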
Updated by Michael Kay over 1 year ago
The Javadoc could be much clearer about the exact circumstances in which the parse options set using setParseOptions() are used. Unfortunately it would be a very lengthy description! There are 33 direct references to the field defaultParseOptions, and 54 places that call getParseOptions(), so it's hard to know where to begin.
It does feel very wrong that DirectResourceResolver is invoking the StyleParser; this should only be used for stylesheets and schema documents.
Updated by Radu Coravu over 1 year ago
Thanks for the update, Michael. I do not know precisely the reason for the sourceParserPool; is it speed? Does creating XML readers instead of reusing them cause so much delay that it's worth adding a parser pool? Could there maybe be an option to control whether the parsers are reused? In my opinion, if the XML reader maker is set, it should be used more often instead of the parser pool.
Updated by Michael Kay about 1 year ago
Sorry to have abandoned this in mid conversation, I'm just coming back to it now.
In answer to your last question: yes, we did have concrete evidence that the cost of parsing small documents was entirely dominated by parser initialisation cost, and that for workloads where many small documents are parsed, pooling the parser has substantial benefits. It's possible this may no longer be true, but I don't think Xerces has changed that much, so I suspect that pooling is still worthwhile.
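A rough way to check this on a given JVM is sketched below, assuming the JDK's built-in JAXP parser rather than any Saxon code. The class name ParserReuseTiming, the sample document and the document count are illustrative assumptions. It times parsing many small documents with a fresh XMLReader each time against reusing a single reader. The figures depend on the JVM and parser, so no expected output is given.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness: fresh reader per document vs. one reused reader.
class ParserReuseTiming {
    static final String SMALL_DOC = "<topic id=\"t1\"><title>Small</title></topic>";
    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    static XMLReader newReader() {
        try {
            return FACTORY.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create XMLReader", e);
        }
    }

    // Parses the document with the given reader and returns its element count.
    static int countElements(XMLReader reader, String xml) {
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return count[0];
    }

    static long nanosWithFreshReaders(int documents) {
        long start = System.nanoTime();
        for (int i = 0; i < documents; i++) {
            countElements(newReader(), SMALL_DOC); // initialisation cost paid every time
        }
        return System.nanoTime() - start;
    }

    static long nanosWithOneReader(int documents) {
        long start = System.nanoTime();
        XMLReader reader = newReader(); // initialisation cost paid once
        for (int i = 0; i < documents; i++) {
            countElements(reader, SMALL_DOC);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int documents = 2000;
        nanosWithFreshReaders(documents); // warm-up runs, results discarded
        nanosWithOneReader(documents);
        System.out.println("fresh reader per document: "
                + nanosWithFreshReaders(documents) / 1_000_000 + " ms");
        System.out.println("single reused reader:      "
                + nanosWithOneReader(documents) / 1_000_000 + " ms");
    }
}
```

This is only a crude measurement, not a proper benchmark. A harness such as JMH would give more trustworthy numbers.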
Updated by Michael Kay about 1 year ago
The code in DirectResourceResolver where it invokes the StyleParser seems all wrong.
Firstly, we haven't checked at this point whether the request uses TEXT_NATURE (which it will for an unparsed-text() request with a specified encoding). As far as I can see, if we get an unparsed-text request with an encoding parameter, then unless the CatalogResolver can handle it, we're going to get a failure at this point.
Secondly, we should only be using the StyleParser if it's a stylesheet or schema. In other cases we should be using the SourceParser.
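The two points above amount to a small decision rule, sketched here in plain Java. This is not the actual DirectResourceResolver fix: the ParserSelection class and its Nature and ParserRole enums are hypothetical stand-ins, and the real ResourceRequest nature values are not modelled here.

```java
// Hypothetical sketch of the selection rule; names are illustrative only.
class ParserSelection {
    // Simplified stand-in for the nature of a resource request.
    enum Nature { XML, TEXT, STYLESHEET, SCHEMA }

    // Which parser, if any, should be attached to the resulting source.
    enum ParserRole { STYLE_PARSER, SOURCE_PARSER, NONE }

    static ParserRole choose(Nature nature) {
        switch (nature) {
            case TEXT:
                // unparsed-text() requests are not XML, so no XML parser applies.
                return ParserRole.NONE;
            case STYLESHEET:
            case SCHEMA:
                // Only stylesheets and schema documents use the style parser.
                return ParserRole.STYLE_PARSER;
            default:
                // doc(), document() and other XML requests use the source parser.
                return ParserRole.SOURCE_PARSER;
        }
    }
}
```

Under this rule, a document() call maps to the source parser, which is where a user-configured XML reader maker would be expected to take effect.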
Updated by Michael Kay about 1 year ago
- Tracker changed from Support to Bug
- Category set to External resources
- Assignee set to Michael Kay
- Priority changed from Low to Normal
- Platforms Java added
Updated by Radu Coravu about 1 year ago
Thanks for looking further into improving this behavior Michael.
Updated by Michael Kay about 1 year ago
- Status changed from New to Resolved
I think I have now fixed this.
I've traced a couple of paths through. For parsing stylesheet modules, the parseOptions that are set externally have no effect, because Saxon wants to be fully in control of the way in which stylesheets are parsed. For calls on doc() or document(), the options set using configuration.setParseOptions(), including for example the XML_READER_MAKER, seem to be taking effect as they should.
Updated by Debbie Lockett 6 months ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Applies to branch 12, trunk added
- Fix Committed on Branch 12, trunk added
- Fixed in Maintenance Release 12.4 added
Belatedly marking as closed - the bug fix was applied in the Saxon 12.4 Maintenance release.