Bug #6152

closed

XML Reader seems to be reused instead of re-creating

Added by Radu Coravu 9 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: External resources
Sprint/Milestone: -
Start date: 2023-08-03
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: Java

Description

With Saxon 12.3, I call the document() function in my XSLT like this:

<xsl:message><xsl:value-of select="document('doctypeDitaTopic5.dita')"/></xsl:message>

In the Java code I set:

configuration.setParseOptions(configuration.getParseOptions().withEntityResolver(..).withXMLReaderMaker(new Maker<XMLReader>() {...

but it seems that when doc-available or document is used in the XSLT, my XMLReaderMaker is no longer called. Instead, the Saxon code seems to reuse an XML reader that was used when parsing the XSLT. This did not happen in older Saxon versions such as version 11, which properly used my configured XML Reader Maker.

Actions #1

Updated by Radu Coravu 9 months ago

This "net.sf.saxon.Configuration.reuseStyleParser(XMLReader)" is probably the cause of my problems, and it is not something I can control by setting an option.

Actions #2

Updated by Radu Coravu 9 months ago

Also here:

net.sf.saxon.lib.DirectResourceResolver.resolve(ResourceRequest) the style parser is used:

ss = new SAXSource(config.getStyleParser(), is);

although the resolver may be called from a document() or doc-available function.

Actions #3

Updated by Radu Coravu 9 months ago

Also in this place: net.sf.saxon.Configuration.loadParser(). Maybe it should ask the XML reader maker to create the XML reader, if one is available.

Actions #4

Updated by Radu Coravu 9 months ago

For now I made some patches on our side: one in the DirectResourceResolver, to avoid using the style parser for XML-nature calls :)

ss = new SAXSource(! ResourceRequest.XML_NATURE.equals(request.nature) ? config.getStyleParser() : null, is);

and two patches in the Configuration class: one to use the XML reader maker in the net.sf.saxon.Configuration.getSourceParser() call, and one to disable the sourceParserPool completely.

In general, I think that once the user sets an XML reader maker, the sourceParserPool could be disabled and the XML reader maker used instead.
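
The policy suggested here can be sketched roughly as follows. This is only an illustration using JDK classes, not Saxon code: the class SourceParserProvider, its method names, and the use of java.util.function.Supplier in place of Saxon's Maker are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical stand-in for the parser-handling part of a configuration class.
class SourceParserProvider {
    private final Deque<XMLReader> pool = new ArrayDeque<>();
    private Supplier<XMLReader> readerMaker; // null means no user-supplied maker

    void setReaderMaker(Supplier<XMLReader> maker) {
        this.readerMaker = maker;
    }

    // With a maker set, always ask it for a reader and bypass the pool;
    // otherwise reuse a pooled reader, or create a default one if the pool is empty.
    synchronized XMLReader getSourceParser() {
        if (readerMaker != null) {
            return readerMaker.get();
        }
        XMLReader pooled = pool.poll();
        return pooled != null ? pooled : newDefaultReader();
    }

    // Readers are only returned to the pool when no maker is in control.
    synchronized void reuseSourceParser(XMLReader reader) {
        if (readerMaker == null) {
            pool.push(reader);
        }
    }

    static XMLReader newDefaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create default XMLReader", e);
        }
    }

    public static void main(String[] args) {
        SourceParserProvider provider = new SourceParserProvider();
        XMLReader first = provider.getSourceParser();
        provider.reuseSourceParser(first);
        System.out.println("pooled reader reused: " + (provider.getSourceParser() == first));

        AtomicInteger calls = new AtomicInteger();
        provider.setReaderMaker(() -> {
            calls.incrementAndGet();
            return newDefaultReader();
        });
        provider.getSourceParser();
        provider.getSourceParser();
        System.out.println("maker calls: " + calls.get());
    }
}
```

The trade-off is that a user who sets a maker gives up whatever speed benefit the pool provides, in exchange for full control over each reader.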

Actions #5

Updated by Michael Kay 9 months ago

The Javadoc could be much clearer about the exact circumstances in which the parse options set using setParseOptions() are used. Unfortunately it would be a very lengthy description! There are 33 direct references to the field defaultParseOptions, and 54 places that call getParseOptions(), so it's hard to know where to begin.

It does feel very wrong that DirectResourceResolver is invoking the StyleParser; this should only be used for stylesheets and schema documents.

Actions #6

Updated by Radu Coravu 9 months ago

Thanks for the update, Michael. I do not know precisely the reason for the sourceParserPool. Is it speed? Does creating XML readers instead of reusing them cause so much delay that it's worth adding a parser pool? Could there maybe be an option to control whether the parsers are reused or not? In my opinion, if the XML reader maker is set, it should be used more often instead of the parser pool.

Actions #7

Updated by Michael Kay 6 months ago

Sorry to have abandoned this in mid-conversation; I'm just coming back to it now.

In answer to your last question: yes, we did have concrete evidence that the cost of parsing small documents was entirely dominated by parser initialisation cost, and that for workloads where many small documents are parsed, pooling the parser has substantial benefits. It's possible this may no longer be true, but I don't think Xerces has changed that much, so I suspect that pooling is still worthwhile.
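
Anyone wanting to check this on a current JDK could start from a rough timing sketch like the one below. It uses only the JDK's built-in SAX parser, and the class ParserReuseTiming and its methods are made up for the illustration. It is not a rigorous benchmark (no warm-up control, and creating the factory is counted as part of reader creation), so the numbers are only indicative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness: fresh reader per document vs. one reused reader.
class ParserReuseTiming {
    static final String SMALL_DOC = "<topic id='t1'><title>Hello</title></topic>";

    // Counts completed parses so we can confirm every document was processed.
    static class DocCounter extends DefaultHandler {
        int documents;

        @Override
        public void endDocument() {
            documents++;
        }
    }

    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Parses SMALL_DOC n times and returns how many parses completed.
    static int parseAll(int n, boolean reuse) {
        try {
            DocCounter counter = new DocCounter();
            XMLReader shared = reuse ? newReader() : null;
            for (int i = 0; i < n; i++) {
                XMLReader reader = reuse ? shared : newReader();
                reader.setContentHandler(counter);
                reader.parse(new InputSource(new StringReader(SMALL_DOC)));
            }
            return counter.documents;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        int n = 2000;
        parseAll(200, true); // crude warm-up so class loading is not measured
        long t0 = System.nanoTime();
        parseAll(n, false);
        long t1 = System.nanoTime();
        parseAll(n, true);
        long t2 = System.nanoTime();
        System.out.println("fresh reader per document: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("one reused reader:         " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```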

Actions #8

Updated by Michael Kay 6 months ago

The code in DirectResourceResolver where it invokes the StyleParser seems all wrong.

Firstly, we haven't checked at this point whether the request uses TEXT_NATURE (which it will for an unparsed-text() request with a specified encoding). As far as I can see, if we get an unparsed-text request with an encoding parameter, then unless the CatalogResolver can handle it, we're going to get a failure at this point.

Secondly, we should only be using the StyleParser if it's a stylesheet or schema. In other cases we should be using the SourceParser.
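
The intended selection logic might be sketched like this. The enums Nature and ParserKind and the class ParserChoice are hypothetical and exist only for the illustration; Saxon's ResourceRequest identifies natures with its own constants, and this is not the actual fix.

```java
// Hypothetical sketch of choosing a parser from the nature of the request.
class ParserChoice {
    // Illustrative labels only; not Saxon's representation of resource natures.
    enum Nature { XML, TEXT, STYLESHEET, SCHEMA }

    enum ParserKind { STYLE_PARSER, SOURCE_PARSER, NONE }

    static ParserKind choose(Nature nature) {
        switch (nature) {
            case STYLESHEET:
            case SCHEMA:
                // Only stylesheets and schema documents use the style parser.
                return ParserKind.STYLE_PARSER;
            case TEXT:
                // Text resources (e.g. unparsed-text()) need no XML parser at all.
                return ParserKind.NONE;
            default:
                // Ordinary source documents, e.g. from doc() or document().
                return ParserKind.SOURCE_PARSER;
        }
    }

    public static void main(String[] args) {
        for (Nature nature : Nature.values()) {
            System.out.println(nature + " -> " + choose(nature));
        }
    }
}
```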

Actions #9

Updated by Michael Kay 6 months ago

  • Tracker changed from Support to Bug
  • Category set to External resources
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal
  • Platforms Java added

Actions #10

Updated by Radu Coravu 6 months ago

Thanks for looking further into improving this behavior, Michael.

Actions #11

Updated by Michael Kay 6 months ago

  • Status changed from New to Resolved

I think I have now fixed this.

I've traced a couple of paths through. For parsing stylesheet modules, the parseOptions that are set externally have no effect, because Saxon wants to be fully in control of the way in which stylesheets are parsed. For calls on doc() or document(), the options set using configuration.setParseOptions(), including for example the XML_READER_MAKER, seem to be taking effect as they should.

Actions #12

Updated by Radu Coravu 6 months ago

Sounds good, thank you!
