Bug #4127
parse-xml-fragment fails when -catalog is in use
Status: closed, 100% done
Updated by Michael Kay almost 6 years ago
Saxon's logic in parse-xml-fragment() is:
    XMLReader reader = configuration.getSourceParser();
    if (reader.getEntityResolver() != null) {
        reader = configuration.createXMLParser();
    }
    ((SAXSource)source).setXMLReader(reader);
    reader.setEntityResolver((publicId, systemId) -> {
        if ("http://www.saxonica.com/parse-xml-fragment/actual.xml".equals(systemId)) {
            InputSource is1 = new InputSource(fragmentReader);
            is1.setSystemId(baseURI);
            return is1;
        } else {
            return null;
        }
    });
So it's making some allowance for the fact that the XMLReader might already have an EntityResolver. But it then sets its own EntityResolver and relies on this taking effect. The problem is that the call to Configuration.getSourceParser() gets the Apache ResolvingXMLReader, which ignores any attempt to set an EntityResolver.
Saxon here has no way of knowing that the particular XMLReader in use is going to ignore our attempt to set an EntityResolver. We don't really want to instantiate an explicit XMLReader implementation, but all other options seem to leave open the possibility that we'll get an XMLReader that silently ignores our EntityResolver.
Note that although we are testing this from the command line where we know that catalogs are in use because of the -catalog option, in the general case the user can configure the system to use the ResolvingXMLReader in a variety of ways, which we have no control over (for example, they can set it as the default SAX parser using a Java system property). They might also not be using this particular implementation of the catalog resolver, so a test for the particular class name is not going to work in all cases.
We could have a configuration option where the user explicitly tells us what XMLReader to instantiate when doing parse-xml-fragment(), but that's a bit of a desperate solution.
I can't see any ready alternative to the use of an EntityResolver to achieve the required effect. XML parsers don't generally have an option to read a "fragment" (external parsed entity) except by parsing a document entity that references the fragment.
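As a rough illustration of that technique (this is not Saxon's actual code), the sketch below parses a fragment with the JDK's default SAX parser by wrapping it in a small document entity that references the fragment as an external entity. The class name FragmentViaEntity, the wrapper element z, the entity name e and the placeholder URI are all assumptions made for the example.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FragmentViaEntity {
    // Placeholder system identifier (an assumption); the resolver intercepts it,
    // so it is never fetched from the network.
    static final String FRAGMENT_ID = "http://example.invalid/fragment.xml";

    /** Returns the names of the elements found in the fragment, in document order. */
    static List<String> parseFragment(String fragment) throws Exception {
        // Document entity whose only content is a reference to the fragment.
        String wrapper = "<!DOCTYPE z [<!ENTITY e SYSTEM \"" + FRAGMENT_ID + "\">]><z>&e;</z>";

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Supply the fragment text when the parser asks for the external entity.
        reader.setEntityResolver((publicId, systemId) ->
                FRAGMENT_ID.equals(systemId)
                        ? new InputSource(new StringReader(fragment))
                        : null);

        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Skip the artificial wrapper element.
                if (!"z".equals(qName)) {
                    names.add(qName);
                }
            }
        });
        reader.parse(new InputSource(new StringReader(wrapper)));
        return names;
    }
}
```

The whole approach depends on the parser honouring the EntityResolver. If it does not, the parser will try to dereference the placeholder URI instead.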
Updated by Michael Kay almost 6 years ago
- Category set to Internals
- Priority changed from Low to Normal
Solved this (for at least one case) by changing parse-xml-fragment() as follows:
(1) initially it follows the current logic: gets an instance of the source parser class in the configuration, sets an entity resolver on it, parses the document.
(2) if either (a) there's already an entityResolver on the parser, or (b) parsing fails, then it tries again, this time with a clean parser obtained using SAXParserFactoryImpl.newInstance().newSAXParser().getXMLReader().
It remains possible that, because of system properties or similar settings, (2) will fail just as (1) did. But in the typical scenario it's now working.
Test case ParserTest.testParseXmlFragment() added.
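The two-step logic might look roughly like the sketch below. It is one reading of the description above, not the committed Saxon code. ResolverIgnoringReader is a hypothetical stand-in for a catalog-aware reader: it drops any EntityResolver it is given and fails to resolve the entity itself, which keeps the example offline. All class, method and constant names are assumptions, and the clean parser comes from the standard SAXParserFactory.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;

class FragmentParseWithFallback {
    static final String FRAGMENT_ID = "http://example.invalid/fragment.xml";
    static final String WRAPPER =
            "<!DOCTYPE z [<!ENTITY e SYSTEM \"" + FRAGMENT_ID + "\">]><z>&e;</z>";

    /** Hypothetical reader that silently ignores setEntityResolver(). */
    static class ResolverIgnoringReader extends XMLFilterImpl {
        ResolverIgnoringReader(XMLReader parent) { super(parent); }
        @Override public void setEntityResolver(EntityResolver resolver) { /* ignored */ }
        @Override public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException {
            // Simulates a failed catalog lookup without touching the network.
            throw new SAXException("cannot resolve " + systemId);
        }
    }

    static XMLReader cleanReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** One parse attempt; appends the fragment's text content to out. */
    static void parse(XMLReader reader, String fragment, StringBuilder out)
            throws SAXException, IOException {
        reader.setEntityResolver((publicId, systemId) ->
                FRAGMENT_ID.equals(systemId)
                        ? new InputSource(new StringReader(fragment))
                        : null);
        reader.setContentHandler(new DefaultHandler() {
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(WRAPPER)));
    }

    static String parseFragmentText(XMLReader configured, String fragment) throws Exception {
        // Step (1): use the configured reader unless it already has a resolver.
        if (configured.getEntityResolver() == null) {
            StringBuilder out = new StringBuilder();
            try {
                parse(configured, fragment, out);
                return out.toString();
            } catch (SAXException | IOException e) {
                // Fall through to step (2); partial output is discarded.
            }
        }
        // Step (2): retry with a clean parser; failures here are not suppressed.
        StringBuilder out = new StringBuilder();
        parse(cleanReader(), fragment, out);
        return out.toString();
    }
}
```

Letting the second attempt's exception propagate means a genuinely malformed fragment is still reported to the caller rather than silently swallowed.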
Updated by Michael Kay almost 6 years ago
- Status changed from New to Resolved
- Applies to branch 9.9, trunk added
- Fix Committed on Branch 9.9, trunk added
Updated by Michael Kay over 5 years ago
Further fix committed: parsing failures in parse-xml-fragment() were being suppressed.
Updated by O'Neil Delpratt over 5 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.9.1.2 added
Bug fixed in the Saxon 9.9.1.2 maintenance release.