Bug #4729

TransformerFactory doesn't accept ACCESS_EXTERNAL_STYLESHEET property

Added by Matthias H 9 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee: Michael Kay
Category: JAXP Java API
Sprint/Milestone: -
Start date: 2020-09-11
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10, trunk
Fix Committed on Branch: 10, trunk
Fixed in Maintenance Release: 10.3

Description

In the codebase I have existing code like this (boiled down to the essence):

import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
Transformer transformer = transformerFactory.newTransformer();

// set transformer properties...

StringReader stringReader = new StringReader(someXml);
transformer.transform(new StreamSource(stringReader), someOutput);

When I add Saxon as a library so that I can use it somewhere else in my code, the above code breaks. It throws java.lang.IllegalArgumentException: Unrecognized configuration feature: http://javax.xml.XMLConstants/property/accessExternalDTD

I saw there was a discussion about this in https://saxonica.plan.io/issues/4234, but it seems not to have been solved. I tried Saxon-HE 10.2 and 9.9.1-7; both throw the exception.

If I remove the ACCESS_EXTERNAL_DTD/ACCESS_EXTERNAL_STYLESHEET settings, it works, but then the code really does try to access an external DTD!

I think you should support the above properties as you wrote in https://saxonica.plan.io/issues/4234: "However, when input is provided in the form of a StreamSource, and we instantiate the XMLReader ourselves from within Saxon, then we should arguably take account of these properties supplied to the TransformerFactory by setting corresponding properties on the XMLReader that we instantiate. I will look at making that change."

I think this is especially important because Saxon is "registered" as the factory returned by TransformerFactory.newInstance() as soon as it is on the classpath. In my own code I can of course change this, for example by using TransformerFactory.newDefaultInstance(), but imagine a third-party library that obtains a transformer via newInstance()... Another option could be to provide a Saxon dependency that includes the Saxon transformers but does not "register" itself as the standard transformer. I don't know whether this is easy to achieve, as I don't know the details of this mechanism. But if it were possible, it would be easier to use the Saxon transformer only where I want it and rely on the default transformers elsewhere.
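For code you control, the newDefaultInstance() route mentioned above can be sketched as follows. This is a hedged example, assuming Java 9 or later; the class name BuiltInFactory is hypothetical. It builds the same hardened factory as the snippet in the description, but from the JDK's built-in XSLT processor, so it keeps working when a Saxon JAR is on the classpath. It does nothing for third-party libraries that call newInstance() themselves.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: the hardened factory from the description, built from the
// JDK's own XSLT processor instead of whatever newInstance() finds on the classpath.
public class BuiltInFactory {

    static TransformerFactory secureBuiltInFactory() {
        // Java 9+: always the JDK implementation, even when Saxon is on the classpath
        TransformerFactory factory = TransformerFactory.newDefaultInstance();
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Secure processing not supported", e);
        }
        // The JDK implementation recognizes the JAXP 1.5 access properties;
        // "" means no protocol is allowed for external DTDs or stylesheets.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return factory;
    }

    public static void main(String[] args) {
        System.out.println("built-in:   " + secureBuiltInFactory().getClass().getName());
        // For comparison: whatever the JAXP lookup finds (Saxon, if it is on the classpath)
        System.out.println("discovered: " + TransformerFactory.newInstance().getClass().getName());
    }
}
```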

History

#1 Updated by Michael Kay 9 months ago

Two questions:

(a) Are you using the XML parser built into the JDK, or some other parser (such as Apache Xerces)?

(b) If the JDK parser, which JDK version?

This is all about persuading Saxon to configure the XML parser on your behalf, which is tricky because different parsers don't all behave in the same way.

#2 Updated by Michael Kay 9 months ago

The other point I would make here is that we will never guarantee 100% compatibility of Saxon's JAXP TransformerFactory implementation with the one in the JDK. There are a number of reasons:

(a) the JAXP interfaces are under-specified

(b) the JAXP interface specifications have changed over the years (not the formal interfaces, but details such as requiring particular configuration properties to be recognized) and we have no input to these changes

(c) there's no interoperability test suite for JAXP

(d) the JDK implementation is XSLT 1.0 with vendor extensions that are permitted but not required for XSLT conformance

Therefore it's unwise to put an application into production that relies solely on the JAXP interfaces and to expect it to work with whatever TransformerFactory it finds lying around on the classpath. You need to test your application against every JAXP TransformerFactory that you intend to support.
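One way to act on this advice is a small test harness that runs the same stylesheet through each implementation you plan to support and compares the results. The sketch below is hypothetical (class FactoryMatrix, the candidate list, and the tiny XSLT 1.0 stylesheet are all illustrative assumptions). It always includes the JDK's built-in processor via newDefaultInstance() (Java 9+) and loads optional implementations by class name with TransformerFactory.newInstance(String, ClassLoader), skipping any that are not on the classpath; net.sf.saxon.TransformerFactoryImpl is Saxon's JAXP factory class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical test harness: run one stylesheet on every TransformerFactory you intend to support.
public class FactoryMatrix {

    // Assumption: optional implementations you might ship with; adjust to your own list.
    static final List<String> OPTIONAL_FACTORIES = List.of("net.sf.saxon.TransformerFactoryImpl");

    // Deliberately XSLT 1.0, so every candidate should accept it
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";

    static String run(TransformerFactory factory, String xml) {
        try {
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    /** Maps each available implementation to its output for the same input. */
    static Map<String, String> runAll(String xml) {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("jdk-default", run(TransformerFactory.newDefaultInstance(), xml));
        for (String name : OPTIONAL_FACTORIES) {
            try {
                results.put(name, run(TransformerFactory.newInstance(name, null), xml));
            } catch (TransformerFactoryConfigurationError notOnClasspath) {
                // Not available in this build: nothing to test
            }
        }
        return results;
    }

    public static void main(String[] args) {
        runAll("<list><item/><item/></list>")
            .forEach((impl, output) -> System.out.println(impl + " -> " + output));
    }
}
```

In a real test suite each entry would be asserted against the expected output rather than printed; a mismatch or an "ERROR:" entry points at exactly the kind of implementation dependency described above.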

#3 Updated by Matthias H 9 months ago

I use the built-in parsers of JDK 11. Without the Saxon library, TransformerFactory.newInstance returns com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl (the same as newDefaultInstance).

If Saxon cannot guarantee compatibility, wouldn't it be even more important to have an easy way to opt out of Saxon being used everywhere newInstance is used?

If you provided two Saxon dependencies, a user could choose:

  • I want Saxon and will use it everywhere (via newInstance), or
  • I want Saxon, but I define explicitly where I want to use it (for example via BasicTransformerFactory)

#4 Updated by Michael Kay 9 months ago

> If Saxon cannot guarantee compatibility, wouldn't it be even more important to have an easy way to opt out of Saxon being used everywhere newInstance is used?

It would be great if we could do that, but we can't: the JAXP instantiation mechanism isn't under our control. The only way we can opt out is by not including the relevant entry in the JAR manifest, which would stop the mechanism working for everybody. That's what we chose to do with XPath: we no longer register as an XPath service provider, and you only get Saxon if you explicitly request it. But we decided it would be too disruptive to do the same on the XSLT side.
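While Saxon cannot opt out, an application can: TransformerFactory.newInstance() consults the javax.xml.transform.TransformerFactory system property before it falls back to the service-provider lookup. The hedged sketch below (hypothetical class PinJdkFactory) pins that property to the JDK's built-in XSLTC class named in comment #3. That is an internal class name, so this assumes a JDK that still ships it. After pinning, every newInstance() call, including calls inside third-party libraries, returns the JDK factory, and Saxon is used only where it is requested explicitly, for example with TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl", null).

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical application-side workaround: pin the JAXP lookup to the JDK's own processor.
public class PinJdkFactory {

    // System property checked first by TransformerFactory.newInstance()
    static final String LOOKUP_PROPERTY = "javax.xml.transform.TransformerFactory";

    // JDK-internal class name (reported for JDK 11 in this issue); not a public API
    static final String JDK_XSLTC =
        "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    static void pinJdkFactory() {
        // Equivalent to -Djavax.xml.transform.TransformerFactory=... on the command line;
        // the property is read at lookup time, so set it before the lookups it should affect.
        System.setProperty(LOOKUP_PROPERTY, JDK_XSLTC);
    }

    public static void main(String[] args) {
        pinJdkFactory();
        // Prints the JDK class name even if a Saxon JAR is on the classpath
        System.out.println(TransformerFactory.newInstance().getClass().getName());
    }
}
```

This is a process-wide setting, so it suits applications that want Saxon only in a few explicit places; it is the reverse of the default behaviour described above rather than something Saxon itself provides.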

#5 Updated by Michael Kay 9 months ago

Bug #4234 was concerned with the JAXP validation interface, not with the transformation interface. The fix for that bug caused the three properties

XMLConstants.FEATURE_SECURE_PROCESSING
XMLConstants.ACCESS_EXTERNAL_DTD
XMLConstants.ACCESS_EXTERNAL_SCHEMA

to be recognised on the validation API, but it made no changes to the transformation API, and did not cause XMLConstants.ACCESS_EXTERNAL_STYLESHEET to be recognised.

I propose to fix this so that the transformation interface conforms with JAXP 1.5. Note, however, that this will not fix the underlying problem, which is that a transformation that requires Xalan should explicitly load Xalan, rather than using the JAXP loading mechanism indiscriminately. There may well be other aspects of your transformation that make it dependent on Xalan (or on XSLT 1.0).

#6 Updated by Michael Kay 9 months ago

Also note, the fix for #4234 didn't actually result in the ACCESS_EXTERNAL_SCHEMA property being recognized on the schema validation interface, because of difficulties interpreting exactly what the JAXP specification was supposed to mean.

#7 Updated by Michael Kay 9 months ago

Discussed at team meeting. We decided that it might make sense to apply these properties before calling the relevant resolver, rather than leaving the resolver to make the decision. This would generalise more easily to different kinds of resource and resolver, especially resources not relevant to XSLT 1.0 processors, such as unparsed text and JSON resources.
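Applying the restriction before any resolver is consulted can be sketched as a decorating resolver. This is a hypothetical illustration against the standard JAXP URIResolver interface (the class name ProtocolCheckingResolver and the set of lower-case schemes are assumptions), not Saxon's actual implementation: the protocol check runs first, and only then is the wrapped resolver, standard or custom, asked to do the work.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

/**
 * Hypothetical sketch (not Saxon's code): check the protocol of a resource
 * before any resolver sees it, so the same rule applies whichever resolver is used.
 */
public class ProtocolCheckingResolver implements URIResolver {

    private final Set<String> allowedSchemes; // lower-case URI schemes, e.g. "file"
    private final URIResolver delegate;       // may be null: defer to the processor's default

    public ProtocolCheckingResolver(Set<String> allowedSchemes, URIResolver delegate) {
        this.allowedSchemes = allowedSchemes;
        this.delegate = delegate;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        URI absolute;
        try {
            absolute = (base == null || base.isEmpty()) ? new URI(href) : new URI(base).resolve(href);
        } catch (URISyntaxException | IllegalArgumentException e) {
            throw new TransformerException("Cannot resolve " + href + " against " + base, e);
        }
        String scheme = absolute.getScheme();
        if (scheme == null || !allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
            throw new TransformerException("Access to " + absolute + " is not allowed");
        }
        // Allowed: let the wrapped resolver decide; null means "use the default resolution"
        return delegate == null ? null : delegate.resolve(href, base);
    }
}
```

Usage would be along the lines of factory.setURIResolver(new ProtocolCheckingResolver(Set.of("file"), null)). Note that this only covers resources routed through the URIResolver (document(), xsl:import, xsl:include); DTDs fetched by the XML parser never pass through it.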

#8 Updated by Michael Kay 9 months ago

I'm now implementing this.

The JAXP definition is hopelessly underspecified, even allowing for the fact that it assumes XSLT 1.0. For example, it doesn't say what happens if you call setAttribute() with one of these properties more than once: are they supposed to be cumulative? I'm assuming the last one wins. I'll attempt to respect the intent, rather than the detail.

ACCESS_EXTERNAL_DTD will be passed straight to the XML parser, which means it's ignored if the user supplies the parser and is only effective if Saxon instantiates the parser itself.

ACCESS_EXTERNAL_STYLESHEET will map to the Saxon Configuration property Feature.ALLOWED_PROTOCOLS, which will be changed to affect all resources fetched directly by Saxon (schemas, source documents, stylesheet modules, queries, JSON documents, etc.) and to kick in before any URI resolver is called. On the validation APIs, ACCESS_EXTERNAL_SCHEMA will set the same Saxon Configuration property.

#9 Updated by Michael Kay 9 months ago

I decided to roll back on this, and go for a minimum change that fixes the bug.

In Saxon 10 we introduced a configuration property, Feature.ALLOWED_PROTOCOLS, whose values have the same format as the JAXP properties such as ACCESS_EXTERNAL_STYLESHEET. If ACCESS_EXTERNAL_STYLESHEET is supplied on the TransformerFactory, I shall simply set Feature.ALLOWED_PROTOCOLS on the Configuration to the same value. This property is currently used by the "standard" resolvers (such as the standard URI resolvers), and can be overridden or ignored if a custom resolver is used.
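The JAXP value format in question is a comma-separated list of protocols, where the keyword "all" permits every protocol, the empty string permits none, and a resource inside a JAR is identified as "jar:" plus its inner scheme (for example "jar:file"). The sketch below shows how such a value could be checked against a URI. The class and method names are hypothetical, and this is an illustration of the format, not Saxon's or the JDK's implementation.

```java
import java.util.Locale;

// Hypothetical helper illustrating the JAXP "allowed protocols" value format.
public class AllowedProtocols {

    /** Returns the protocol of an absolute URI, or null if it has no scheme. */
    static String protocolOf(String uri) {
        int colon = uri.indexOf(':');
        if (colon <= 0) {
            return null; // relative reference: resolve against a base URI first
        }
        String scheme = uri.substring(0, colon).toLowerCase(Locale.ROOT);
        if (scheme.equals("jar")) {
            // JAR resources are identified as "jar:" plus the inner scheme, e.g. "jar:file"
            int next = uri.indexOf(':', colon + 1);
            if (next > colon + 1) {
                return "jar:" + uri.substring(colon + 1, next).toLowerCase(Locale.ROOT);
            }
        }
        return scheme;
    }

    /** Checks a URI against a value such as "all", "" or "file,jar:file". */
    static boolean permits(String allowedProtocols, String uri) {
        String value = allowedProtocols.trim();
        if (value.equalsIgnoreCase("all")) {
            return true;
        }
        String protocol = protocolOf(uri);
        if (value.isEmpty() || protocol == null) {
            return false; // "" grants no access; relative URIs are rejected in this sketch
        }
        for (String allowed : value.split(",")) {
            if (allowed.trim().equalsIgnoreCase(protocol)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(permits("file,jar:file", "file:///tmp/a.xsl"));   // true
        System.out.println(permits("file", "https://example.com/a.xsl"));    // false
        System.out.println(permits("file", "jar:file:/lib/x.jar!/a.xsl"));  // false: needs "jar:file"
    }
}
```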

The ACCESS_EXTERNAL_DTD property will be passed through to the XML parser in cases where Saxon instantiates the XML parser. It won't affect any user-supplied parser (e.g. an XMLReader in a SAXSource).
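Since a user-supplied parser is left alone, an application that passes its own XMLReader in a SAXSource has to restrict DTD access on that reader itself. A hedged sketch, assuming the JDK's built-in SAX parser (obtained explicitly with SAXParserFactory.newDefaultInstance(), Java 9+), which recognizes XMLConstants.ACCESS_EXTERNAL_DTD as a reader property; the class ConfiguredReader and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: configure the parser yourself when you supply it in a SAXSource.
public class ConfiguredReader {

    static XMLReader lockedDownReader() throws Exception {
        // Java 9+: the JDK's own parser, which understands the JAXP 1.5 properties
        SAXParserFactory spf = SAXParserFactory.newDefaultInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // "" means an external DTD may not be fetched over any protocol
        reader.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return reader;
    }

    /** Wraps the XML text in a SAXSource that carries the pre-configured reader. */
    static SAXSource lockedDownSource(String xml) throws Exception {
        return new SAXSource(lockedDownReader(), new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE a SYSTEM \"http://example.com/a.dtd\"><a/>";
        try {
            lockedDownReader().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser refuses to fetch the DTD instead of going to the network
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The resulting SAXSource can then be passed to transformer.transform(...) in place of the StreamSource from the description.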

#10 Updated by Michael Kay 9 months ago

  • Subject changed from Including Saxon breaks existing code (IllegalArgumentException) to TransformerFactory doesn't accept ACCESS_EXTERNAL_STYLESHEET property
  • Category set to JAXP Java API
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch 10, trunk added
  • Fix Committed on Branch 10, trunk added

Saxon's TransformerFactory now accepts the ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_STYLESHEET properties on the getAttribute() and setAttribute() methods. The first is passed to any Saxon-instantiated SAX parser; the second is implemented by setting the Configuration property Feature.ALLOWED_PROTOCOLS.

A new set of JUnit tests has been written as jaxptests/AllowedProtocolsTest.java
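Code that must run against several Saxon releases cannot assume these attributes are recognized: according to this report, 10.2 and 9.9.1-7 throw IllegalArgumentException, and the fix was released in 10.3. The hedged sketch below (hypothetical class PortableHardening) tries each setting and reports which ones the factory rejected, instead of letting the first unsupported attribute abort the whole setup.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: apply the JAXP 1.5 access restrictions where the factory supports them.
public class PortableHardening {

    /** Tries each access property; returns the names the factory did not accept. */
    static List<String> restrictExternalAccess(TransformerFactory factory) {
        List<String> rejected = new ArrayList<>();
        String[] names = {XMLConstants.ACCESS_EXTERNAL_DTD, XMLConstants.ACCESS_EXTERNAL_STYLESHEET};
        for (String name : names) {
            try {
                factory.setAttribute(name, ""); // "" = no protocol allowed
            } catch (IllegalArgumentException unsupported) {
                // Factories without JAXP 1.5 support report unknown attributes this way
                rejected.add(name);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        List<String> rejected = restrictExternalAccess(factory);
        if (!rejected.isEmpty()) {
            System.err.println(factory.getClass().getName() + " ignored: " + rejected);
        }
    }
}
```

Whether to fail or carry on when something is rejected is an application decision; treating it as fatal is the safer choice when the input is untrusted.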

#11 Updated by O'Neil Delpratt 8 months ago

Bug fix applied in the Saxon 10.3 maintenance release

#12 Updated by O'Neil Delpratt 8 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.3 added
