URL Resolution Failure for Stream Source Using document() Function

Added by Eliot Kimber about 12 years ago

This is with Saxon 9.7.0.1.

I am applying transforms to objects served from our CMS (RSuite). Within the transform I use the document() function to parse additional documents. The URL used is specific to our CMS, e.g.: document('rsuite:/res/content/12345');

I configure Saxon with an RSuite-specific URI resolver that resolves the rsuite:/ URL into a Source. Our code had been providing a DOMSource, which has always worked. The systemID for the source is this same RSuite-specific URL.

However, I changed our code to provide a StreamSource instead (because Saxon may also get XSLT components from the repository, in which case it chokes on a DOMSource but likes a StreamSource, which makes sense). The stream source has a non-null input stream and reader.
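For readers following along, a resolver of the kind described might look roughly like the sketch below. It is only an illustration: the class name RSuiteUriResolverSketch and the in-memory map standing in for the repository are hypothetical, since the actual RSuite API is not shown in this thread. It uses only the standard JAXP URIResolver and StreamSource classes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: resolves rsuite: URLs to a StreamSource. */
public class RSuiteUriResolverSketch implements URIResolver {

    // Assumption: a map from URL to content stands in for the real repository lookup.
    private final Map<String, String> repository;

    public RSuiteUriResolverSketch(Map<String, String> repository) {
        this.repository = repository;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        // Returning null tells the processor to fall back to its default resolution.
        if (href == null || !href.startsWith("rsuite:")) {
            return null;
        }
        String content = repository.get(href);
        if (content == null) {
            throw new TransformerException("No repository object for " + href);
        }
        InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        // The system ID is the same rsuite: URL, as described in the post.
        return new StreamSource(in, href);
    }
}
```

Such a resolver would be registered with the standard Transformer.setURIResolver or TransformerFactory.setURIResolver methods.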

I've debugged through the Document class to the point where it calls the underlying parser on the source. The parser fails with:

ERROR: java.net.MalformedURLException: unknown protocol: rsuite

It would appear that the parser is not using the RSuite-specific URI resolver that I've configured on the transformer compiler, which Saxon does use to resolve stylesheet components and the initial URI reference from the document() function.

I was surprised that the parser was trying to resolve the URL at all (or at least validating it), since it has a good input stream.

Is there a configuration aspect I'm missing when setting up my transformer that will allow the parser used by document() to handle this RSuite-specific stream source? I poked around in the parser object itself in the debugger and didn't see anything that looked like a URI resolver object.

Thanks,

Eliot


Replies (4)

RE: URL Resolution Failure for Stream Source Using document() Function - Added by Michael Kay about 12 years ago

Does the StreamSource that you supply have a SystemID set, and if so what is it? Does it use the proprietary "rsuite" protocol?

If so, does the XML document you are parsing perhaps contain any relative URI references to external entities (perhaps a DTD)?

RE: URL Resolution Failure for Stream Source Using document() Function - Added by Eliot Kimber about 12 years ago

Yes, the StreamSource sets the system ID to the "rsuite:/res/1235" URL.

The document does have an external DTD reference with a relative system ID (e.g., "article.dtd"), so that could explain the failure--I hadn't thought of that.

But in any case, given an RSuite-specific URI resolver, such references should work. But now that you mention it, the DTD reference would require a separate RSuite-specific resolver, so that may be the problem (we do silly things with DTDs that require a special URI resolver--don't ask).

Should the Saxon-provided parser be using the URI resolver configured on the transformer?

I have worked around the problem by providing a SAXSource, which works correctly. That may in fact be the best solution, but I wanted to make sure I understood exactly what's going on with respect to URI resolution when providing a StreamSource.

With the SAXSource solution I configure the SAXSource with an RSuite-provided XMLReader, which is of course correctly configured with the necessary URI resolvers. It may be that that is the intended way to address this type of issue.
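A minimal sketch of that approach, using only standard JAXP and SAX classes, is below. The class name SaxSourceSketch and the build method are hypothetical, and the XMLReader here comes from the default SAXParserFactory rather than from RSuite; in the real setup the reader is the RSuite-provided one.

```java
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical sketch: wraps content in a SAXSource whose parser has an EntityResolver. */
public class SaxSourceSketch {

    public static SAXSource build(InputStream content, String systemId, EntityResolver resolver)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processing needs namespace-aware parsing
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // The parser consults this resolver for the DTD and other external
        // entities, so it never has to open an rsuite: URL itself.
        reader.setEntityResolver(resolver);

        InputSource input = new InputSource(content);
        input.setSystemId(systemId); // e.g. the rsuite: URL of the document
        return new SAXSource(reader, input);
    }
}
```

The URI resolver can then return the result of build(...) instead of a StreamSource for XML objects.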

But in my generic RSuite-specific URI resolver it would be cleaner if I could blindly return a StreamSource, as that's what I do for non-XML objects.

For example, stylesheets may use the same form of URL for import and include instructions. RSuite stores XSLT modules as "binary" objects and so provides Saxon with a StreamSource in that case. Saxon resolves those URIs fine using the URI resolver I've configured on the Transformer.

Thanks,

Eliot

RE: URL Resolution Failure for Stream Source Using document() Function - Added by Michael Kay about 12 years ago

The reference to the DTD is not resolved by the URIResolver, but by the EntityResolver registered with the XML parser. So if you are supplying a SAXSource containing an XMLReader containing an appropriate EntityResolver, that should work OK. In principle it might be nice, when you supply a URIResolver and Saxon creates the XMLReader, if Saxon were to initialize the XMLReader with an EntityResolver that delegates to the URIResolver. There are a few problems with that, however. Firstly, EntityResolvers understand public IDs and URIResolvers don't. Secondly, it's not what JAXP specifies. Thirdly, a change of this kind would probably be disruptive to someone.
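To make the delegation idea concrete, here is a rough sketch of an EntityResolver that forwards to a URIResolver. This is not something Saxon does; the class name DelegatingEntityResolverSketch is hypothetical, and the code relies only on standard JAXP and SAX calls. Note that the public ID is simply ignored, which is the first of the problems mentioned above.

```java
import java.io.IOException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: an EntityResolver that delegates to a JAXP URIResolver. */
public class DelegatingEntityResolverSketch implements EntityResolver {

    private final URIResolver uriResolver;

    public DelegatingEntityResolverSketch(URIResolver uriResolver) {
        this.uriResolver = uriResolver;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        try {
            // publicId is dropped: URIResolver has no notion of public IDs.
            // The parser normally passes an already-absolute system ID, so no base is given.
            Source source = uriResolver.resolve(systemId, null);
            if (source == null) {
                return null; // let the parser fall back to its default behaviour
            }
            // Converts a StreamSource or SAXSource; returns null for other Source types.
            return SAXSource.sourceToInputSource(source);
        } catch (TransformerException e) {
            throw new SAXException(e);
        }
    }
}
```

An instance would be set on the XMLReader with setEntityResolver before the reader is wrapped in a SAXSource.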

RE: URL Resolution Failure for Stream Source Using document() Function - Added by Eliot Kimber about 12 years ago

Good point about the EntityResolver--I'll check that and see if I simply failed to configure the transformer with one.

However, I suspect that providing a pre-configured SAXSource is the best solution overall here. That works and seems to be most consistent with the JAXP architecture and Saxon's expectations.

Thanks,

Eliot
