Bug #6214
Closed
Cannot use InputXmlResolver to implement custom scheme on Saxon-HE 10N
Description
We finally made the jump from Saxon-HE 9.9.1.6N to Saxon-HE 10.9N and only noticed after a few weeks that we had a gap in our test suite. Unfortunately, this breaks some of our use cases and forces us to revert back to Saxon-HE 9.x again.
We use the XsltTransformer.InputXmlResolver property to pass our own custom XmlResolver class, which lets internal XSLTs use the document() function to get data at runtime that is not available as an actual file.
Using an XSLT like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:template match="/">
    <xsl:sequence select="document('custom-scheme://test')"/>
  </xsl:template>
</xsl:stylesheet>
With a custom XmlResolver like this:
private sealed class CustomResolver : XmlResolver
{
    private readonly XmlResolver _innerResolver;
    public CustomResolver(XmlResolver innerResolver) => _innerResolver = innerResolver;
    public override ICredentials Credentials { set { _innerResolver.Credentials = value; } }
    public override Uri ResolveUri(Uri baseUri, string relativeUri) => base.ResolveUri(baseUri, relativeUri);
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.Scheme == "custom-scheme")
        {
            // custom handling here.
        }
        return _innerResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
And testing code similar to this:
var processor = new Processor();
// Load any source document (content does not matter for this test)
var input = processor.NewDocumentBuilder().Build(GetInputURI("simple.xml"));
var transformer = processor.NewXsltCompiler().Compile(GetStyleURI("custom_protocol.xsl")).Load();
transformer.InitialContextNode = input;
transformer.InputXmlResolver = new CustomResolver(transformer.InputXmlResolver);
var output = new XDocument();
using var outputWriter = output.CreateWriter();
var destination = new TextWriterDestination(outputWriter) { CloseAfterUse = true };
transformer.Run(destination);
We get a successful run on Saxon-HE 9.x but see an exception on Saxon-HE 10:
Error at char 9 in expression in xsl:sequence/@select on line 3 column 10 of custom_protocol.xsl:
FODC0002 I/O error reported by XML parser processing
custom-scheme://test: unknown protocol: custom-scheme.
Caused by java.net.MalformedURLException: unknown protocol: custom-scheme
I don't expect this to have had major changes between 9 and 10, but other sources on Google suggest that this might be a location that uses the URL class when it should've used the URI class instead.
I wouldn't be surprised either if we simply had to set some other options to get this to work again, like on the transformer or even the processor (that were either not necessary with Saxon 9 or set by default.)
Updated by Emanuel Wlaschitz 7 months ago
To add more detail: GetEntity often looks like this (mainly because I realized that the shortened example on top leaves the impression that the custom handling does not short-circuit and always calls the original resolver):
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
    if (absoluteUri.Scheme == "custom-scheme")
    {
        var memoryStream = new MemoryStream();
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var xmlWriter = XmlWriter.Create(memoryStream, writerSettings))
        {
            new XDocument(new XElement("root", new XElement("data", "custom data here"))).Save(xmlWriter);
        }
        memoryStream.Position = 0;
        return memoryStream;
    }
    return _innerResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
}
...except that many of them are more elaborate in what kind of XML data they return and where it comes from.
Updated by Emanuel Wlaschitz 7 months ago
Also, I typo'd the title. Should be InputXmlResolver, not InputXmlHandler.
Updated by Martin Honnen 7 months ago
Interesting. Based on your snippets, I tried to reproduce the problem in https://github.com/martin-honnen/Saxon10CustomResolverTest1 but there, in a .NET 4.8 console app with Saxon 10.9 HE, the resolver seems to work fine. I get the output <?xml version="1.0" encoding="UTF-8"?><root><data>custom data here</data></root>.
I will try to reproduce with a different destination.
Updated by Emanuel Wlaschitz 7 months ago
Curious. For most of those, we perform an in-memory transformation (from an XDocument back into another XDocument), which is one of the reasons why we rely on the XmlResolver to do our bidding.
Updated by Martin Honnen 7 months ago
https://github.com/martin-honnen/Saxon10CustomResolverTest1/tree/XDocumentXmlWriterDestination also runs fine for me here, outputting e.g.
<?xml version="1.0" encoding="ibm850"?>
<root>
<data>custom data here</data>
</root>
Updated by Emanuel Wlaschitz 7 months ago
Hm, an effective return null; in GetEntity seems to cause this. That worries me a little, since we only return null when the internal processing doesn't find anything, or the XSLT messes up the URI.
That fortunately means that we can work around it (by returning empty data instead); but I'm somewhat worried that this might break doc-available() and friends.
Do you happen to know if that would be the case if we simply return an empty MemoryStream rather than null?
Also, it means that it silently went from "works anyways" on Saxon 9 to "blows up" on Saxon 10; which might be worth investigating.
Updated by Martin Honnen 7 months ago
I think you need to wait for the Saxonica guys to pick this up; I don't know who is currently working with the "legacy" .NET framework Saxon .NET. Let's see whether O'Neil or Michael or Norm picks this up.
Updated by Norm Tovey-Walsh 7 months ago
That'd be me in this case, Martin :-)
I've confirmed that this doesn't occur in the Java version, so that narrows down the likely culprits. I think I can see where the problem might be, but I'll need to reproduce it on my Windows machine to confirm that the fix works. Or fail to reproduce it, perhaps, given the most recent comments.
Updated by Emanuel Wlaschitz 7 months ago
Hey Norm!
The easiest way to reproduce is to take Martin's code and replace GetEntity with
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
    return null;
}
Certainly not the best implementation, but one that causes the issue to show up.
Updated by Norm Tovey-Walsh 7 months ago
Yes, my experiments had the same results as Martin's tests. (Thank you, Martin!)
I did instrument all of our code that I thought might be causing the problem and confirmed that when you install a custom handler, none of that code gets called before your custom handler.
The semantics of the GetEntity method in .NET differ from the semantics of the URI and entity resolvers in Java. In Java, a resolver that fails to find a resource is expected to return null; this tells the parser (or other processor) to attempt something else, perhaps direct retrieval of the resource. On .NET, the GetEntity method is expected to return a stream. It should never return null. If looking up the resource doesn't succeed, it has to do the "something else" for the parser or processor. This surprised me when I first encountered it in the C# version of XML Resolver (used in Saxon 11+).
There are a lot of moving parts here. I think what happens is that a null returned from GetEntity gets interpreted with the Java semantics, so the underlying parser falls back and attempts to get the resource with the custom scheme, and that falls over.
I can't explain how this was different in Saxon 9, but there are often significant changes between major versions so I'm not sure that whatever 9 is doing would be relevant anyway.
In short: I think the GetEntity method should return a stream or throw an exception. It should never return null.
Updated by Emanuel Wlaschitz 7 months ago
The problem with throwing an Exception is that it does the same thing as returning null; it causes the other handling to happen and triggers the protocol exception.
And the problem with returning an empty MemoryStream is that it causes "Premature end of file".
So...my only way out here would be to return a valid but otherwise empty XML fragment, and that comes with the drawback of making doc-available() return true().
This is ok for most of our internal XSLTs, but in some cases we do rely on this to signal "nope, no document here" to the XSLT when certain conditions are (not) met. We could work around this in the XSLT for the most part, but some of it might not be fully under our control.
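To illustrate why an empty stream is not a usable "no document" signal, here is a minimal sketch that uses .NET's own XmlReader rather than the parser inside Saxon (so the exact error text will differ from "Premature end of file"). The class and method names are hypothetical; the point is only that zero bytes is not a well-formed document, while even a single empty element is.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of Saxon or of the code in this report.
public static class EmptyStreamDemo
{
    // Returns true when the stream parses as a well-formed XML document.
    public static bool IsWellFormed(Stream stream)
    {
        try
        {
            using (var reader = XmlReader.Create(stream))
            {
                // Read to the end; any well-formedness error raises XmlException.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // An empty stream has no root element, so it is rejected.
        Console.WriteLine(IsWellFormed(new MemoryStream()));
        // A minimal document parses, which is why doc-available() would report true for it.
        Console.WriteLine(IsWellFormed(new MemoryStream(Encoding.UTF8.GetBytes("<empty/>"))));
    }
}
```

Run as a console program, this prints False for the empty stream and True for the minimal document.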
For completeness' sake, I added this xsl:message into Martin's XSLT before the xsl:sequence:
<xsl:message expand-text="yes">doc-available: {doc-available('custom-scheme://test')}</xsl:message>
Can we hook into the pipeline elsewhere to make this work like in Saxon 9? Or return something specific from GetEntity to prevent this?
Updated by Emanuel Wlaschitz 7 months ago
We just ran a few tests against Saxon 9; turns out it was "only" a warning there:
Warning at char 9 in xsl:sequence/@select on line 4 column 64 of custom_protocol.xsl:
FODC0002: I/O error reported by XML parser processing custom-scheme://test: unknown
protocol: custom-scheme
Same for throwing an exception (NotImplementedException for simplicity):
Warning at char 9 in xsl:sequence/@select on line 4 column 64 of custom_protocol.xsl:
FODC0002: Exception thrown by URIResolver: The method or operation is not implemented.
So I guess the stricter handling of warnings/errors in Saxon 10 causes this to fail the transformation. It simply happened to work on Saxon 9, more likely by chance than intentionally.
Which leaves me with the same question as before: How can we signal "this document is not available" through the InputXmlResolver?
Updated by Norm Tovey-Walsh 7 months ago
If you reach the point in your stylesheet where you're attempting to resolve a document() function, it's too late to signal that the document is unavailable. However, you can test if the document is available before calling the document function:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="doc-available('custom-scheme://test')">
        <xsl:sequence select="document('custom-scheme://test')"/>
      </xsl:when>
      <xsl:otherwise>
        <doc>unavailable</doc>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
When I run that stylesheet in my test harness where the custom resolver returns null, it outputs:
<?xml version="1.0" encoding="UTF-8"?><doc>unavailable</doc>
Is that a sufficient workaround for your use case?
Updated by Emanuel Wlaschitz 7 months ago
That's perfectly acceptable, thanks!
Now we only have to go back and figure out why we have locations that still fail this. But I'm guessing we're just lacking the doc-available() call there and expect the previous "empty node-set" behavior.
I think we can lay this report to rest then, since it's yet another expected (yet unexpected) behavior change and simply means we have to change our expectations.
Updated by Norm Tovey-Walsh 7 months ago
We try very hard to maintain backwards compatibility, but there's always a tension between "making improvements" and "not changing things". At major version boundaries, we allow ourselves more freedom to make improvements. (I maintain a few open source projects that use Saxon and I know exactly how painful those changes can be.)
I'm inclined to think that if returning null in Saxon 9 "worked" that it was a bug that it did so.
I'll leave this issue open for a bit; please do let us know if you have problems with the workaround.
Updated by Emanuel Wlaschitz 7 months ago
> I'm inclined to think that if returning null in Saxon 9 "worked" that it was a bug that it did so.
I tend to agree at this point; especially after seeing a bunch of other reports where the result was "this is intended by the XSLT 3.0 spec" (or similar.)
We just have to deal with it for now.
Our current plan is to throw an exception that clearly states that the requested URI is unavailable and suggests testing for it using doc-available() first. That way, we can leave some sort of documentation for end-users in case we do fail to update all necessary locations.
In the end, this did reveal an actual bug in one of our XSLTs that didn't build the URI correctly but silently appeared to be working on Saxon 9.
I'll report back once we checked (and updated) all our uses to leave a solution for others that might run into the same problem. Thanks again for getting us on track!
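A rough sketch of that plan, for anyone looking for a starting point. This is not our production code: the class name InMemorySchemeResolver, the dictionary keyed by the URI host, and the choice of FileNotFoundException are all assumptions made for illustration. The only parts taken from this thread are the custom-scheme check and the rule that GetEntity returns a stream or throws, never null.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical resolver: serves custom-scheme://<host> from an in-memory table
// and throws a descriptive exception instead of returning null.
public sealed class InMemorySchemeResolver : XmlResolver
{
    private readonly IDictionary<string, string> _documents; // host -> XML text (assumed layout)
    private readonly XmlResolver _fallback;

    public InMemorySchemeResolver(IDictionary<string, string> documents, XmlResolver fallback)
    {
        _documents = documents;
        _fallback = fallback;
    }

    public override ICredentials Credentials { set { _fallback.Credentials = value; } }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Anything that is not our scheme goes to the original resolver.
        if (absoluteUri.Scheme != "custom-scheme")
        {
            return _fallback.GetEntity(absoluteUri, role, ofObjectToReturn);
        }

        string xml;
        if (_documents.TryGetValue(absoluteUri.Host, out xml))
        {
            return new MemoryStream(Encoding.UTF8.GetBytes(xml));
        }

        // Never return null: say what is missing and how the stylesheet should guard the call.
        throw new FileNotFoundException(
            "No document is available for '" + absoluteUri + "'. " +
            "Test with doc-available() before calling document().");
    }
}
```

It would be installed the same way as the CustomResolver in the description, by wrapping the transformer's existing InputXmlResolver. Note that on Saxon 10 the exception still fails the transformation when document() is called without a doc-available() guard; the message only makes the cause easier to spot in a log.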
Updated by Michael Kay 7 months ago
- Subject changed from Cannot use InputXmlHandler to implement custom scheme on Saxon-HE 10N to Cannot use InputXmlResolver to implement custom scheme on Saxon-HE 10N
- Assignee set to Michael Kay
Updated by Michael Kay 7 months ago
- Status changed from New to AwaitingInfo
- Assignee changed from Michael Kay to Norm Tovey-Walsh
Updated by Emanuel Wlaschitz 6 months ago
Alright, we reviewed (and updated) our usages and it seems to work as expected now.
For anyone else running into this: the main change is testing for doc-available($customUrl) before attempting to load it with document($customUrl).
We also made sure to change our XmlResolver.GetEntity implementation to throw a more descriptive exception (just in case it shows up in any log files or similar) but returning null is also fine (as long as the doc-available() test is present in the XSLT.)
Thanks again for the support!
Updated by Norm Tovey-Walsh 6 months ago
- Status changed from AwaitingInfo to Closed
Glad you got it working. I'm going to close this as resolved, but feel free to open another issue if you run into more problems.