Bug #6214

closed

Cannot use InputXmlResolver to implement custom scheme on Saxon-HE 10N

Added by Emanuel Wlaschitz 7 months ago. Updated 6 months ago.

Status: Closed
Priority: Low
Category: -
Sprint/Milestone: -
Start date: 2023-10-05
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: .NET

Description

We finally made the jump from Saxon-HE 9.9.1.6N to Saxon-HE 10.9N and only noticed after a few weeks that we had a gap in our test suite. Unfortunately, this breaks some of our use cases and forces us to revert to Saxon-HE 9.x.

We use the XsltTransformer.InputXmlResolver property to pass our own custom XmlResolver class. It allows internal XSLTs to use the document() function to get data at runtime that is not available as an actual file.

Using an XSLT like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:template match="/">
        <xsl:sequence select="document('custom-scheme://test')"/>
    </xsl:template>
</xsl:stylesheet>

With a custom XmlResolver like this:

private sealed class CustomResolver : XmlResolver
{
    private readonly XmlResolver _innerResolver;
    public CustomResolver(XmlResolver innerResolver) => _innerResolver = innerResolver;
    public override ICredentials Credentials { set { _innerResolver.Credentials = value; } }
    public override Uri ResolveUri(Uri baseUri, string relativeUri) => base.ResolveUri(baseUri, relativeUri);

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.Scheme == "custom-scheme")
        {
            // custom handling here.
        }
        return _innerResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

And testing code similar to this:

var processor = new Processor();

// Load any source document (content does not matter for this test)
var input = processor.NewDocumentBuilder().Build(GetInputURI("simple.xml"));
var transformer = processor.NewXsltCompiler().Compile(GetStyleURI("custom_protocol.xsl")).Load();

transformer.InitialContextNode = input;
transformer.InputXmlResolver = new CustomResolver(transformer.InputXmlResolver);

var output = new XDocument();
using var outputWriter = output.CreateWriter();
var destination = new TextWriterDestination(outputWriter) { CloseAfterUse = true };
transformer.Run(destination);

We get a successful run on Saxon-HE 9.x but see an exception on Saxon-HE 10:

Error at char 9 in expression in xsl:sequence/@select on line 3 column 10 of custom_protocol.xsl:
  FODC0002  I/O error reported by XML parser processing
  custom-scheme://test: unknown protocol: custom-scheme.
  Caused by java.net.MalformedURLException: unknown protocol: custom-scheme

I don't expect this to have had major changes between 9 and 10, but other sources on Google suggest that this might be a location that uses the URL class when it should've used the URI class instead.

I wouldn't be surprised either if we simply had to set some other options to get this to work again, like on the transformer or even the processor (that were either not necessary with Saxon 9 or set by default.)

Actions #1

Updated by Emanuel Wlaschitz 7 months ago

To add more detail: GetEntity often looks like this (mainly because I realized that the shortened example above leaves the impression that the custom handling does not short-circuit and always calls the original resolver):

public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
    if (absoluteUri.Scheme == "custom-scheme")
    {
        var memoryStream = new MemoryStream();
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var xmlWriter = XmlWriter.Create(memoryStream, writerSettings))
        {
            new XDocument(new XElement("root", new XElement("data", "custom data here"))).Save(xmlWriter);
        }

        memoryStream.Position = 0;
        return memoryStream;
    }
    return _innerResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
}

...except that many of them are more elaborate in what kind of XML data they return and where it comes from.

Actions #2

Updated by Emanuel Wlaschitz 7 months ago

Also, I typo'd the title. Should be InputXmlResolver, not InputXmlHandler.

Actions #6

Updated by Martin Honnen 7 months ago

Interesting. Based on your snippets, I tried to reproduce the problem in https://github.com/martin-honnen/Saxon10CustomResolverTest1 but there, in a .NET 4.8 console app with Saxon 10.9 HE, the resolver seems to work fine: I get the output <?xml version="1.0" encoding="UTF-8"?><root><data>custom data here</data></root>.

I will try to reproduce with a different destination.

Actions #7

Updated by Emanuel Wlaschitz 7 months ago

Curious. For most of those, we perform an in-memory transformation (from an XDocument back into another XDocument) - which is one of the reasons why we rely on the XmlResolver to do our bidding.

Actions #8

Updated by Martin Honnen 7 months ago

https://github.com/martin-honnen/Saxon10CustomResolverTest1/tree/XDocumentXmlWriterDestination also runs fine for me here, outputting e.g.

<?xml version="1.0" encoding="ibm850"?>
<root>
  <data>custom data here</data>
</root>

Actions #9

Updated by Emanuel Wlaschitz 7 months ago

Hm, an effective return null; in GetEntity seems to cause this. That worries me a little, since we only return null when the internal processing doesn't find anything, or the XSLT messes up the URI.

That fortunately means that we can work around it (by returning empty data instead); but I'm somewhat worried that this might break doc-available() and friends. Do you happen to know if that would be the case if we simply return an empty MemoryStream rather than null?

Also, it means that it silently went from "works anyways" on Saxon 9 to "blows up" on Saxon 10; which might be worth investigating.

Actions #10

Updated by Martin Honnen 7 months ago

I think you need to wait for the Saxonica guys to pick this up; I don't know who is currently working with the "legacy" .NET framework Saxon .NET, let's see whether O'Neil or Michael or Norm picks this up.

Actions #11

Updated by Norm Tovey-Walsh 7 months ago

That'd be me in this case, Martin :-)

I've confirmed that this doesn't occur in the Java version, so that narrows down the likely culprits. I think I can see where the problem might be, but I'll need to reproduce it on my Windows machine to confirm that the fix works. Or fail to reproduce it, perhaps, given the most recent comments.

Actions #12

Updated by Emanuel Wlaschitz 7 months ago

Hey Norm!

The easiest way to reproduce is to take Martin's code and replace GetEntity with

public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
    return null;
}

Certainly not the best implementation, but one that causes the issue to show up.

Actions #13

Updated by Norm Tovey-Walsh 7 months ago

Yes, my experiments had the same results as Martin's tests. (Thank you, Martin!)

I did instrument all of our code that I thought might be causing the problem and confirmed that when you install a custom handler, none of that code gets called before your custom handler.

The semantics of the GetEntity method in .NET differ from the semantics of the URI and entity resolvers in Java. In Java, a resolver that fails to find a resource is expected to return null; this tells the parser (or other processor) to attempt something else, perhaps direct retrieval of the resource. On .NET, the GetEntity method is expected to return a stream. It should never return null. If looking up the resource doesn't succeed, it has to do the "something else" for the parser or processor. This surprised me when I first encountered it in the C# version of XML Resolver (used in Saxon 11+).

There are a lot of moving parts here. I think what happens is that a null returned from GetEntity gets interpreted with the Java semantics, so the underlying parser falls back and attempts to get the resource with the custom scheme, and that falls over.

I can't explain how this was different in Saxon 9, but there are often significant changes between major versions so I'm not sure that whatever 9 is doing would be relevant anyway.

In short: I think the GetEntity method should return a stream or throw an exception. It should never return null.

Actions #14

Updated by Emanuel Wlaschitz 7 months ago

The problem with throwing an Exception is that it does the same thing as returning null; it causes the other handling to happen and triggers the protocol exception.

And the problem with returning an empty MemoryStream is that it causes "Premature end of file".

So...my only way out here would be to return a valid but otherwise empty XML fragment, and that comes with the drawback of making doc-available() return true(). This is OK for most of our internal XSLTs, but in some cases we do rely on this to signal "nope, no document here" to the XSLT when certain conditions are (not) met. We could work around this in the XSLT for the most part, but some of it might not be fully under our control.

For completeness' sake, I added this xsl:message into Martin's XSLT before the xsl:sequence:

<xsl:message expand-text="yes">doc-available: {doc-available('custom-scheme://test')}</xsl:message>

Can we hook into the pipeline elsewhere to make this work like in Saxon 9? Or return something specific from GetEntity to prevent this?

Actions #15

Updated by Emanuel Wlaschitz 7 months ago

We just ran a few tests against Saxon 9; turns out it was "only" a warning there:

Warning at char 9 in xsl:sequence/@select on line 4 column 64 of custom_protocol.xsl:
  FODC0002: I/O error reported by XML parser processing custom-scheme://test: unknown
  protocol: custom-scheme

Same for throwing an exception (NotImplementedException for simplicity):

Warning at char 9 in xsl:sequence/@select on line 4 column 64 of custom_protocol.xsl:
  FODC0002: Exception thrown by URIResolver: The method or operation is not implemented.

So I guess the stricter handling of warnings/errors causes this to fail the transformation. It simply happened to work on Saxon 9, more likely by chance than by design.

Which leaves me with the same question as before: How can we signal "this document is not available" through the InputXmlResolver?

Actions #16

Updated by Norm Tovey-Walsh 7 months ago

If you reach the point in your stylesheet where you're attempting to resolve a document() function, it's too late to signal that the document is unavailable. However, you can test if the document is available before calling the document function:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="doc-available('custom-scheme://test')">
        <xsl:sequence select="document('custom-scheme://test')"/>
      </xsl:when>
      <xsl:otherwise>
        <doc>unavailable</doc>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

When I run that stylesheet in my test harness where the custom resolver returns null, it outputs:

<?xml version="1.0" encoding="UTF-8"?><doc>unavailable</doc>

Is that a sufficient workaround for your use case?

Actions #17

Updated by Emanuel Wlaschitz 7 months ago

That's perfectly acceptable, thanks!

Now we only have to go back and figure out why we have locations that still fail this. But I'm guessing we're just lacking the doc-available() call there and expect the previous "empty node-set" behavior.

I think we can lay this report to rest then, since it's yet another expected (yet unexpected) behavior change and simply means we have to change our expectations.

Actions #18

Updated by Norm Tovey-Walsh 7 months ago

We try very hard to maintain backwards compatibility, but there's always a tension between "making improvements" and "not changing things". At major version boundaries, we allow ourselves more freedom to make improvements. (I maintain a few open source projects that use Saxon and I know exactly how painful those changes can be.)

I'm inclined to think that if returning null in Saxon 9 "worked" that it was a bug that it did so.

I'll leave this issue open for a bit, please do let us know if you have problems with the workaround.

Actions #19

Updated by Emanuel Wlaschitz 7 months ago

> I'm inclined to think that if returning null in Saxon 9 "worked" that it was a bug that it did so.

I tend to agree at this point; especially after seeing a bunch of other reports where the result was "this is intended by the XSLT 3.0 spec" (or similar.)

We just have to deal with it for now.

Our current plan is to throw an exception that clearly states that the requested URI is unavailable and suggests testing for it using doc-available() first. That way, we can leave some sort of documentation for end-users in case we do fail to update all necessary locations. In the end, this did reveal an actual bug in one of our XSLTs that didn't build the URI correctly but silently appeared to be working on Saxon 9.

I'll report back once we checked (and updated) all our uses to leave a solution for others that might run into the same problem. Thanks again for getting us on track!

Actions #20

Updated by Michael Kay 7 months ago

  • Subject changed from Cannot use InputXmlHandler to implement custom scheme on Saxon-HE 10N to Cannot use InputXmlResolver to implement custom scheme on Saxon-HE 10N
  • Assignee set to Michael Kay

Actions #21

Updated by Michael Kay 7 months ago

  • Status changed from New to AwaitingInfo
  • Assignee changed from Michael Kay to Norm Tovey-Walsh

Actions #22

Updated by Emanuel Wlaschitz 6 months ago

Alright, we reviewed (and updated) our usages and it seems to work as expected now.

For anyone else running into this: the main change is testing for doc-available($customUrl) before attempting to load it with document($customUrl). We also changed our XmlResolver.GetEntity implementation to throw a more descriptive exception (just in case it shows up in any log files or similar), but returning null is also fine (as long as the doc-available() test is present in the XSLT).
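
For reference, here is a minimal sketch of what the GetEntity change might look like. It is not our production code: the class name DescriptiveCustomResolver and the lookup delegate are hypothetical stand-ins for whatever internal processing produces the data, and FileNotFoundException is just one plausible exception type. Only the XmlResolver members come from .NET itself.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

// Sketch only: the class name and the lookup delegate are hypothetical.
public sealed class DescriptiveCustomResolver : XmlResolver
{
    private readonly XmlResolver _innerResolver;

    // Hypothetical lookup: returns a stream with the XML data,
    // or null when nothing matches the requested URI.
    private readonly Func<Uri, Stream> _lookup;

    public DescriptiveCustomResolver(XmlResolver innerResolver, Func<Uri, Stream> lookup)
    {
        _innerResolver = innerResolver ?? throw new ArgumentNullException(nameof(innerResolver));
        _lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
    }

    public override ICredentials Credentials
    {
        set { _innerResolver.Credentials = value; }
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Anything outside the custom scheme goes to the original resolver.
        if (absoluteUri.Scheme != "custom-scheme")
        {
            return _innerResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
        }

        Stream stream = _lookup(absoluteUri);
        if (stream == null)
        {
            // Throw instead of returning null, so the message ends up in
            // logs and points the stylesheet author at doc-available().
            throw new FileNotFoundException(
                $"No document is available for '{absoluteUri}'. " +
                "Test the URI with doc-available() before calling document().");
        }
        return stream;
    }
}
```

It would be wired up the same way as in the original report, e.g. transformer.InputXmlResolver = new DescriptiveCustomResolver(transformer.InputXmlResolver, lookup). The stylesheet still needs the doc-available() guard; the exception only makes a missing guard easier to diagnose.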

Thanks again for the support!

Actions #23

Updated by Norm Tovey-Walsh 6 months ago

  • Status changed from AwaitingInfo to Closed

Glad you got it working. I'm going to close this as resolved, but feel free to open another issue if you run into more problems.
