Project

Profile

Help

Bug #5970

closed

DirectResourceResolver - unresolved XML classpath resources

Added by Steven Dürrenmatt about 1 year ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Category:
Resolvers
Sprint/Milestone:
-
Start date:
2023-04-10
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
12, trunk
Fixed in Maintenance Release:
Platforms:

Description

Seems identical to https://saxonica.plan.io/issues/5271 but for XML resources.

From Saxon 11, when trying to load an XML document from the classpath, java.net.MalformedURLException: unknown protocol: classpath exception can be thrown.

StringWriter sw = new StringWriter();
Configuration config = new Configuration();
XQueryExpression expression = config.newStaticQueryContext().compileQuery("doc('classpath:books.xml')");
expression.run(new DynamicQueryContext(config), new StreamResult(sw), new Properties());
System.out.println(sw);

In this example the resource is lastly resolved by the DirectResourceResolver, which relies on ResourceLoader.

For XSLT resources we have the following patch (could it be applied to all natures of resources?)

if (request.uri.startsWith("classpath:")) {
    String s = request.uri.substring(10);
    if (s.startsWith("/")) {
        s = s.substring(1);
    }
    stream = config.getDynamicLoader().getResourceAsStream(s);
}
Actions #1

Updated by Norm Tovey-Walsh about 1 year ago

Yes, it does. I will look again.

Actions #3

Updated by Norm Tovey-Walsh about 1 year ago

Indeed. I'm looking at that code in DirectResourceResolver that only applies the classpath: sniffing to XSLT resources. Right now, it has three branches:

if (XSLT) {
  if (classpath:) {
    handle that
  } else {
    use ResourceLoader
  }
} else if (XML or XSD) {
  use ResourceLoader
} else {
  use new StreamSource()
}

It sure feels like we could check for classpath: and use ResourceLoader otherwise in all cases...

Actions #4

Updated by Norm Tovey-Walsh about 1 year ago

In fact, why not move the classpath: checking into ResourceLoader.urlStream() and just use that?

Actions #5

Updated by Michael Kay about 1 year ago

Indeed, why not? There would be consequences, but they seem generally positive, like making query module locations and unparsed text locations work with classpath URIs.

Actions #6

Updated by Norm Tovey-Walsh about 1 year ago

Alas, not every context where ResourceLoader.urlStream() is called has access to the Configuration and we need the Configuration to call getDynamicLoader().

Could we just use the default loader if we don't have the configuration, or would that be a problem?

Do we need a flavor of urlStream() that handles classpath: if it has access to the Configuration?

Actions #7

Updated by Norm Tovey-Walsh about 1 year ago

There appear to be two places where we call urlStream() when there's no configuration available. In QueryReader and in UnparsedTextResource. Seems like it would be nice to be able to support classpath: in those places if we're going to try to support it in more places... :-(

Actions #8

Updated by Michael Kay about 1 year ago

I think that most callers can be made Configuration-aware with a bit of tweaking, e.g. QueryReader.readSourceQuery() and UnparsedTextResource.

UnparsedTextResource doesn't know the Configuration, but it's always instantiated via a Factory object which is supplied with the context and currently ignores it.

Actions #9

Updated by Norm Tovey-Walsh about 1 year ago

I started down that road. I don't see how UnparsedTextResource is always created by a factory though? I see that there is a factory but it has a public constructor and, for example, DataURIScheme constructs one directly.

Actions #10

Updated by Michael Kay about 1 year ago

I missed that one. But the callers of DataURIScheme.decode() have access to context?

It's probably one of those cases where you need a fallback if no configuration is available, but where most paths can be made to supply one.

Actions #11

Updated by Norm Tovey-Walsh about 1 year ago

  • Assignee changed from Norm Tovey-Walsh to Michael Kay
  • Priority changed from Low to Normal
  • Applies to branch trunk added
  • Fix Committed on Branch 12 added

This turned out to require making quite a few changes, but I think they all make sense:

  1. ResourceLoader.urlStream() and ResourceLoader.urlReader() now expect a Configuration. That configuration is used to resolve classpath: URIs. They’ve been refactored a little bit to avoid some code duplication.
  2. DirectResourceResolver.resolve() has been simplified to rely on ResourceLoader to get the input stream.
  3. StandardModuleURIResolver.toStreamSource() has been modified to accept an AugmentedSource. If it’s passed an AugmentedSource, it uses the contained source to get the stream.
  4. StandardURIResolver.setSAXInputSource has been simplified to rely on ResourceLoader to get the input stream.
  5. StandardUnparsedTextResolver.resolve() has been modified to accept an AugmentedSource. If it’s passed an AugmentedSource, it uses the contained source to get the stream. To make that work, I’ve added a getReaderFromSAXSource() to handle the contained source (which is usually/always a SAXSource). Refactored a bit to avoid some code duplication.
  6. QueryReader.readSourceQuery() now takes a Configuration argument so it can use ResourceLoader.urlStream()
  7. AbstractResourceCollection.getInputStream(), AbstractResourceCollection.obtainBinaryContent(), and AbstractResourceCollection.obtainCharacterContent now take a Configuration argument so they can use ResourceLoader.urlStream().
  8. DataURIScheme uses UnparsedTextResource.FACTORY to create an UnparsedTextResource. It doesn’t have a Configuration handy so it just passes null as the Configuration but that’s ok becuase a data: URI can never be a classpath: URI.
  9. JSONResource and UnknownResource now hold onto the Configuration that created them so that they can use ResourceLoader.urlStream().
  10. The constructor for UnparsedTextResource is now private. The FACTORY can be used to obtain one. The Configuration is now used by the constructor so that getContent() can use ResourceLoader.urlStream().

Mike it would be good to get your review on commits 891c1849 and 9567e592. (I haven't merged them into main yet, pending your review.)

Actions #12

Updated by Norm Tovey-Walsh about 1 year ago

FYI: I've broken the SaxonCS build ... investigating.

Actions #13

Updated by Norm Tovey-Walsh about 1 year ago

I've fixed the SaxonCS problems, I think. See commit 79303539

Actions #14

Updated by Michael Kay about 1 year ago

  • Assignee changed from Michael Kay to Norm Tovey-Walsh
Actions #17

Updated by Norm Tovey-Walsh about 1 year ago

  • Assignee changed from Norm Tovey-Walsh to Michael Kay
Actions #20

Updated by Norm Tovey-Walsh almost 1 year ago

  • Status changed from New to Resolved
  • Assignee changed from Michael Kay to Norm Tovey-Walsh
  • Fix Committed on Branch trunk added

Right. I've pushed the fixes to main as well. The patch didn't apply cleanly to 11 so I haven't done that one yet.

Actions #21

Updated by O'Neil Delpratt 12 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.2 added

Bug fix applied in the Saxon 12.2 maintenance release

Please register to edit this issue

Also available in: Atom PDF