Support #5652

closed

Questions about Xlink Schema

Added by Ben Weaver over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Low
Category: Resolvers
Sprint/Milestone: -
Start date: 2022-08-17
% Done: 0%
Platforms: Java

Description

We recently encountered an issue stemming from the way Saxon uses the built-in XLink schema. It appears that the standard schema resolver forces the use of Saxon's own version of xlink.xsd, which means we can't add customizations or restrictions to XLink without supplying our own schema resolver (see Issue 3091).

1/ Could you help us understand the reasoning behind this change? It is confusing that it seems to apply only to the XLink schema.

2/ It would be helpful to have this behaviour documented somewhere. The "References to W3C DTDs" page is very clear and seemed relevant, but wasn't.

Additionally, the Saxon version of xlink.xsd references xml.xsd, which is not provided at all. This causes, in some cases, an attempt to retrieve xml.xsd from w3.org. It was here we encountered the problem, as the machine running the code was unable to connect to w3.org.

3/ Can Saxon supply the version of xml.xsd that is meant to be included by the Saxon version of xlink.xsd?

Would a catalog file enable us to supply our own cached version of xml.xsd to the xlink.xsd include?



Actions #1

Updated by Norm Tovey-Walsh over 1 year ago

Saxonica Developer Community writes:

> 1/ Could you help us understand the reasoning behind this change? It
> is confusing that it seems to apply only to the XLink schema.

I don’t think there’s any special handling for the XLink schema. In
Saxon 9/10, we have a cache of “well known schemas” that we resolve
locally instead of attempting to get them from www.w3.org. In Saxon 11,
this feature is supported by the XML Resolver library, but amounts to
the same thing.

> 2/ It would be helpful to have this behaviour documented somewhere.
> The "References to W3C DTDs" page is very clear and seemed relevant,
> but wasn't.

If we can work out exactly what behavior you’re seeing, we can try to
improve the documentation.

> Additionally, the Saxon version of xlink.xsd references xml.xsd, which
> is not provided at all. This causes, in some cases, an attempt to
> retrieve xml.xsd from w3.org. It was here we encountered the problem,
> as the machine running the code was unable to connect to w3.org.

What version of Saxon are you using, and can you provide an example that
demonstrates this behavior? I don’t think we ever distributed the XLink
schema without the XML schema, but I could be mistaken.

> Would a catalog file enable us to supply our own cached version of
> xml.xsd to the xlink.xsd include?

Yes, that should work straight away for Saxon 11. For earlier versions
of Saxon, you might have to configure the resolver explicitly.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #2

Updated by Michael Kay over 1 year ago

Just to qualify the last paragraph: I think that when a schema imports the XML namespace, Saxon will always use its built-in version of the schema for that namespace, rather than fetching it via a resolver. This schema is "pre-cooked" in Saxon; we don't load it from a lexical schema document at all. Trying to provide a modified version of this schema will, I think, have no effect.

The reason for this is that it would be completely undefined what should happen if you modified some of the built-in declarations of attributes such as xml:space and xml:base. Those are things that users shouldn't be allowed to tamper with.

Actions #3

Updated by Jorge Williams over 1 year ago

Hey guys,

I don't think you guys are understanding the severity of the issue. This totally broke us and it took us a bit to figure out what the problem was.
We are currently using Saxon 10.x EE.

We've recently replaced the standard URL and Entity resolvers with ones based on https://xmlresolver.org because we make use of XML Catalogs. We didn't see the issue until we moved to this resolver.

We are not interested in changing xml.xsd.

We are interested in making modifications to xlink.xsd.

Here is the chain of events that led to the issue:

  1. Saxon contains a standard schema resolver that, given the XLink namespace, uses its own copy and completely ignores our modified one.
  2. That copy of xlink.xsd references xml.xsd -- but does not actually supply it.
  3. The URL resolver attempts to resolve xml.xsd -- this is handled by XML Resolver.
  4. XML Resolver tries to consult its cache -- unfortunately, the user that our production and staging environments run as does not have access to a home directory after startup. So XML Resolver can't access ~/.xmlresolver.org/ and it's a cache miss.
  5. XML Resolver attempts to retrieve xml.xsd from w3.org and this stalls.

I can understand the rationale behind caching xml.xsd -- but you are actually not doing that. You are caching xlink.xsd -- and we don't understand why. Also, this is totally undocumented; we had to decompile the standard schema resolver to figure out that you were doing that.
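
Step 4 in the list above hinges on whether the service account can actually use the cache directory ~/.xmlresolver.org/. As a rough diagnostic, a sketch like the following could be run under the same account to check that. It is not XML Resolver's own logic: the class and method names are made up for illustration, and the only detail taken from this thread is the directory name.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic helper; not part of Saxon or XML Resolver.
class ResolverCacheCheck {

    /**
     * Returns true if the cache directory under the given home directory
     * looks usable: either it already exists and is readable and writable,
     * or it does not exist yet but the home directory is a writable
     * directory in which it could be created. Nothing is created or modified.
     */
    static boolean cacheDirUsable(Path home) {
        Path cache = home.resolve(".xmlresolver.org");
        if (Files.isDirectory(cache)) {
            return Files.isReadable(cache) && Files.isWritable(cache);
        }
        return Files.isDirectory(home) && Files.isWritable(home);
    }

    public static void main(String[] args) {
        // user.home is the JVM's view of the account's home directory.
        Path home = Path.of(System.getProperty("user.home", "."));
        System.out.println("home: " + home);
        System.out.println("cache usable: " + cacheDirUsable(home));
    }
}
```

If this reports false for the account that runs the service, a cache miss like the one in step 4 would be expected whenever the resolver tries to use that directory.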

Actions #4

Updated by Jorge Williams over 1 year ago

Added screenshot.

Actions #5

Updated by Michael Kay over 1 year ago

Thanks Jorge, now that we know we're talking about Saxon 10 this makes more sense. We made a lot of changes in this area in Saxon 11, so that tends to be our mindset at the moment.

Answering questions about the rationale of the change is a little difficult as it's so long ago (5 years?). As far as I can see the code you're looking at in Saxon 10 was unchanged from 9.8 and 9.9, but there was a change between 9.7 and 9.8 - in 9.7 if you imported the XLink schema with a different location hint, we respected the location hint.

Can I assume you're migrating from 9.7 or an earlier version?

This seems to have been documented as a change made in response to bug 3092, which states: "In addition, for 9.8 I have changed the handling of the XLink namespace so that the version issued with Saxon is used, regardless of the supplied schemaLocation URI. Users wanting a different version of the XLink schema, for example the old OpenGIS version, will then need to supply their own SchemaURIResolver."

I suspect the problem that motivated the change was dealing with the situation where different versions of the XLink schema are loaded concurrently. This would typically lead to Saxon rejecting the schema as invalid. Because different application-level standards have issued their own copy of the XLink schema with variations, this would otherwise make these schemas incompatible with each other.

I don't think the position is unreasonable that if you want a non-standard version of the XLink schema, you have to use your own schema resolver to supply it. (Alternatively, you could pre-load the schema into the Saxon Configuration). And I'm afraid any change would be hard to contemplate: we're managing the 10.x branch for maximum stability now, and a change in behaviour that might affect other users would be undesirable indeed. Your other option would be to move forward to Saxon 11, where the whole resolver architecture has been redesigned.
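
To illustrate the "use your own schema resolver" option, here is a minimal sketch that uses the JDK's built-in JAXP validator rather than Saxon. It is an analogy only: Saxon's hook is the SchemaURIResolver, and its wiring is not shown here. The JDK counterpart used below is LSResourceResolver, which also receives the target namespace of an xs:import. The class name XLinkOverrideSketch, the cut-down schema text, and the example.invalid URIs are all assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical sketch using JDK JAXP only; this is not Saxon's API.
class XLinkOverrideSketch {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Made-up, cut-down stand-in for a customized xlink.xsd.
    static final String CUSTOM_XLINK =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + XLINK_NS + "'>"
        + "<xs:attribute name='href' type='xs:anyURI'/>"
        + "</xs:schema>";

    // Importing schema; its location hint points at an unreachable host.
    static final String MAIN_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:xlink='" + XLINK_NS + "'>"
        + "<xs:import namespace='" + XLINK_NS + "'"
        + " schemaLocation='http://example.invalid/xlink.xsd'/>"
        + "<xs:element name='link'><xs:complexType>"
        + "<xs:attribute ref='xlink:href' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    // Records every namespace the resolver is asked about.
    static final List<String> requestedNamespaces = new ArrayList<>();

    static Schema compile() {
        try {
            DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            // Answer by namespace, ignoring the location hint (systemId).
            LSResourceResolver resolver = (type, namespace, publicId, systemId, baseUri) -> {
                requestedNamespaces.add(namespace);
                if (!XLINK_NS.equals(namespace)) {
                    return null; // leave everything else to default handling
                }
                LSInput input = ls.createLSInput();
                input.setCharacterStream(new StringReader(CUSTOM_XLINK));
                input.setSystemId("http://example.invalid/custom-xlink.xsd");
                return input;
            };
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver);
            return factory.newSchema(new StreamSource(new StringReader(MAIN_SCHEMA)));
        } catch (Exception e) {
            throw new IllegalStateException("schema compilation failed", e);
        }
    }

    // True if the instance document validates against the compiled schema.
    static boolean accepts(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

The design point is the same one discussed in this thread: the decision is keyed on the imported namespace, so the location hint in the importing schema never has to be reachable.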

Actions #6

Updated by Jorge Williams over 1 year ago

No, we didn't upgrade from 9.7. The issue arose when we moved to the resolver from https://xmlresolver.org/.

The scenario where different XLink schemas can be loaded concurrently can apply to any other standard schema. Why would your platform provide a special case for XLink over anything else? It would be good to understand how Saxon uses XLink to understand the impact of our changes to the schema. Ultimately, as users of your platform, it's our responsibility to ensure that the schemas we choose work well together.

I understand not wanting to make a change in 10.x -- makes sense.

Taking a quick look at your code for version 11.4, it seems the behavior is still there. You ignore all locations for XLink. I don't think upgrading will fix the issue.

Actions #7

Updated by Norm Tovey-Walsh over 1 year ago

Saxonica Developer Community writes:

> We are currently using Saxon 10.x EE.
>
> We've recently replaced the standard URL and Entity resolvers with
> ones based on https://xmlresolver.org because we make use of XML
> Catalogs. We didn't see the issue until we moved to this resolver.

Okay. I’m a little confused about exactly how you have things
configured. You may have found a bug in XML Resolver, but I’ll need some
kind of reproducible test case to be sure.

> 1. Saxon contains a standard schema resolver that given the xlink
> namespace uses its own copy and completely ignores our modified one.

Is that true even if you put an entry in the catalog you’re using that
points to the version you want to use?

> 2. The copy xlink.xsd references xml.xsd -- but does not actually
> supply it

I assume this is related to the fact that, as Michael said, the
semantics of xml.xsd are baked in.

> 3. The URL resolver attempts to resolve xml.xsd -- this is handled by
> XML Resolver
>
> 4. XML Resolver tries to consult its cache -- unfortunately the user
> that runs on our production and staging environment does not have
> access to a home directory after startup. So XML Resolver can't access
> ~/.xmlresolver.org/ and it's a cache miss

Are you also using xmlresolverdata.jar? It’s certainly in there.

As of 4.5.0, XML Resolver no longer tries to cache by default.

> 5. XML Resolver attempts to retrieve xml.xsd from w3.org and this
> stalls
>
> I can understand the rationale behind caching xml.xsd -- but you are
> actually not doing that. You are caching xlink.xsd -- and we don't
> understand why. Also, this is totally undocumented; we had to decompile
> the standard schema resolver to figure out that you were doing that.

The rationale was to avoid stalling requests to www.w3.org.

I’m eager to help resolve this, but I confess, I still don’t really
understand precisely how you have things configured. Is it possible to
provide a small test case that demonstrates the problem?

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #8

Updated by Jorge Williams over 1 year ago

Hey Norm,

> > 1. Saxon contains a standard schema resolver that given the xlink namespace uses its own copy and completely ignores our modified one.
>
> Is that true even if you put an entry in the catalog you’re using that points to the version you want to use?

Correct. The catalog will never be consulted. Look at the code in Saxon 10.6 for the Standard Schema Resolver. XLink is inserted and the URLHandler is never consulted, so catalog resolution will never occur.

> > I can understand the rationale behind caching xml.xsd -- but you are actually not doing that. You are caching xlink.xsd -- and we don't understand why. Also, this is totally undocumented we had to decompile the standard schema resolver to figure out that you were doing that.
>
> The rationale was to avoid stalling requests to www.w3.org.

If the rationale is to avoid stalling requests to w3.org, then I believe you should not do this in the Standard Schema Resolver (com.saxonica.ee.config.StandardSchemaResolver) to resolve XLink.xsd. The way things are written in 10.x, the URL resolver is completely bypassed. We're stuck with what you give us no matter what.

In 11.x you don't bypass the URL Resolver, but you still completely ignore our location, so you force us to use a Catalog -- even though we tell the validator specifically which XSD to use. Why do this for XLink?

> Is it possible to provide a small test case that demonstrates the problem?

It's difficult, because the worst part of the issue (the stalling) is environment-specific and probably related to a bug in xmlresolver.

Having said this, if you remove the special case for XLink in com.saxonica.ee.config.StandardSchemaResolver, the issue will go away. You should consider doing this in 11.x. I don't understand why you need the special case; it doesn't sound like you need it, especially if you have xmlresolver in the mix.

Actions #9

Updated by Michael Kay over 1 year ago

I've been doing some tests in this area (on the 11.x branch).

Firstly, I confirmed that if you supply a SchemaURIResolver, you can intercept a request for the XLink namespace, and return any schema document you like.

Secondly, I confirmed that if you import the XLink namespace supplying a location hint, without a SchemaURIResolver, the location hint is ignored. That is certainly conformant behaviour - processors are allowed (and even encouraged) to ignore the location hint if they already have built-in knowledge of the schema for the namespace. I think it's also desirable behaviour, because a lot of people are going to specify a location such as http://www.w3.org/..., and a local hard-baked copy should take precedence. Moreover, if we changed the behaviour, the change would be very disruptive.

Thirdly, I've confirmed that if we remove the special-case code in StandardSchemaResolver, it will attempt to use the location hint on the xs:import declaration. That immediately tells us that removing this code would be disruptive to other users; it might cause the schema document to be fetched over the web unnecessarily, or it might cause a failure, or it might cause a different version of the XLink schema to be loaded.

Fourthly: is it possible to map the XLink namespace to a user-supplied version of the XLink schema using a user-supplied catalog? Yes, I've tried this and it works.

So: if you try to pick up a local variant of the XLink schema using a location hint, you won't succeed, but if you do it using a SchemaURIResolver or a catalog, it seems to work fine. I don't intend to change that behaviour, but I will try to make sure it is documented.

Actions #10

Updated by Michael Kay over 1 year ago

I've done yet more testing, and I've established that while you can use a SchemaURIResolver to get a non-standard XLink schema loaded, you can't use the processor-level resolver, at least not in the expected way. The StandardSchemaResolver is issuing a ResourceRequest to the processor-level resolver for the standard W3C location https://www.w3.org/1999/xlink.xsd, whereas I think it should be issuing a request for the target namespace http://www.w3.org/1999/xlink. I have changed this. The change could have an impact on anyone currently using a resolver or catalog that expects to see the URI https://www.w3.org/1999/xlink.xsd rather than http://www.w3.org/1999/xlink.
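
Since the URI that reaches the resolver differs before and after this change, a catalog that has to work on both sides of it could carry an entry for each URI. A minimal sketch of such a catalog follows. It uses the JDK's own javax.xml.catalog API purely to exercise the entries; Saxon 11 resolves through the XML Resolver library, and how the catalog is registered with Saxon is not shown. The class name, the temporary directory, and the target path schemas/xlink-custom.xsd are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;

// Hypothetical sketch using the JDK catalog API; not Saxon or XML Resolver code.
class XLinkCatalogSketch {

    // Both the namespace URI and the W3C location URI map to the same
    // local file. The relative target path is a made-up example.
    static final String CATALOG =
        "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'>\n"
        + "  <uri name='http://www.w3.org/1999/xlink'\n"
        + "       uri='schemas/xlink-custom.xsd'/>\n"
        + "  <uri name='https://www.w3.org/1999/xlink.xsd'\n"
        + "       uri='schemas/xlink-custom.xsd'/>\n"
        + "</catalog>\n";

    /**
     * Writes the catalog into a fresh temporary directory and returns the
     * URI that the requested URI maps to, resolved against the catalog file.
     */
    static String resolve(String requestedUri) {
        try {
            Path dir = Files.createTempDirectory("xlink-catalog");
            Path catalog = dir.resolve("catalog.xml");
            Files.writeString(catalog, CATALOG);
            CatalogResolver resolver = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalog.toUri());
            // Looks up the <uri> entries; the second argument is the base URI.
            return resolver.resolve(requestedUri, null).getSystemId();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.w3.org/1999/xlink"));
        System.out.println(resolve("https://www.w3.org/1999/xlink.xsd"));
    }
}
```

Keeping both entries is a defensive choice: whichever of the two URIs is requested, the lookup lands on the same local copy.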

Actions #11

Updated by Michael Kay over 1 year ago

  • Category set to Resolvers
  • Status changed from New to Closed
  • Assignee set to Michael Kay
