Bug #3202
--multipleSchemaImports has no effect when processing a cycle of schema documents
Status: Closed (100% done)
Description
The configuration flag --multipleSchemaImports changes Saxon's behaviour when a schema document A imports B and C, and B and C have the same target namespace. However, it has no effect when A imports B and B imports C, and A and C have the same target namespace.
This is counter-intuitive, it is not clear from the documentation, and it differs from the behaviour of the similar flag honour-all-schemaLocations in Xerces. It also means that the flag doesn't work as required for the WIPO and USPTO schemas (see private email from Kenneth Hughes).
Updated by Michael Kay over 7 years ago
- Status changed from New to In Progress
In the code of XSDImport, the challenge is how to distinguish this case from the case where two threads are compiling the same schema simultaneously. We also need to be sure that true cycles (where the same schemaLocation is re-imported) are detected.
Updated by Michael Kay about 7 years ago
Handling another related set of issues with an XBRL-based schema (MHK IntelliJ project Saxon9.8/XBRLTest)
The problem manifests itself here in that an element declared as belonging to a substitution group is not actually added to that substitution group; this appears to be happening because there are multiple ElementDeclarations created corresponding to the same physical element declaration read by different routes, and the element is added to the wrong one of these, which is subsequently discarded when it is found to be a duplicate.
While investigating this I see that XSDImport is looking to see whether a schema is already loaded for a namespace/location pair, and this check is failing because the list of known namespace/location pairs is owned by the SchemaCompiler object, and we are creating a new SchemaCompiler object every time we call Configuration.addSchemaSource(). In this application, there are many calls on s9api SchemaManager.load() for different top-level schema documents, many of which have common documents within their document tree; because these are being loaded using different SchemaCompiler objects, the caching has no effect. I don't think this is the whole story as far as the issue with substitution groups is concerned (after all, the top-level calls could have specified different schema locations) but it would help to clear this up.
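The effect of cache ownership described above can be illustrated with a minimal, self-contained sketch. It does not use Saxon's real classes: `Compiler`, `readsWithPerCompilerCache` and `readsWithSharedCache` are hypothetical names, and the namespace/location strings are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not Saxon code: shows why a cache of namespace/location
// pairs has no effect when each top-level load creates a fresh cache owner.
class SchemaCacheSketch {

    /** Stand-in for a compiler that records the namespace/location pairs it has loaded. */
    static final class Compiler {
        private final Set<String> loaded;

        Compiler(Set<String> loaded) {
            this.loaded = loaded;
        }

        /** Returns true if the document was not yet known and had to be read. */
        boolean load(String namespace, String location) {
            return loaded.add(namespace + "|" + location);
        }
    }

    /** Counts physical reads when every load uses a new compiler with its own cache. */
    static int readsWithPerCompilerCache(String[][] docs) {
        int reads = 0;
        for (String[] d : docs) {
            Compiler c = new Compiler(new HashSet<>()); // fresh, empty cache each time
            if (c.load(d[0], d[1])) {
                reads++;
            }
        }
        return reads;
    }

    /** Counts physical reads when all compilers share one cache (configuration level). */
    static int readsWithSharedCache(String[][] docs) {
        Set<String> shared = new HashSet<>();
        int reads = 0;
        for (String[] d : docs) {
            Compiler c = new Compiler(shared);
            if (c.load(d[0], d[1])) {
                reads++;
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"urn:example:common", "common.xsd"},
            {"urn:example:common", "common.xsd"}, // same document reached again
            {"urn:example:other", "other.xsd"}
        };
        System.out.println(readsWithPerCompilerCache(docs)); // prints 3
        System.out.println(readsWithSharedCache(docs));      // prints 2
    }
}
```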
Updated by Michael Kay about 7 years ago
I tried introducing a schema document cache held at the level of the configuration rather than the SchemaCompiler. This causes a failure:
Error on line 53 of generic-link.xsd:
The content model of the complex type linkType is not a valid restriction of the content
model of the type extendedType. Restricted type allows element loc where the base type does not
which is the same failure as we get if FeatureKeys.MULTIPLE_SCHEMA_IMPORTS is switched off.
In fact extendedType allows the abstract element locator, and loc is in the substitution group of locator, so this shouldn't be an error. I now suspect that the only reason MULTIPLE_SCHEMA_IMPORTS is being used is to circumvent this error. But the disappearance of the error is an illusion: there is still a problem with fixup of substitution groups, it just isn't being reported in the same way.
Looking at the schema component model in the debugger, we see that loc is indeed in the substitution group of locator, but the FSA for extendedType has transitions for locator, but not for loc, suggesting that loc was added to the substitution group after the FSA was compiled.
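The ordering problem suggested here can be shown in isolation. The following is only a toy model, assuming that the compiled automaton captures a snapshot of the substitution group at compile time; `Element` and `compileTransitions` are hypothetical names, not Saxon's schema component model.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Saxon code: a "compiled" content model that takes a
// snapshot of the substitution group and so misses members added later.
class StaleFsaSketch {

    static final class Element {
        final String name;
        final List<Element> substitutionGroup = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }
    }

    /** Returns the element names accepted where the head element is allowed. */
    static Set<String> compileTransitions(Element head) {
        Set<String> transitions = new LinkedHashSet<>();
        transitions.add(head.name);
        for (Element member : head.substitutionGroup) {
            transitions.add(member.name);
        }
        return transitions;
    }

    /** True if an automaton compiled before the addition fails to accept "loc". */
    static boolean staleAfterLateAddition() {
        Element locator = new Element("locator");
        Set<String> fsa = compileTransitions(locator);      // compiled too early
        locator.substitutionGroup.add(new Element("loc")); // member added afterwards
        return !fsa.contains("loc");
    }

    public static void main(String[] args) {
        System.out.println(staleAfterLateAddition()); // prints true
    }
}
```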
Updated by Michael Kay about 7 years ago
I note in passing that the code handling "deferred validation mode" in the SchemaCompiler appears to be defunct - never invoked. This was originally designed to allow a number of schemas to be separately loaded (as in this application) with no cross-validation being done until the end.
Updated by Michael Kay about 7 years ago
The sequence of events seems to be as follows.
App Loading http://eiopa.europa.eu/eu/xbrl/s2c/dict/dom/la/mem.xsd
Add loc to the substitution group of locator
Compile FSA for extendedType - the FSA is correct
Load various additional schemas
App Loading http://www.xbrl.org/2003/xl-2003-12-31.xsd
Compile FSA for extendedType - this time the substitution group membership is empty so the DFSA is incorrect.
What seems to happen here is that when the app loads xl-2003-12-31.xsd, the original components for both locator and extendedType are replaced, so knowledge of the substitution group is lost.
Why are they replaced? The logic in PreparedSchema (repeated for each type of component) is:
ElementDecl existing = elementsByName.get(elementDecl.getComponentName());
if (existing == null || existing.getRedefinitionLevel() <= elementDecl.getRedefinitionLevel()) {
    // ... replace the component ...
}
The less-or-equal comparison on the redefinition level seems wrong; if redefines is not used, then the new component and the old will both have redefinition level 0, so we end up replacing the old component with the new, which means we replace a "locator" that has "loc" in its substitution group with one that does not.
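The consequence of the less-or-equal test can be reproduced with a small stand-alone model. This is a sketch only: `ElementDecl` here is a simplified hypothetical class, `register` and its `strict` parameter are invented for illustration, and the strict (less-than) variant is merely the alternative the paragraph above hints at, not a confirmed fix.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Saxon code: compares "<=" and "<" on the redefinition
// level when a second declaration with the same name is registered.
class RedefinitionLevelSketch {

    static final class ElementDecl {
        final String name;
        final int redefinitionLevel;
        final Set<String> substitutionGroupMembers = new HashSet<>();

        ElementDecl(String name, int redefinitionLevel) {
            this.name = name;
            this.redefinitionLevel = redefinitionLevel;
        }
    }

    /** Registers a declaration and returns the one that ends up in the map. */
    static ElementDecl register(Map<String, ElementDecl> byName, ElementDecl incoming, boolean strict) {
        ElementDecl existing = byName.get(incoming.name);
        boolean replace = existing == null
                || (strict
                    ? existing.redefinitionLevel < incoming.redefinitionLevel
                    : existing.redefinitionLevel <= incoming.redefinitionLevel);
        if (replace) {
            byName.put(incoming.name, incoming);
        }
        return byName.get(incoming.name);
    }

    /** True if "loc" is still a known member of locator's group after a second load. */
    static boolean locSurvives(boolean strict) {
        Map<String, ElementDecl> byName = new HashMap<>();
        ElementDecl first = new ElementDecl("locator", 0);
        first.substitutionGroupMembers.add("loc");
        register(byName, first, strict);
        // Same declaration read again by a different route, no redefine involved.
        ElementDecl second = new ElementDecl("locator", 0);
        ElementDecl kept = register(byName, second, strict);
        return kept.substitutionGroupMembers.contains("loc");
    }

    public static void main(String[] args) {
        System.out.println(locSurvives(false)); // "<=" comparison: prints false
        System.out.println(locSurvives(true));  // "<" comparison: prints true
    }
}
```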
However, changing this doesn't fix the problem. While doing
App Loading http://www.xbrl.org/2010/generic-message.xsd
we create a new version of the extendedType component, which we then recompile, triggered by adding a new member (message) to the substitution group of element "resource". The new version of extendedType refers to a new version of "locator" that does not have "loc" in its substitution group.
Updated by Michael Kay about 7 years ago
I've extracted the problem with merging of substitution groups into a new bug #3531
Updated by Michael Kay about 5 years ago
- Description updated
- Status changed from In Progress to Resolved
- Applies to branch 9.8, 9.9, trunk added
- Fix Committed on Branch 9.9, trunk added
I have written a JUnit test case for this: TestValidator/testCycleMultipleSchemaImportsOn. The test is currently failing in 9.9.
The issue is at XSDImport line 124:
if (status == PreparedSchema.NAMESPACE_UNDER_CONSTRUCTION) {
    // This implies a cycle of imports. Alternatively, it means that a concurrent
    // process is loading a schema with this namespace...
    return;
}
Added logic that detects that the schema currently being constructed for the namespace is at a different location, and decides what to do based on the MULTIPLE_SCHEMA_IMPORTS configuration option.
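The shape of such a check might look roughly like the sketch below. It is not the committed code: `decide`, `Action` and the `underConstruction` map are hypothetical, and the outcome chosen for each case (load the document at the other location when the option is on, skip it when the option is off) is an assumption based on the issue title, since the comment above does not spell it out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Saxon code: decides what to do with an import whose
// target namespace is already under construction.
class ImportCycleSketch {

    enum Action { LOAD, SKIP }

    /**
     * @param underConstruction maps a namespace to the location of the schema
     *                          document currently being constructed for it
     */
    static Action decide(Map<String, String> underConstruction,
                         String namespace,
                         String location,
                         boolean multipleSchemaImports) {
        String current = underConstruction.get(namespace);
        if (current == null) {
            return Action.LOAD; // namespace not under construction
        }
        if (current.equals(location)) {
            return Action.SKIP; // true cycle: the same document is re-imported
        }
        // Same namespace, different location: assumed to depend on the option.
        return multipleSchemaImports ? Action.LOAD : Action.SKIP;
    }

    public static void main(String[] args) {
        // A (urn:example:ac, a.xsd) imports B, and B imports C (urn:example:ac, c.xsd).
        Map<String, String> underConstruction = new HashMap<>();
        underConstruction.put("urn:example:ac", "a.xsd");

        System.out.println(decide(underConstruction, "urn:example:ac", "c.xsd", true));  // prints LOAD
        System.out.println(decide(underConstruction, "urn:example:ac", "c.xsd", false)); // prints SKIP
        System.out.println(decide(underConstruction, "urn:example:ac", "a.xsd", true));  // prints SKIP
    }
}
```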
Fix applied to 9.9 and trunk.
Updated by O'Neil Delpratt over 4 years ago
- Fixed in Maintenance Release 10.0 added
Bug fix applied in the Saxon 10 major release.
Updated by O'Neil Delpratt about 4 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.9.1.8 added
Bug fix applied in the Saxon 9.9.1.8 maintenance release.