Feature #3125
Closed
Duplicate conflicts when schema is read twice after building from NodeInfo
100%
Description
On addSchemaSource:
We load XSDs from a jar, so the URIs look like
jar:file:/C:/xsds.jar!/a.xsd
jar:file:/C:/xsds.jar!/b.xsd
jar:file:/C:/xsds.jar!/c.xsd
If a.xsd and b.xsd both import c.xsd, we end up with a warning even with MULTIPLE_SCHEMA_IMPORTS=true.
Is it possible to turn the warning off and use MULTIPLE_SCHEMA_IMPORTS=false?
Updated by Petr K almost 8 years ago
- Copied from Feature #3081: suppress warning: The schema document at xxx.xsd is not being read because schema components for this namespace are already available. To force the schema document to be read... added
Updated by Petr K almost 8 years ago
The problem is actually in passing a NodeInfo to the addSchemaSource method; see repo.
Updated by Michael Kay almost 8 years ago
- Subject changed from suppress warning: The schema document at xxx.xsd is not being read because schema components for this namespace are already available. To force the schema document to be read... to Duplicate conflicts when schema is read twice after building from NodeInfo
I've changed the title because this doesn't seem to be about suppressing warnings, it seems to be about the fact that when you enable MULTIPLE_SCHEMA_IMPORTS, you are getting (spurious) errors due to conflicting type definitions with the same name.
If you import two schemas with the same namespace URI, and the two schemas contain multiple definitions for the same type (or other component), Saxon attempts to suppress the error if it can determine that the definitions are in fact identical.
Note that the spec is notoriously weak in this area: XSD 1.1 part 1 §3.4.6.5, "In other cases it is possible that conforming implementations will disagree as to whether components are identical."
More specifically to xs:import, XSD 1.1 part 1 §4.2.6.2 says: "Note: The above is carefully worded so that multiple <import>ing of the same schema document will not constitute a violation of clause 2 of Schema Properties Correct (§3.17.6.1), but applications are allowed, indeed encouraged, to avoid <import>ing the same schema document more than once to forestall the necessity of establishing identity component by component. Given that the schemaLocation [attribute] is only a hint, it is open to applications to ignore all but the first <import> for a given namespace, regardless of the ·actual value· of schemaLocation, but such a strategy risks missing useful information when new schemaLocations are offered."
Saxon's test for the types being identical is based on whether the definitions occur at the same place in the same schema document. For this it uses the line number and URI information. Your test can be made to work by adding line number information:
private static void loadSchemaBad(EnterpriseConfiguration config, String urlStr) throws Exception {
    URL url = new URL(urlStr);
    Source s = new StreamSource(url.openConnection().getInputStream(), urlStr);
    ParseOptions options = new ParseOptions();
    options.setLineNumbering(true);
    NodeInfo ni = config.buildDocumentTree(s, options).getRootNode();
    config.addSchemaSource(ni);
}
If we don't have line numbers there's no other easy way of doing this. We could try to compare the path in the node tree, but we don't retain that information in the schema component model. The only other way of doing it would be to compare the two types "by value", that is to test whether all their properties are the same. That's a fairly major undertaking.
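To illustrate why the missing line numbers matter, here is a minimal sketch of a location-based identity test. ComponentLocation, its fields, and the use of -1 for "no line number" are hypothetical names and conventions chosen for illustration; this is not Saxon's actual code.

```java
import java.util.Objects;

/** Hypothetical stand-in for a schema component's source location; not a Saxon class. */
final class ComponentLocation {
    final String systemId;  // URI of the schema document
    final int lineNumber;   // assumption: -1 means line numbering was not enabled

    ComponentLocation(String systemId, int lineNumber) {
        this.systemId = systemId;
        this.lineNumber = lineNumber;
    }

    /** Treats two definitions as the same only if both have a known line in the same document. */
    static boolean sameDefinition(ComponentLocation a, ComponentLocation b) {
        if (a.lineNumber < 0 || b.lineNumber < 0) {
            return false; // without line numbers, location cannot establish identity
        }
        return a.lineNumber == b.lineNumber && Objects.equals(a.systemId, b.systemId);
    }

    public static void main(String[] args) {
        String uri = "jar:file:/C:/xsds.jar!/c.xsd";
        // Same document, same line: recognised as the same definition.
        System.out.println(sameDefinition(new ComponentLocation(uri, 16), new ComponentLocation(uri, 16)));
        // Same document, but no line numbers retained: not recognised.
        System.out.println(sameDefinition(new ComponentLocation(uri, -1), new ComponentLocation(uri, -1)));
    }
}
```

Under these assumptions, a tree built without line numbering gives every component the same "unknown" location, so two reads of the same document cannot be told apart from two genuinely conflicting definitions.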
Updated by Michael Kay almost 8 years ago
A further observation: if I run this with MULTIPLE_SCHEMA_IMPORTS set to off, I get the warning, and then the errors:
Warning at /xsd:schema/xsd:import[1] in schema1.xsd:
The schema document at schema2.xsd is not being read because schema components for this
namespace are already available. To force the schema document to be read, set
--multipleSchemaImports:on
Error in schema1.xsd:
Duplicate type {concreteType} - previously defined on line 16 of
file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd
Error in schema1.xsd:
Duplicate type {docType} - previously defined on line 9 of
file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd
Error in schema1.xsd:
Duplicate type {abstractType} - previously defined on line 15 of
file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd
That's because the setting MULTIPLE_SCHEMA_IMPORTS=off stops xs:import loading a schema for a namespace that's already known, but it doesn't stop a schema being loaded using config.addSchemaSource(). The API method will always attempt to load the schema, and then discard components on a component-by-component basis if they are found to be "identical" to existing components, under the rules above.
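The component-by-component behaviour described above can be sketched roughly as follows. TypeRegistry and the string location keys (such as "schema1.xsd#16") are hypothetical, invented for this illustration; the real schema component model is considerably richer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of component-by-component merging; not Saxon's implementation. */
final class TypeRegistry {
    private final Map<String, String> locationByName = new HashMap<>();
    private final List<String> errors = new ArrayList<>();

    /** location is an illustrative key such as "schema1.xsd#16"; null means no usable location. */
    void addType(String name, String location) {
        if (!locationByName.containsKey(name)) {
            locationByName.put(name, location); // first definition: keep it
            return;
        }
        String existing = locationByName.get(name);
        if (location != null && location.equals(existing)) {
            return; // provably the same definition read again: discard silently
        }
        errors.add("Duplicate type {" + name + "}"); // cannot prove identity: report a conflict
    }

    List<String> errors() {
        return errors;
    }

    public static void main(String[] args) {
        TypeRegistry withLines = new TypeRegistry();
        withLines.addType("docType", "schema1.xsd#9");
        withLines.addType("docType", "schema1.xsd#9");
        System.out.println(withLines.errors().size());

        TypeRegistry withoutLines = new TypeRegistry();
        withoutLines.addType("docType", null);
        withoutLines.addType("docType", null);
        System.out.println(withoutLines.errors().size());
    }
}
```

In this sketch the second load is always attempted, and whether it ends in a silent discard or a "Duplicate type" error depends entirely on whether the location information is good enough to prove identity.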
Updated by Petr K almost 8 years ago
Why does StreamSource work, as tested in the repo, while NodeInfo does not, even though the SystemIds are the same?
Updated by Michael Kay about 7 years ago
Returning to this after a long pause...
The test case in question is unit test TestValidator/testMultipleSchemaImportsTT; it succeeds if method loadSchemaFromNode() includes the line options.setLineNumbering(true), and fails with the default setting of false (because line numbers are used to check nodes for identity).
Looking more closely at the code, there has been an attempt to solve the problem for element declarations, which have a "generatedID" field that is actually the path to the element declaration in the source XSD; but this has not been implemented for other types of schema component, notably types (which is where this test case is failing).
Since most user-defined schema components appear to extend SchemaStructure, and SchemaStructure holds location information, I would have thought we could solve the problem at that level. Moreover, the location information is acquired via a call on SchemaStructure.setLocator(), which generally supplies the actual ElementImpl object of the XML tree representation of the schema document, and this is sufficient information to obtain the path (or a generateId, which would work just as well).
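As a rough illustration of deriving a path from the element itself, the sketch below uses the standard org.w3c.dom API rather than Saxon's ElementImpl, so the class and method names (ElementPath, pathOf, parse) are hypothetical. It matches siblings by qualified name only, which is a simplification; a robust version would compare namespace URI and local name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: derives a positional path for an element in a schema document. */
final class ElementPath {
    /** Builds a path such as /xs:schema[1]/xs:complexType[2] by walking up to the root element. */
    static String pathOf(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e; n instanceof Element; n = n.getParentNode()) {
            int position = 1;
            // Count preceding sibling elements with the same qualified name.
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s instanceof Element && s.getNodeName().equals(n.getNodeName())) {
                    position++;
                }
            }
            sb.insert(0, "/" + n.getNodeName() + "[" + position + "]");
        }
        return sb.toString();
    }

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:complexType name='docType'/>"
            + "<xs:complexType name='abstractType'/>"
            + "</xs:schema>");
        Element second = (Element) doc.getElementsByTagName("xs:complexType").item(1);
        System.out.println(pathOf(second));
    }
}
```

Such a path, combined with the document URI, does not depend on line numbering being enabled, so it could serve as a fallback identity key when line and column information is absent.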
Updated by Michael Kay about 7 years ago
- Category set to Schema conformance
- Status changed from New to Resolved
- Assignee set to Michael Kay
- Applies to branch 9.8, trunk added
- Fix Committed on Branch 9.8, trunk added
I have extended the component identity check (on all components except Element Declarations, which are handled differently) to use a path in the schema document where line/column number information is not available.
Updated by Michael Kay about 7 years ago
Added a further patch. Test t:override-v-005 was failing because two types were not recognized as equivalent; this turned out to be because they had no generatedId value, which in turn was because the value held in a SimpleTypeDefinition was not being retained when the SimpleTypeDefinition was converted into a UserUnionType, UserListType, or UserAtomicType.
Updated by O'Neil Delpratt about 7 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.8.0.6 added
Bug fix applied in the Saxon 9.8.0.6 maintenance release.