Feature #3125

closed

Duplicate conflicts when schema is read twice after building from NodeInfo

Added by Petr K over 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Schema conformance
Sprint/Milestone: -
Start date: 2016-12-20
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.8, trunk
Fix Committed on Branch: 9.8, trunk
Fixed in Maintenance Release:
Platforms:

Description

On addSchemaSource:

We load XSDs from a jar, so the URIs look like

jar:file:/C:/xsds.jar!/a.xsd

jar:file:/C:/xsds.jar!/b.xsd

jar:file:/C:/xsds.jar!/c.xsd

If a.xsd and b.xsd both import c.xsd, we end up with the warning even with MULTIPLE_SCHEMA_IMPORTS=true.

Is it possible to turn the warning off and use MULTIPLE_SCHEMA_IMPORTS=false?


Files

addSchemaSource.zip (4.12 KB), Petr K, 2017-02-01 13:10

Related issues

Copied from Saxon - Feature #3081: suppress warning: The schema document at xxx.xsd is not being read because schema components for this namespace are already available. To force the schema document to be read... (Won't fix, 2016-12-20)

Actions #1

Updated by Petr K over 7 years ago

  • Copied from Feature #3081: suppress warning: The schema document at xxx.xsd is not being read because schema components for this namespace are already available. To force the schema document to be read... added
Actions #2

Updated by Petr K over 7 years ago

the problem is actually in using NodeInfo in addSchemaSource method, see repo

Actions #3

Updated by Michael Kay over 7 years ago

  • Subject changed from suppress warning: The schema document at xxx.xsd is not being read because schema components for this namespace are already available. To force the schema document to be read... to Duplicate conflicts when schema is read twice after building from NodeInfo

I've changed the title because this doesn't seem to be about suppressing warnings, it seems to be about the fact that when you enable MULTIPLE_SCHEMA_IMPORTS, you are getting (spurious) errors due to conflicting type definitions with the same name.

If you import two schemas with the same namespace URI, and the two schemas contain multiple definitions for the same type (or other component), Saxon attempts to suppress the error if it can determine that the definitions are in fact identical.

Note that the spec is notoriously weak in this area: XSD 1.1 part 1 §3.4.6.5, "In other cases it is possible that conforming implementations will disagree as to whether components are identical."

More specifically to xs:import, XSD 1.1 part 1 §4.2.6.2 says: "Note: The above is carefully worded so that multiple <import>ing of the same schema document will not constitute a violation of clause 2 of Schema Properties Correct (§3.17.6.1), but applications are allowed, indeed encouraged, to avoid <import>ing the same schema document more than once to forestall the necessity of establishing identity component by component. Given that the schemaLocation [attribute] is only a hint, it is open to applications to ignore all but the first <import> for a given namespace, regardless of the ·actual value· of schemaLocation, but such a strategy risks missing useful information when new schemaLocations are offered."

Saxon's test for the types being identical is based on whether the definitions occur at the same place in the same schema document. For this it uses the line number and URI information. Your test can be made to work by adding line number information:

private static void loadSchemaBad(EnterpriseConfiguration config, String urlStr) throws Exception {
    URL url = new URL(urlStr);
    Source s = new StreamSource(url.openConnection().getInputStream(), urlStr);
    ParseOptions options = new ParseOptions();
    // Record line numbers so that Saxon can recognize a definition it has already read.
    options.setLineNumbering(true);
    NodeInfo ni = config.buildDocumentTree(s, options).getRootNode();
    config.addSchemaSource(ni);
}

If we don't have line numbers there's no other easy way of doing this. We could try to compare the path in the node tree, but we don't retain that information in the schema component model. The only other way of doing it would be to compare the two types "by value", that is to test whether all their properties are the same. That's a fairly major undertaking.
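As an illustration only (this is not Saxon's source code), the location-based test described above can be sketched as follows. The class ComponentLocation, the method sameDefinition and the UNKNOWN_LINE marker are hypothetical names chosen for this example; the sketch only assumes that a component remembers the URI and line number it was read from.

```java
import java.util.Objects;

/** Hypothetical stand-in for the location recorded for a schema component. */
final class ComponentLocation {
    static final int UNKNOWN_LINE = -1;

    final String systemId;
    final int lineNumber;

    ComponentLocation(String systemId, int lineNumber) {
        this.systemId = systemId;
        this.lineNumber = lineNumber;
    }

    /**
     * Two definitions count as the same one only if both come from the same
     * document URI and the same known line. Without a line number there is
     * nothing to compare, so identity cannot be established.
     */
    static boolean sameDefinition(ComponentLocation a, ComponentLocation b) {
        if (a.lineNumber == UNKNOWN_LINE || b.lineNumber == UNKNOWN_LINE) {
            return false;
        }
        return a.lineNumber == b.lineNumber && Objects.equals(a.systemId, b.systemId);
    }

    public static void main(String[] args) {
        String uri = "jar:file:/C:/xsds.jar!/c.xsd";
        // Tree built with line numbering: the second read is recognized as a repeat.
        System.out.println(sameDefinition(
                new ComponentLocation(uri, 16), new ComponentLocation(uri, 16)));
        // Tree built without line numbering: same document, but identity is unprovable.
        System.out.println(sameDefinition(
                new ComponentLocation(uri, UNKNOWN_LINE),
                new ComponentLocation(uri, UNKNOWN_LINE)));
    }
}
```

Under these assumptions the second comparison fails even though the URIs match, which corresponds to the "Duplicate type" errors seen when the NodeInfo is built without line numbering.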

Actions #4

Updated by Michael Kay over 7 years ago

A further observation: if I run this with MULTIPLE_SCHEMA_IMPORTS set to off, I get the warning, and then the errors:

Warning at /xsd:schema/xsd:import[1] in schema1.xsd:
  The schema document at schema2.xsd is not being read because schema components for this
  namespace are already available. To force the schema document to be read, set
  --multipleSchemaImports:on
Error in schema1.xsd:
  Duplicate type {concreteType} - previously defined on line 16 of
  file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd
Error in schema1.xsd:
  Duplicate type {docType} - previously defined on line 9 of
  file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd
Error in schema1.xsd:
  Duplicate type {abstractType} - previously defined on line 15 of
  file:/users/mike/bugs/2017/petrk/AbstractType-2/src/schema1.xsd

That's because the setting MULTIPLE_SCHEMA_IMPORTS=off stops xs:import loading a schema for a namespace that's already known, but it doesn't stop a schema being loaded using config.addSchemaSource(). The API method will always attempt to load the schema, and then discard components on a component-by-component basis if they are found to be "identical" to existing components, under the rules above.
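The difference between the two load paths can be modelled in a few lines. This is a simplified sketch with hypothetical names (SchemaLoadPaths, viaImport, viaAddSchemaSource), not Saxon's implementation; it only captures the rule stated above that the switch affects xs:import but not the API call.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the two load paths; not Saxon's real classes. */
final class SchemaLoadPaths {
    private final Set<String> knownNamespaces = new HashSet<>();
    private final boolean multipleSchemaImports;
    int documentsRead = 0;

    SchemaLoadPaths(boolean multipleSchemaImports) {
        this.multipleSchemaImports = multipleSchemaImports;
    }

    /** xs:import path: skipped (with a warning) if the namespace is known and the switch is off. */
    boolean viaImport(String namespace) {
        if (knownNamespaces.contains(namespace) && !multipleSchemaImports) {
            return false; // the "is not being read" warning case
        }
        return read(namespace);
    }

    /** API path: always reads the document; duplicates are resolved per component afterwards. */
    boolean viaAddSchemaSource(String namespace) {
        return read(namespace);
    }

    private boolean read(String namespace) {
        knownNamespaces.add(namespace);
        documentsRead++;
        return true;
    }

    public static void main(String[] args) {
        SchemaLoadPaths loader = new SchemaLoadPaths(false);
        loader.viaAddSchemaSource("urn:example");
        System.out.println(loader.viaImport("urn:example"));          // import is skipped
        System.out.println(loader.viaAddSchemaSource("urn:example")); // API call still reads
        System.out.println(loader.documentsRead);
    }
}
```

In this model the second API call reads the document again regardless of the switch, so any duplicate detection has to happen afterwards, component by component.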

Actions #5

Updated by Petr K over 7 years ago

Why, as tested in the repo, does StreamSource work but NodeInfo not, even though the SystemIds are the same?

Actions #6

Updated by Michael Kay over 6 years ago

Returning to this after a long pause...

The test case in question is unit test TestValidator/testMultipleSchemaImportsTT; it succeeds if method loadSchemaFromNode() includes the line options.setLineNumbering(true) and fails with the default setting of false (because line numbers are used to check nodes for identity).

Looking more closely at the code, there has been an attempt to solve the problem for element declarations, which have a "generatedID" field that is actually the path to the element declaration in the source XSD; but this has not been implemented for other types of schema component, notably types (which is where this test case is failing).

Since most user-defined schema components appear to extend SchemaStructure, and SchemaStructure holds location information, I would have thought we could solve the problem at that level. Moreover, the location information is acquired via a call on SchemaStructure.setLocator(), which generally supplies the actual ElementImpl object of the XML tree representation of the schema document, and this is sufficient information to obtain the path (or a generateId, which would work just as well).
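To show what "obtaining the path" from an element node could look like, here is a minimal sketch using the JDK's DOM API rather than Saxon's ElementImpl. The class SchemaPathSketch and its methods are hypothetical, and the positional path format is an assumption made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: derives a positional path for an element in a schema document. */
final class SchemaPathSketch {

    /** Walks up from the element, counting preceding siblings with the same name. */
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            int position = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                // Names are compared as written (prefix included); enough for a sketch.
                if (s instanceof Element && s.getNodeName().equals(n.getNodeName())) {
                    position++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + position + "]");
        }
        return path.toString();
    }

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse schema document", e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:complexType name='docType'/>"
                + "<xs:complexType name='abstractType' abstract='true'/>"
                + "</xs:schema>";
        Element second = (Element) parse(xsd).getElementsByTagNameNS(
                "http://www.w3.org/2001/XMLSchema", "complexType").item(1);
        System.out.println(pathOf(second));
    }
}
```

A path of this kind is stable across repeated reads of the same document, so it can stand in for a line number when none was recorded.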

Actions #7

Updated by Michael Kay over 6 years ago

  • Category set to Schema conformance
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch 9.8, trunk added
  • Fix Committed on Branch 9.8, trunk added

I have extended the component identity check (on all components except Element Declarations, which are handled differently) to use a path in the schema document where line/column number information is not available.
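The shape of such a check, with a path used only when no line number is available, might look like the sketch below. It is an assumption-laden illustration, not the committed patch: IdentityKeySketch, identityKey and the key format are invented for this example.

```java
/** Hypothetical sketch of an identity key with a path fallback; not the actual patch. */
final class IdentityKeySketch {

    /**
     * Uses line/column when a line number was recorded, otherwise falls back
     * to the component's path within the schema document.
     */
    static String identityKey(String systemId, int line, int column, String path) {
        if (line > 0) {
            return systemId + "#line=" + line + ",col=" + column;
        }
        return systemId + "#path=" + path;
    }

    public static void main(String[] args) {
        String uri = "jar:file:/C:/xsds.jar!/c.xsd";
        // Two reads of the same type from a tree built without line numbering.
        String first = identityKey(uri, -1, -1, "/xs:schema[1]/xs:complexType[2]");
        String second = identityKey(uri, -1, -1, "/xs:schema[1]/xs:complexType[2]");
        System.out.println(first.equals(second));
    }
}
```

With a fallback of this kind, two reads of the same definition produce equal keys whether or not line numbering was enabled when the tree was built.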

Actions #8

Updated by Michael Kay over 6 years ago

Added a further patch. Test t:override-v-005 was failing because two types were not recognized as equivalent; this turned out to be because they had no generatedId value, which in turn was because the value held in a SimpleTypeDefinition was not being retained when the SimpleTypeDefinition is converted into a UserUnionType, UserListType, or UserAtomicType.

Actions #9

Updated by O'Neil Delpratt over 6 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.8.0.6 added

Bug fix applied in the Saxon 9.8.0.6 maintenance release.
