Support #2252: Strict XSL validation yields less messages than validating during parse - Saxon - Saxonica Developer Community

Support #2252

closed

Strict XSL validation yields less messages than validating during parse

Added by Jason Mihalick over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Start date: 2014-12-09

Description

I am using Saxon-EE Java version 9.5.1.7.

We have the following scenario using xsl:validation="strict" in a stylesheet.

When I run a source document through a transformation that has xsl:validation="strict" on the root element of the output XML, I get the following 4 error messages:

'In content of element : The content model does not allow element to appear immediately after element . The following elements would be valid here, all in no namespace: funding-info, processing-relationships, x (or nothing). '
'In content of element : The content model does not allow element to appear immediately after element . Expected or nothing. '
'In content of element : The content model does not allow element to appear immediately after element . It must be preceded by one of: , . '
'In content of element : The content model does not allow element to appear as the first child. Expected . '

But when I run the same transformation WITHOUT the xsl:validation="strict" present, and then validate the transformed output XML afterwards (using an XMLReader), I get the following 10 error messages (which is closer to what I expect to see):

'cvc-complex-type.2.4.b: The content of element 'ref' is not complete. One of '{label, live-change, acs-titles, acs-no-titles, acs-biochem, citation, note}' is expected.'
'cvc-attribute.3: The value 'sameIdCheck' of attribute 'id' on element 'ref' is not valid with respect to its type, 'ID'.'
'cvc-id.2: There are multiple occurrences of ID value 'sameIdCheck'.'
'cvc-complex-type.2.4.b: The content of element 'ref' is not complete. One of '{label, live-change, acs-titles, acs-no-titles, acs-biochem, citation, note}' is expected.'
'cvc-complex-type.2.4.a: Invalid content was found starting with element 'ref'. One of '{funding-info, processing-relationships, x}' is expected.'
'cvc-attribute.3: The value '´' of attribute 'id' on element 'fig' is not valid with respect to its type, 'ID'.'
'cvc-datatype-valid.1.2.1: '´' is not a valid value for 'NCName'.'
'cvc-complex-type.2.4.a: Invalid content was found starting with element 'fig'. One of '{p}' is expected.'
'cvc-complex-type.2.4.a: Invalid content was found starting with element 'contrib-group'. One of '{document-title, web-title}' is expected.'
'cvc-complex-type.2.4.a: Invalid content was found starting with element 'foo'. One of '{journal-id}' is expected.'

Should I expect validating the output during the transformation to produce roughly the same messages as running the transformation and then validating the result? I am not expecting the text to match exactly, but I was expecting to see basically the same errors caught. Notice, for example, that none of the ID-related errors were caught when validating during transformation.

I can provide a stylesheet and source XML if necessary, but I thought I would start at a higher level first before we get deeper. Perhaps there is a reason in the Saxon implementation or in the spec that this scenario of validating during transformation vs. validating afterwards will never produce the same errors.

Files

schema.xsd (749 Bytes), Jason Mihalick, 2014-12-10 23:43
SaxonIDValidationTest.java (6.06 KB), Jason Mihalick, 2014-12-10 23:43

Updated by Michael Kay over 9 years ago

I assume that in the second case (validation of the output using an XMLReader) you were using the Xerces schema validator rather than the Saxon schema validator. There's no intrinsic reason to expect two different validators to report the same errors. A schema processor has to make judgments about this; for example, if an element isn't supposed to appear in a particular position, should it still validate the content of that element? The last thing anyone wants is multiple error messages all caused by what is essentially a single error. All you can really expect is that two validators agree about whether the document is valid or not. To investigate this in more detail one would need to look at the actual document and schema.

I think that when Saxon finds a "content model" violation, i.e. an element that isn't valid by virtue of the content model of the parent element, then it basically stops validating the following siblings of that element, on the grounds that any errors reported with regard to those siblings are quite likely to be spurious.

There's also a possibility of differences caused by the fact that in one case, validation is within the context of a transformation. In principle, a single validation error is enough to cause the transformation to fail, and once an error occurs during a transformation, it's perfectly entitled to stop there and then. In practice, though, for validation of output documents Saxon doesn't do that: or at any rate, it doesn't do so by default.

But I think it would be interesting to compare a freestanding Saxon validation of the result document (using the com.saxonica.Validate command) with a freestanding Xerces validation, taking XSLT out of the equation. It would then be interesting to look at the actual Xerces messages and see whether any of them are related, in the sense that they relate to the same or related elements.
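
For the Xerces side of such a comparison, a minimal sketch along the following lines may help. It uses the standard JAXP validation API (backed by the JDK's bundled Xerces) with an error handler that records every message instead of stopping at the first one, so the full list can be counted and compared against Saxon's. The class name, schema, and instance document are hypothetical placeholders, not the ones from this ticket.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectingValidation {

    // Hypothetical schema: <root> must contain <a> followed by <b>, both integers.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='a' type='xs:int'/>"
      + "<xs:element name='b' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical instance: <a> holds a non-integer value and <b> is missing.
    static final String DOC = "<root><a>x</a></root>";

    // Validates docText against schemaText and returns every warning and
    // error message reported, in the order the validator raised them.
    static List<String> validate(String schemaText, String docText) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaText)));
        Validator validator = schema.newValidator();
        final List<String> messages = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { messages.add(e.getMessage()); }
            // Recording instead of rethrowing lets validation continue past an error.
            public void error(SAXParseException e) { messages.add(e.getMessage()); }
            // Fatal errors (e.g. ill-formed XML) cannot be recovered from.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(new StringReader(docText)));
        return messages;
    }

    public static void main(String[] args) throws Exception {
        for (String message : validate(SCHEMA, DOC)) {
            System.out.println(message);
        }
    }
}
```

How many messages a single mistake produces is the validator's own choice, which is why the counts from two processors need not agree.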

Updated by Jason Mihalick over 9 years ago

That explanation is very helpful, thank you. I just ran a quick test using com.saxonica.Validate on a document nearly identical to the source document I mentioned originally. The messages output are very similar to the messages I got using Saxon with validation during the transformation. The number of messages is the same as well (4).

XSD99999: In content of element : The content model does not allow element to appear as the first child. Expected .
XSD99999: In content of element : The content model does not allow element to appear immediately after element . It must be preceded by one of: , .
XSD99999: In content of element : The content model does not allow element to appear immediately after element . Expected or nothing.
XSD99999: In content of element : The content model does not allow element to appear immediately after element . The following elements would be valid here, all in no namespace: funding-info, processing-relationships, x (or nothing).

I believe the messages I included above in my second set of messages are the equivalent of what Xerces would produce in a freestanding validation.

I think you are correct that the Xerces messages not output by Saxon are messages associated with the descendants of the elements that Saxon did flag. I will be doing some more testing to confirm that the other 6 errors Xerces flagged are also flagged by Saxon once the 4 errors they have in common are resolved. I'm sure they will be! Thanks again!

Updated by Jason Mihalick over 9 years ago

File schema.xsd added
File SaxonIDValidationTest.java added

After I eliminated the errors that the two validation methods had in common, I ran another test case and it seemed that I was not getting any errors for IDREFs with no corresponding ID. So I created a bare-bones test case that demonstrates what I am seeing. I've attached two files, SaxonIDValidationTest.java and schema.xsd. I ran under Java 6 with Saxon-EE 9.5.1.7. Just make sure you put the schema.xsd file in the working directory that you run SaxonIDValidationTest.java from, and you should get this output:

doValidationOnTransform SUCCESS.
doValidationAfterTransform FAILED!
Validation error 
  XTTE1555: There is one IDREF value with no corresponding ID:
    ID002 (See http://www.w3.org/TR/xmlschema11-1/#cvc-id clause 1)
ValidationException: There is one IDREF value with no corresponding ID:
    ID002
	at com.saxonica.validate.IdValidator.close(IdValidator.java:251)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ProxyReceiver.close(ProxyReceiver.java:101)
	at net.sf.saxon.event.ReceivingContentHandler.endDocument(ReceivingContentHandler.java:232)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endDocument(AbstractSAXParser.java:737)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:516)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:428)
	at net.sf.saxon.event.Sender.send(Sender.java:170)
	at com.saxonica.Validate.processFile(Validate.java:548)
	at SaxonIDValidationTest.doValidationAfterTransform(SaxonIDValidationTest.java:129)
	at SaxonIDValidationTest.main(SaxonIDValidationTest.java:146)

Updated by Michael Kay over 9 years ago

ID/IDREF constraints apply at document level. In your XSLT, you are invoking validation at element level. You will only get ID/IDREF errors if you validate the whole document, not an individual element, even if it is the root element of the document.

Updated by Jason Mihalick over 9 years ago

Thank you again for the assist. I've done some googling and I haven't found a way to validate the whole document during transformation. If there is a way, I would appreciate it if you could refer me to some documentation on it. Otherwise, I will proceed to implement full-document validation after the transform, via a validating parse with an XMLReader.
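
As a rough illustration of the fallback described here, the sketch below validates a complete document after the fact with the standard JAXP validation API (the JDK's bundled Xerces, not Saxon) and reports a dangling IDREF. With no error handler installed, the validator throws on the first error, so the method returns that first message or null when the document is valid. The class name, schema, and documents are hypothetical stand-ins, not the attached schema.xsd or test case.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class IdRefCheck {

    // Hypothetical schema: <item id="..."> declares an ID, <ref target="..."> an IDREF.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='doc'><xs:complexType><xs:sequence>"
      + "<xs:element name='item'><xs:complexType>"
      + "<xs:attribute name='id' type='xs:ID'/></xs:complexType></xs:element>"
      + "<xs:element name='ref'><xs:complexType>"
      + "<xs:attribute name='target' type='xs:IDREF'/></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical instance whose IDREF (ID002) has no matching ID.
    static final String DOC = "<doc><item id='ID001'/><ref target='ID002'/></doc>";

    // Returns null if docText is valid, otherwise the message of the first
    // validation error. No ErrorHandler is set, so the validator's default
    // behaviour (throw on the first error) applies.
    static String firstError(String schemaText, String docText) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaText)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(docText)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String message = firstError(SCHEMA, DOC);
        System.out.println(message == null ? "valid" : "invalid: " + message);
    }
}
```

Because the whole document is passed to the validator, the ID/IDREF constraints are in scope here, unlike element-level validation inside the stylesheet.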

Updated by Michael Kay over 9 years ago

You can use <xsl:document validation="strict"> to create the output document; xsl:result-document with no href has pretty much the same effect. That is, you typically replace

<xsl:template match="/">
  <root>
    <!-- code here -->
  </root>
</xsl:template>

with

<xsl:template match="/">
  <xsl:document validation="strict">
    <root>
      <!-- code here -->
    </root>
  </xsl:document>
</xsl:template>

The detailed rules for what gets validated can be found here:

http://www.w3.org/TR/2009/PER-xslt20-20090421/#validation

Updated by Michael Kay over 9 years ago

Status changed from New to Closed

Marking this as closed since the question has been answered and there has been no pushback for 11 days. Feel free to reopen if the problem remains.
