Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break?

Added by Martin Honnen almost 8 years ago

I think earlier 9.7 releases indicated an early exit from XML parsing when using @xsl:iterate/xsl:break@ with the @-t@ option on the command line. Saxon-EE 9.7.0.7J no longer seems to do that: when I parse a well-formed document I get no indication that parsing has been aborted, and when I try an input document with a well-formedness violation well after the @xsl:break@ I get no result, only a parse error. I am sure that some months ago I could run such code and, for a long, well-formed document, got a message indicating an early exit; I could even process a malformed document if the @xsl:break@ occurred before the XML well-formedness violation.

Has there been any intentional change or is there any known regression regarding an early exit?

A sample stylesheet is

A sample input is




	 ... 
	...
	...
	...
	...
	...
	...

which 9.7.0.7 processes fine, but without indicating the early exit I thought the @xsl:break@ would trigger. When I add a further @@ start tag without a matching end tag





	 ... 
	...
	...
	...
	...
	...
	...

then I simply get a parse error.
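
To illustrate what an early exit means at the parser level, here is a minimal sketch using Python's standard SAX parser. It is only a rough analogy, not Saxon's mechanism and not the stylesheet from this post: the handler abandons the parse once it has seen enough items, so a well-formedness error further down the input is never reached. The names @StopParsing@, @FirstItems@, @first_items@ and the sample document are hypothetical.

```python
import xml.sax


class StopParsing(Exception):
    """Raised by the handler to abandon the parse (hypothetical name)."""


class FirstItems(xml.sax.ContentHandler):
    """Collect the names of the first `limit` children of the document element."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.depth = 0
        self.seen = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:  # a child of the document element
            self.seen.append(name)

    def endElement(self, name):
        if self.depth == 2 and len(self.seen) >= self.limit:
            raise StopParsing  # rough analogue of xsl:break
        self.depth -= 1


def first_items(data, limit, early_exit=True):
    # With early_exit=False the handler never stops, so the parser
    # reads the whole input and reports any later error.
    handler = FirstItems(limit if early_exit else float("inf"))
    try:
        xml.sax.parseString(data, handler)
    except StopParsing:
        pass
    return handler.seen[:limit]


# Three complete items, then a start tag that is never closed.
BROKEN = b"<root><item/><item/><item/><oops></root>"

print(first_items(BROKEN, 2))  # ['item', 'item']
```

Calling @first_items(BROKEN, 2, early_exit=False)@ on the same input raises @xml.sax.SAXParseException@, which corresponds to the behaviour I now see: no result, only a parse error.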

Replies (5)


RE: Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break? - Added by Michael Kay almost 8 years ago

The answer to the headline question is that there weren't any deliberate changes, but I think there may have been some changes as an unintended side-effect of internal changes. I'll take a look at this particular case. The good news is that we're now monitoring the situation: tests where early exit is expected are now labelled as such, and we've found a way of testing that these tests are indeed taking an early exit.

RE: Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break? - Added by Martin Honnen almost 8 years ago

What has become of this issue? I have now run some tests with the current release, Saxon-EE 9.7.0.8J, and I don't get any indication of an early exit when running with the @-t@ option. Additionally, I have created some larger sample files, and processing time seems to increase considerably with file size, even though the code only tries to extract the first ten elements with e.g.

When extracting the first 10 items, for an input sample with @100@ items processing time is around @3ms@, for an input sample of @10.000@ items the processing time is around @16ms@ and for an input sample of @1.000.000@ items the processing time is around @595ms@.
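
The reasoning behind these timings is that, with a working early exit, the amount of input read should depend on where the tenth item ends and not on the total file size. A minimal sketch of that expectation, using Python's incremental SAX parser rather than Saxon (the names @Done@, @CountItems@, @chunks_read@ and the @item@ element are assumptions for illustration):

```python
import xml.sax


class Done(Exception):
    """Signals that enough items have been seen (hypothetical name)."""


class CountItems(xml.sax.ContentHandler):
    """Count completed item elements and stop once `limit` is reached."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit  # None means: never stop early
        self.count = 0

    def endElement(self, name):
        if name == "item":
            self.count += 1
            if self.count == self.limit:
                raise Done


def chunks_read(n_items, limit, chunk_size=1024):
    """Return how many chunks were fed to the parser before it stopped."""
    data = b"<root>" + b"<item>x</item>" * n_items + b"</root>"
    parser = xml.sax.make_parser()
    parser.setContentHandler(CountItems(limit))
    fed = 0
    try:
        for start in range(0, len(data), chunk_size):
            parser.feed(data[start:start + chunk_size])
            fed += 1
        parser.close()
    except Done:
        fed += 1  # the chunk that triggered the stop was also read
    return fed


for size in (100, 10_000):
    print(size, chunks_read(size, 10), chunks_read(size, None))
```

With a limit of 10 the number of chunks read stays the same for both sizes, while without a limit it grows with the input. Processing time that scales with file size, as measured above, is what one would expect if the whole input is still being read.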

It seems the test case @si-iterate-013@ in the W3C test suite does not check for @result early-exit-possible="true"@.

So is this feature of an early exit with @xsl:iterate/xsl:break@ no longer supported? I understand it is not something required by the XSLT spec but http://saxonica.com/html/documentation/sourcedocs/streaming/partial-reading.html suggests it is a Saxon feature.

RE: Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break? - Added by Michael Kay almost 8 years ago

We're aware that there are cases where early exit from parsing isn't happening when it might, and we're investigating, though other issues have jumped the priority queue. We now have assertions against many of the tests that ought to exit early, and internal mechanisms to test these assertions, so we know which tests are failing, but we haven't got to the bottom of why they are failing. The early exit mechanism requires some fairly "dirty" communication between layers of the software, and there are quite a few different paths that have to be right, so it's not surprising if the code doesn't always spot the opportunity.

Equally, there are almost certainly tests to which the early-exit assertion could be added; in response to this comment I have added it to si-iterate-013.

RE: Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break? - Added by Michael Kay almost 8 years ago

On reflection, si-iterate-013 is a tough one. Because the iterate is defined on a template with match="/*", early exit is only possible by virtue of global stylesheet analysis to work out that there are no template rules that would produce any output when processing any comment or processing-instruction nodes that follow the end tag of the document element. Saxon doesn't attempt that kind of analysis: xsl:break only causes parser termination when the xsl:iterate is processing the whole document.

Pursuing this, Saxon 9.6 does an early exit on this test, and it is wrong to do so. If you add a processing instruction to the end of the document, and a template rule

Then 9.7 correctly appends the PI to the end of the result, while 9.6 fails to do so.
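
The underlying point, that a parser still delivers nodes after the end tag of the document element, can be checked independently of Saxon. A minimal sketch with Python's standard SAX parser (the @Events@ class, the @events@ function and the @audit@ processing instruction are hypothetical, chosen only for illustration):

```python
import xml.sax


class Events(xml.sax.ContentHandler):
    """Record the order of end tags and processing instructions."""

    def __init__(self):
        super().__init__()
        self.log = []

    def endElement(self, name):
        self.log.append(("end", name))

    def processingInstruction(self, target, data):
        self.log.append(("pi", target))


def events(data):
    handler = Events()
    xml.sax.parseString(data, handler)
    return handler.log


# A processing instruction placed after the document element's end tag.
DOC = b"<root><item/></root><?audit done?>"

print(events(DOC))  # [('end', 'item'), ('end', 'root'), ('pi', 'audit')]
```

The processing instruction arrives after the end of @root@, so stopping the parse at that end tag would silently drop it unless it is known that nothing in the stylesheet could match it.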

RE: Has there been any intentional change or is there any known regression of the lack of early exit with streamed xsl:iterate/xsl:break? - Added by Michael Kay almost 8 years ago

I've been looking at other test cases labelled with "early exit" where the current development branch is not doing an early exit.

sf-empty-101 - I think this is mislabelled. You can't tell that /accounts/transactions/dummy selects nothing until you've read to the end. You could possibly quit early if the document element is not account - though that depends on whether you permit "document fragments" in xsl:stream (Saxon doesn't).

sf-exists-101 - ditto.

sx-gc-gt-112 - seems mislabelled: since 346 is not greater than any PAGES element in the document, the whole document has to be read.

sx-gc-lt-012 - without checking, this is presumably equivalent.

and that's it. So on the 9.8 branch I don't think we have any issues. In 9.7 it's hard to be confident because the test driver hasn't been upgraded to check these early exit assertions.
