Bug #1972
closed
xsl:break and very large file
Description
Another streaming question. I'm testing extracting some data from the beginning of a very large file (>3GB). I'm using the following instruction:
<xsl:stream href="enwiktionary-20140102-pages-articles.xml">
  <xsl:iterate select="mediawiki/page">
    <xsl:copy-of select="."/>
    <xsl:break/>
  </xsl:iterate>
</xsl:stream>
(the source file is available from here: http://dumps.wikimedia.org/enwiktionary/20140102/enwiktionary-20140102-pages-articles.xml.bz2)
It's my understanding that this code should halt after encountering and copying the first page element from the source file. When I run the stylesheet against a small sample file, I do indeed get just the first page element as I expect. When I run against the full document (which has a page element about 100 lines in), I see four of my cores go to about 50% and stay that way for more than 10 minutes. I've never actually managed to get it to finish execution. Is this normal?
I've attached the stylesheet and simplified sample file.
Updated by Michael Kay almost 11 years ago
- Category set to Streaming
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Priority changed from Low to Normal
The documentation does indeed say:
Note that when a xsl:iterate loop is terminated using xsl:break, parsing of the source document will be abandoned.
Unfortunately this is true only under rather restricted circumstances, and in particular in 9.5 it is not true when streaming is initiated using xsl:stream.
I think it should work if you do
<xsl:iterate select="saxon:stream(doc('enwiktionary-20140102-pages-articles.xml')/mediawiki/page)">
  <xsl:copy-of select="."/>
  <xsl:break/>
</xsl:iterate>
Please give this a try. As with the other problem, I would use <xsl:template name="main"> and no principal source document.
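For reference, the suggestion above might be assembled into a complete stylesheet roughly as follows. This is only a sketch, not a tested fix: it assumes Saxon-EE 9.5 (saxon:stream() is a Saxon extension function in the http://saxon.sf.net/ namespace), it assumes the dump file sits next to the stylesheet, and it keeps the mediawiki/page path exactly as written in the report.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Named entry point, so the transformation can be started
       without supplying a principal source document. -->
  <xsl:template name="main">
    <!-- saxon:stream() asks Saxon to deliver the selected page
         elements one at a time instead of building the whole tree. -->
    <xsl:iterate select="saxon:stream(doc('enwiktionary-20140102-pages-articles.xml')/mediawiki/page)">
      <!-- Copy the first page element to the result... -->
      <xsl:copy-of select="."/>
      <!-- ...then leave the loop, which is intended to let the
           parser abandon the rest of the input. -->
      <xsl:break/>
    </xsl:iterate>
  </xsl:template>

</xsl:stylesheet>
```

With the Saxon command line, a stylesheet like this would be started at the named template using the -it:main option and no -s: source argument.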
Updated by Michael Kay over 10 years ago
- Status changed from In Progress to Won't fix
In 9.6 xsl:break will be much more effective at stopping the parser. saxon:stream() in 9.6 will become largely legacy (though still useful in XQuery); our main effort is going into supporting standard XSLT 3.0 streaming constructs. Therefore marking as "won't fix".