Bug #1972

closed

xsl:break and very large file

Added by Nick Nunes over 10 years ago. Updated about 10 years ago.

Status: Won't fix
Priority: Normal
Assignee:
Category: Streaming
Sprint/Milestone: -
Start date: 2014-01-05
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Another streaming question. I'm testing extracting some data from the beginning of a very large file (>3GB). I'm using the following instruction:

<xsl:stream href="enwiktionary-20140102-pages-articles.xml">
  <xsl:iterate select="mediawiki/page">
    <xsl:copy-of select="."/>
    <xsl:break/>
  </xsl:iterate>
</xsl:stream>

(the source file is available from here: http://dumps.wikimedia.org/enwiktionary/20140102/enwiktionary-20140102-pages-articles.xml.bz2)

It's my understanding that this code should halt after encountering and copying the first page element from the source file. When I run the stylesheet against a small sample file, I do indeed get just the first page element, as I expect. When I run it against the full document (which has a page element about 100 lines in), I see four of my cores go to about 50% and stay that way for more than 10 minutes. I've never actually managed to get it to finish execution. Is this normal?

I've attached the stylesheet and simplified sample file.


Files

thin.wiktionary.xsl (512 Bytes), Nick Nunes, 2014-01-06 08:42
test.xml (1.87 KB), Nick Nunes, 2014-01-06 08:42
Actions #1

Updated by Michael Kay over 10 years ago

  • Category set to Streaming
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

The documentation does indeed say:

Note that when a xsl:iterate loop is terminated using xsl:break, parsing of the source document will be abandoned.

Unfortunately this is true only under rather restricted circumstances, and in particular in 9.5 it is not true when streaming is initiated using xsl:stream.

I think it should work if you do

<xsl:iterate select="saxon:stream(doc('enwiktionary-20140102-pages-articles.xml')/mediawiki/page)">
  <xsl:copy-of select="."/>
  <xsl:break/>
</xsl:iterate>

Please give this a try. As with the other problem, I would use <xsl:template name="main"> and no principal source document.
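
Putting the two suggestions together, a complete stylesheet might look roughly like the sketch below. It is an illustration, not the attached thin.wiktionary.xsl: the select expression is copied unchanged from the snippet above, the saxon prefix is assumed to be bound to the usual Saxon extension namespace, and any namespace handling the real stylesheet needs for the source file is not shown.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Named entry point: the transformation starts here, so no
       principal source document is supplied to the processor. -->
  <xsl:template name="main">
    <!-- saxon:stream() is the Saxon extension function suggested above;
         the path is taken as-is from the original report. -->
    <xsl:iterate select="saxon:stream(doc('enwiktionary-20140102-pages-articles.xml')/mediawiki/page)">
      <!-- Copy the first page element, then leave the loop. -->
      <xsl:copy-of select="."/>
      <xsl:break/>
    </xsl:iterate>
  </xsl:template>

</xsl:stylesheet>
```

With the Saxon command line, the named template can be selected with the -it:main option in place of a -s source option.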

Actions #2

Updated by Michael Kay about 10 years ago

  • Status changed from In Progress to Won't fix

In 9.6 xsl:break will be much more effective at stopping the parser. saxon:stream() in 9.6 will become largely legacy (though still useful in XQuery); our main effort is going into supporting standard XSLT 3.0 streaming constructs. Therefore marking as "won't fix".
