Bug #1972

closed

xsl:break and very large file

Added by Nick Nunes over 10 years ago. Updated about 10 years ago.

Status: Won't fix
Priority: Normal
Assignee:
Category: Streaming
Sprint/Milestone: -
Start date: 2014-01-05
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Another streaming question. I'm testing the extraction of some data from the beginning of a very large file (>3 GB), using the following instruction:

<xsl:stream href="enwiktionary-20140102-pages-articles.xml">
  <xsl:iterate select="mediawiki/page">
    <xsl:copy-of select="."/>
    <xsl:break/>
  </xsl:iterate>
</xsl:stream>

(the source file is available from here: http://dumps.wikimedia.org/enwiktionary/20140102/enwiktionary-20140102-pages-articles.xml.bz2)

It's my understanding that this code should halt after encountering and copying the first page element from the source file. When I run the stylesheet against a small sample file, I do indeed get just the first page element, as expected. When I run it against the full document (whose first page element appears about 100 lines in), four of my cores go to about 50% and stay that way for more than 10 minutes. I've never actually managed to get it to finish execution. Is this normal?

I've attached the stylesheet and simplified sample file.
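
For readers without the attachment, here is a minimal sketch of how the instruction above might sit in a complete stylesheet. The named template `main`, the `version="3.0"` attribute, and the output settings are assumptions for illustration and may differ from the attached thin.wiktionary.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the wrapper around xsl:stream is assumed, not taken from the attachment. -->
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: a named template, so no principal source document is needed. -->
  <xsl:template name="main">
    <!-- Read the large dump in streaming mode rather than building a tree in memory. -->
    <xsl:stream href="enwiktionary-20140102-pages-articles.xml">
      <xsl:iterate select="mediawiki/page">
        <!-- Copy the current page element to the result... -->
        <xsl:copy-of select="."/>
        <!-- ...then leave the iteration, so later page elements are not processed. -->
        <xsl:break/>
      </xsl:iterate>
    </xsl:stream>
  </xsl:template>

</xsl:stylesheet>
```

With a layout like this, the transformation would be started from the named template rather than from a source document.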


Files

thin.wiktionary.xsl (512 Bytes) Nick Nunes, 2014-01-06 08:42
test.xml (1.87 KB) Nick Nunes, 2014-01-06 08:42
