Support #3100

closed

Processing many files

Added by Erik Wilde almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2017-01-10
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

As mentioned in an email conversation, I am trying to process many files (~2000), all of them moderate in size (1-2MB), but it seems like I am not using discard-document correctly. I always get Java heap space errors, so I assume I am doing something wrong. My starting point was the code shown on http://www.saxonica.com/html/documentation/functions/saxon/discard-document.html, but the problem may be in the way I use it.


Files

training.xsl (696 Bytes) Erik Wilde, 2017-01-10 19:02
Actions #1

Updated by Michael Kay almost 8 years ago

Because you load the files in the form of a collection, and the collection is in a global variable, there is a link to the documents for the duration of the transformation even though they have been discarded from the document pool.

saxon:discard-document() is designed for documents read using the document() or doc() function rather than using collection(). Your best approach might be to use the uri-collection() function to get a set of URIs, and then to fetch individual documents using doc() based on those URIs.
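The uri-collection() approach described above could look roughly like the following. This is only a minimal sketch, not the stylesheet attached to this issue: the input-dir parameter, its placeholder value, the "main" template name, the ?select=*.xml filter, and the per-file element count are all assumptions made for illustration. It also assumes a Saxon edition in which the saxon:discard-document() extension is available.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical parameter: URI of the directory holding the input files -->
  <xsl:param name="input-dir" select="'file:///path/to/files'"/>

  <!-- Hypothetical entry point; invoke it as the initial named template -->
  <xsl:template name="main">
    <results>
      <!-- uri-collection() yields URIs only, so no documents are parsed here,
           and nothing is held in a global variable -->
      <xsl:for-each select="uri-collection(concat($input-dir, '?select=*.xml'))">
        <!-- Parse one document and release it from the document pool;
             the local variable goes out of scope after each iteration -->
        <xsl:variable name="current-doc" select="saxon:discard-document(doc(.))"/>
        <!-- Placeholder processing: report the element count per file -->
        <file uri="{.}" elements="{count($current-doc//*)}"/>
      </xsl:for-each>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is that only one parsed document is referenced at a time, so earlier documents should become eligible for garbage collection as the loop proceeds.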

But you may be able to get away with simply dropping the global variable and doing

<xsl:for-each select="collection(...)!saxon:discard-document(.)">

Note that 9.7 introduced significant changes to the way collections work internally -- I'm not sure which release you are on. The new CollectionFinder interface in 9.7 opens the door to a lot of Java API capability for controlling how the resources used by collections are managed.

Actions #2

Updated by Michael Kay almost 8 years ago

  • Status changed from New to Closed
Actions #3

Updated by Erik Wilde almost 8 years ago

thanks for the suggestions! i have now used uri-collection() and then open the files individually via doc(). that seems to work well, and i was able to open and process all files without memory issues. thanks a lot for the suggestion and the help! cheers, dret.

On 2017-01-24 16:04, Saxonica Developer Community wrote:
