Support #3100
Processing many files (status: closed)
Description
As mentioned in an email conversation, I am trying to process many files (~2000), each of moderate size (1-2 MB), but it seems I am not using discard-document correctly. I always get Java heap space errors, so I assume I must be doing something wrong. My starting point was the code shown on http://www.saxonica.com/html/documentation/functions/saxon/discard-document.html, but maybe the problem lies in the way I use it.
Updated by Michael Kay almost 8 years ago
Because you load the files in the form of a collection, and the collection is bound to a global variable, the documents remain reachable for the whole duration of the transformation, even though they have been discarded from the document pool.
saxon:discard-document() is designed for documents read using the document() or doc() function rather than using collection(). Your best approach might be to use the uri-collection() function to get a set of URIs, and then to fetch individual documents using doc() based on those URIs.
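The same two-step idea can be sketched outside XSLT. The following plain-JDK Java program is only an analogy, not Saxon's API: it first collects the file URIs (the role uri-collection() plays), then parses and summarizes one document at a time (the role doc() plays), keeping only a small result per file so that each parsed tree can be garbage-collected before the next one is loaded. The class name, the `*.xml` filter and the element-count summary are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PerFileProcessing {

    /** Phase 1: gather URIs only (cheap strings), analogous to uri-collection(). */
    static List<URI> listXmlUris(Path dir) throws IOException {
        List<URI> uris = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : stream) {
                uris.add(p.toUri());
            }
        }
        uris.sort(null); // natural URI ordering, for a deterministic processing order
        return uris;
    }

    /** Phase 2: parse each URI on its own, analogous to doc(), keeping only a summary. */
    static Map<String, Integer> countElements(List<URI> uris) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (URI uri : uris) {
            Document doc = builder.parse(uri.toString());
            // Keep only a small per-file result (here: number of elements).
            counts.put(uri.getPath(), doc.getElementsByTagName("*").getLength());
            // 'doc' is not stored anywhere else, so the tree becomes unreachable
            // as soon as the next iteration reassigns the variable.
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        Map<String, Integer> counts = countElements(listXmlUris(dir));
        counts.forEach((file, n) -> System.out.println(file + ": " + n + " elements"));
    }
}
```

The design point is that the long-lived data structure holds only URIs and small results, never whole documents.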
But you may be able to get away with simply dropping the global variable and writing:
<xsl:for-each select="collection(...)!saxon:discard-document(.)">
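Dropping the global variable matters because anything a long-lived variable refers to stays reachable. As a rough Java analogy (plain JDK, not Saxon's API), compare holding every parsed document in a field with a lazy pipeline that parses, summarizes and forgets each file in a single pass; only the latter keeps memory use roughly bounded by the largest single document. The class name, the `totalElements` method and the `.xml` filter are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class LazyPipeline {

    // Anti-pattern, comparable to a global variable bound to collection():
    // static List<Document> ALL_DOCS = ...;  // every parsed tree stays reachable

    static int totalElements(Path dir) throws IOException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(p -> p.toString().endsWith(".xml"))
                .mapToInt(p -> {
                    try {
                        // Parse one file and reduce it to a number immediately;
                        // the Document is never stored, so it can be collected.
                        return builder.parse(p.toFile()).getElementsByTagName("*").getLength();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    } catch (SAXException e) {
                        throw new IllegalStateException("Cannot parse " + p, e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("Total elements: " + totalElements(dir));
    }
}
```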
Note that 9.7 introduced significant changes to the way collections work internally; I'm not sure which release you are on. The new CollectionFinder interface in 9.7 gives Java applications much more control over how the resources used by collections are managed.
Updated by Erik Wilde almost 8 years ago
Thanks for the suggestions! I have now used uri-collection() and then opened the files individually via doc(). That seems to work well, and I was able to open and process all files without memory issues. Thanks a lot for the suggestion and the help! Cheers, dret.
On 2017-01-24 16:04, Saxonica Developer Community wrote: