Memory consumption with parsing many files

Added by Anonymous over 18 years ago

Legacy ID: #3975809 Legacy Poster: snurry123 (snurry123)

Hello! I am trying to convert 3000 XML documents to CSV using XSLT and Saxon. 10% of the documents are 50 MB in size; the rest are 10 KB. I am running out of memory on the machine, even though it has 2 GB. I used HAT to analyse the heap: most of the objects are the 13555 instances of class "net.sf.saxon.tinytree.TinyElementImpl", which I suppose are not being garbage-collected. Am I doing something wrong in the way I call the Saxon parser? I loop over a list of files in my application and call the following method for each one:

```java
StreamResult sr_CE = null;
NodeInfo doc = null;
// Parse the input file into a Saxon document node
try {
    doc = new XPathEvaluator().setSource(
            new StreamSource(new File(getXmlfile()).toURL().toString()));
} catch (XPathException e1) {
    log.error(e1.toString());
} catch (MalformedURLException e1) {
    log.error(e1.toString());
}
TinyBuilder builder = new TinyBuilder();
try {
    // Open the CSV file in append mode
    sr_PI = new StreamResult(new BufferedWriter(
            new FileWriter(CSVDirectory + "/output.csv", true)));
    // Run the cached transformer on the parsed document
    trans_PI.transform(doc, builder);
    try {
        sr_PI.getWriter().close();
        sr_PI = null;
    } catch (IOException e) {
        log.error("Error closing outputstream: " + sr_PI.getSystemId());
    }
} catch (Exception e) {
    System.out.println(e.toString());
} finally {
    //log.debug("Transformation of file done.");
}
doc = null;
builder = null;
```

Any help would be greatly appreciated.

Replies (4)

RE: Memory consumption with parsing many files - Added by Anonymous over 18 years ago

Legacy ID: #3975915 Legacy Poster: Michael Kay (mhkay)

From the information given, there's no obvious reason why the documents aren't being garbage-collected. But there's a lot of information you haven't given: for example, why are you creating an instance of TinyBuilder, and what does your trans_PI.transform() method do with it? 13555 instances of class "net.sf.saxon.tinytree.TinyElementImpl" does not actually seem all that many, but if they are all in different documents, each one will keep its whole document alive, which would obviously be fatal. It would be useful to try to identify from the HAT analysis which objects are holding the references to them.

RE: Memory consumption with parsing many files - Added by Anonymous over 18 years ago

Legacy ID: #3975954 Legacy Poster: snurry123 (snurry123)

Thank you for your reply. trans_PI is a javax.xml.transform.Transformer object (held in a private class variable) that I create with Templates.newTransformer(). I pass the TinyBuilder object to tell the transformer to use the tiny tree representation. I cache the compiled stylesheets as Templates objects and create the transformers from them. Am I doing something wrong here?

```java
// Select Saxon as the JAXP TransformerFactory implementation
System.setProperty("javax.xml.transform.TransformerFactory",
        "net.sf.saxon.TransformerFactoryImpl");
tfactory = TransformerFactory.newInstance();
// Compile the stylesheet once and keep the Templates object
Source xsltSource_PI = new StreamSource(XSLTDirectory + "PeriodInfo.xslt");
Templates cachedXSLT_PI = tfactory.newTemplates(xsltSource_PI);
trans_PI = cachedXSLT_PI.newTransformer();
```

The root object of the reference tree is of type net.sf.saxon.expr.XPathContextMajor. The references to the TinyElementImpl objects are in the NodeKeys array (ArrXPathContextMajor->CurrentIterator->NodeKeys). I guess this is a normal tree holding the parsed content of the document.

RE: Memory consumption with parsing many files - Added by Anonymous over 18 years ago

Legacy ID: #3976054 Legacy Poster: Michael Kay (mhkay)

Firstly, it sounds to me from this description as if you are reusing the Transformer object for multiple transformations. You can do this, but in your situation you should [cast to net.sf.saxon.Controller and] call the method clearDocumentPool() between invocations. However, I normally advise reusing the Templates object but allocating a new Transformer for each transformation, unless you explicitly want to hang on to the resources used by one transformation when running the next. I'm not sure how relevant the XPathContextMajor information is. This suggests that at the time of the snapshot you were sorting a list of 13,555 elements. There's nothing untoward about that.
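The pattern recommended above (reuse the Templates object, allocate a new Transformer per transformation) can be sketched with standard JAXP calls only. This is a minimal sketch, not Saxon's documented API usage: the class and method names (BatchCsvTransform, compile, transformAll) are hypothetical, and it assumes a stylesheet with text output whose result should be appended to one CSV file, as in the original code. It also assumes the intent was to serialize each result straight into the file through a StreamResult rather than into a TinyBuilder. The Saxon-specific alternative from the reply, keeping one Transformer and calling clearDocumentPool() on it (cast to net.sf.saxon.Controller) between files, is not shown.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class illustrating "one Templates, many Transformers".
public class BatchCsvTransform {

    // Compile the stylesheet once; the Templates object is kept for the whole batch.
    public static Templates compile(Source xslt) throws TransformerException {
        return TransformerFactory.newInstance().newTemplates(xslt);
    }

    // Transform each input file and append its output to csvFile.
    public static void transformAll(Templates templates, List<File> inputs, File csvFile)
            throws TransformerException, IOException {
        for (File input : inputs) {
            // A fresh Transformer per file: whatever it caches (in Saxon, its
            // document pool) becomes unreachable when this iteration ends.
            Transformer transformer = templates.newTransformer();
            // Append mode, as in the original code; the writer is closed after each file.
            try (Writer out = new BufferedWriter(new FileWriter(csvFile, true))) {
                // Serialize directly into the CSV file instead of an intermediate tree.
                transformer.transform(new StreamSource(input), new StreamResult(out));
            }
        }
    }
}
```

Because each Transformer is discarded after one file, nothing from a 50 MB document is carried into the next transformation, while the cost of compiling the stylesheet is still paid only once.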

RE: Memory consumption with parsing many files - Added by Anonymous over 18 years ago

Legacy ID: #3976092 Legacy Poster: snurry123 (snurry123)

Perfect! I inserted the call to clearDocumentPool() and the application now runs stably with 800 MB of memory usage. Thank you so much!