discard-document and key function

Added by Anonymous almost 17 years ago

Legacy ID: #4760399 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Hello!

Problem: out of memory.

Description: I'm processing many files in my XSLT. To release processed documents I call the discard-document function. To access elements in the documents I use the key() function.

Symptoms: In the Task Manager I can see memory usage increasing until the process runs out of memory. Whenever I stop using the key() function, the problem disappears.

Question: Is there any solution to my problem?

P.S.: I've looked into the Saxon 9 sources. It seems that documents are only referenced through a weak hash map whenever key() is used. I've found one strange thing: the KeyManager.docIndexes object stores calculated index tables as WeakReferences:

    WeakReference indexRef = (WeakReference)docIndexes.get(doc);

It seems to me that Java can free these WeakReferences at any time, which means the indexes may have to be rebuilt at any moment.

Thanks.
-- Vladimir
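The WeakReference concern can be illustrated with a small model. This is a hypothetical sketch, not Saxon's KeyManager: `Doc`, `Index`, and `WeakIndexCache` are invented names. It keys a `WeakHashMap` on the document and holds each index only through a `WeakReference`, so once nothing else holds the index strongly, the GC may clear it and the next lookup has to rebuild it.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a source document.
final class Doc {
    final String name;
    Doc(String name) { this.name = name; }
}

// Hypothetical key index; like a real index it points back at its document.
final class Index {
    final Doc doc;
    final Map<String, String> entries = new HashMap<>();
    Index(Doc doc) { this.doc = doc; }
}

// Hypothetical model of the pattern: WeakHashMap<document, WeakReference<index>>.
final class WeakIndexCache {
    private final Map<Doc, WeakReference<Index>> docIndexes = new WeakHashMap<>();
    int builds = 0; // how many times an index had to be (re)built

    Index getIndex(Doc doc) {
        WeakReference<Index> ref = docIndexes.get(doc);
        Index index = (ref == null) ? null : ref.get();
        if (index == null) {
            // Never built, or the WeakReference was cleared: rebuild it.
            index = new Index(doc);
            builds++;
            docIndexes.put(doc, new WeakReference<>(index));
        }
        return index;
    }

    // Test hook that simulates what the garbage collector may do at any time.
    void simulateClear(Doc doc) {
        WeakReference<Index> ref = docIndexes.get(doc);
        if (ref != null) ref.clear();
    }
}
```

In this model the index refers to its document, but the index itself is only weakly reachable from the map, so the structure alone does not keep the document alive. If documents are retained anyway, some other strong reference must be involved.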


Replies (13)

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4760484 Legacy Poster: Michael Kay (mhkay)

I'll have to investigate this. The general idea is an index remains in memory so long as both the stylesheet defining the key (the Executable) and the document being indexed remain in memory. A document loaded using document() normally remains in memory for the duration of the transformation, but is released earlier when discard-document() is called. It seems that the document is being locked down by virtue of the link from the KeyManager, but I'm not quite sure why this should happen as the link involves a WeakReference.

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4760522 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Looking at the code, I do not understand what prevents indexRef from being reclaimed at the first opportunity (during GC). That could lead to excessive key rebuilding:

    WeakReference indexRef = (WeakReference)docIndexes.get(doc);

All this contrasts with what I see at runtime: documents are kept, and the indexRefs are kept. I've run a profiler, and it shows that the documents are referenced by:

    net.sf.saxon.tinytree.TinyTree@0x5000c65a (113 bytes) : field documentList
    net.sf.saxon.trans.KeyManager@0x50001006 (16 bytes) : field docIndexes

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4760534 Legacy Poster: Michael Kay (mhkay)

The reason that an index is always held in memory at least for the duration of a transformation (and that it also holds the relevant document in memory) would appear to be the code in method putIndex commented "ensure there is a firm reference to the indexList for the duration of a transformation", which causes a link to be maintained from the Controller. It seems that discard-document() should cause this link from the Controller to be broken. Try adding to the code of Extensions.discardDocument(): c.setUserData(doc, "key-index-list", null); However, there's a danger then, I think, that the index will disappear while the document is still in use.
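The retention chain described here (controller user data, then the index list, then the document) and the suggested fix can be modelled in a few lines. This is a hypothetical sketch, not Saxon's `Controller`: `MiniController`, `SourceDoc`, `putIndex`, and `discardDocument` only mimic names from the thread, and the user-data table is assumed to be keyed by document identity plus a name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document, standing in for a tree loaded with document().
final class SourceDoc {
    final String uri;
    SourceDoc(String uri) { this.uri = uri; }
}

// Hypothetical model of per-transformation user data held by a controller.
final class MiniController {
    // document identity -> (name -> data); a strong map, so entries pin their documents
    private final Map<Object, Map<String, Object>> userData = new IdentityHashMap<>();

    void setUserData(Object key, String name, Object data) {
        if (data == null) {
            Map<String, Object> forKey = userData.get(key);
            if (forKey != null) {
                forKey.remove(name);
                if (forKey.isEmpty()) userData.remove(key);
            }
        } else {
            userData.computeIfAbsent(key, k -> new HashMap<>()).put(name, data);
        }
    }

    Object getUserData(Object key, String name) {
        Map<String, Object> forKey = userData.get(key);
        return forKey == null ? null : forKey.get(name);
    }

    // Mimics the "firm reference to the indexList for the duration of a
    // transformation": the list, and through the key the document, stay reachable.
    void putIndex(SourceDoc doc, Object index) {
        @SuppressWarnings("unchecked")
        List<Object> list = (List<Object>) getUserData(doc, "key-index-list");
        if (list == null) {
            list = new ArrayList<>();
            setUserData(doc, "key-index-list", list);
        }
        list.add(index);
    }

    // The suggested change: discarding a document also drops the firm reference.
    void discardDocument(SourceDoc doc) {
        setUserData(doc, "key-index-list", null);
    }

    int trackedDocuments() { return userData.size(); }
}
```

The caveat above still applies in a real processor: after the firm reference is removed, the index survives only as long as something else keeps it reachable.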

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4760575 Legacy Poster: David Lee (daldei)

I've used these weak references before and have learned that there are some issues to be aware of. As Mr Kay mentions, you may need to clear the reference to doc:

    c.setUserData(doc, "key-index-list", null);

IMHO, I don't think this is dangerous whatsoever. The key will not go away until ALL references to the doc are gone. So if it's still actually in use anywhere, i.e. there are ANY references to the doc object from live objects, the key won't go away.

Furthermore, I have found in practice that this is not always enough. Usually, yes, but I've found that sometimes you can create complex circular references in Java which prevent the GC from clearing all references and properly freeing memory. I have never extracted a simple case to prove this, but I have demonstrated it in real applications. The solution is that sometimes you need to break a circular chain manually. Hypothetical example: A -> B -> C -> D -> E -> A. When you stop referencing A, A through E are all supposed to become eligible for collection. There are cases I've found where this doesn't happen, and I solved it by clearing some of the links explicitly (i.e. setting E.A = null). It's possible I was mistaken and this wasn't a true 'bug', and that the GC would eventually have resolved the circular references, but in my case it solved the "out of memory" problem.

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4760607 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

I've added

    <xsl:sequence xmlns:c="java:net.sf.saxon.Controller" select="c:set-user-data(saxon:get-controller(), $doc, 'key-index-list', ())"/>

This did the trick! I think it's worth adding this logic to discard-document, with a disclaimer. It's better than allowing memory to overflow. Thanks a lot!

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4791377 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Am I correct that a document defined in a local variable and referenced by the key() function is kept for the duration of the transformation? For example:

    <xsl:function name="t:x">
      ...
      <xsl:variable name="doc">
        <doc/>
      </xsl:variable>
      ...
      <xsl:if test="key('a', '1', $doc/doc)">
        ...
      </xsl:if>
      ...
    </xsl:function>

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4794761 Legacy Poster: Michael Kay (mhkay)

Yes, that was the conclusion reached earlier in this thread.

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4795850 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

I'm sorry to raise my problem again. It's a sensitive point in my project, and perhaps in someone else's as well. Would it be possible to change userDataTable in net.sf.saxon.Controller into a weak hash map storing (key, HashMap) pairs, where the value HashMap stores (name, data) pairs? The current code is:

    public void setUserData(Object key, String name, Object data) {
        // System.err.println("setUserData " + name + " on object to " + data);
        String keyVal = key.hashCode() + " " + name;
        if (data==null) {
            userDataTable.remove(keyVal);
        } else {
            userDataTable.put(keyVal, data);
        }
    }

Thanks.

RE: discard-document and key function - Added by Anonymous almost 17 years ago

Legacy ID: #4795914 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Please disregard my previous message. It won't help.
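The proposal in the previous two messages, and one plausible reason it would not help, can be shown with `java.util.WeakHashMap`. Its documentation warns that values are held by ordinary strong references, so a value that refers back to its own key (as an index refers to the document it indexes) keeps that key from ever being discarded. The classes below are hypothetical illustrations, not Saxon code; whether this is the reason the idea was withdrawn is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical user-data table as proposed: weak on the key object,
// with a per-key HashMap of (name, data) pairs.
final class WeakUserData {
    // Caveat: values are strongly reachable from this table. If a value such as
    // a DocIndex refers to its key, the key stays strongly reachable and the
    // entry is never expunged, so the document is still retained.
    private final Map<Object, Map<String, Object>> table = new WeakHashMap<>();

    void setUserData(Object key, String name, Object data) {
        if (data == null) {
            Map<String, Object> forKey = table.get(key);
            if (forKey != null) forKey.remove(name);
        } else {
            table.computeIfAbsent(key, k -> new HashMap<>()).put(name, data);
        }
    }

    Object getUserData(Object key, String name) {
        Map<String, Object> forKey = table.get(key);
        return forKey == null ? null : forKey.get(name);
    }
}

// Hypothetical index that, like a real key index, points at its document.
final class DocIndex {
    final Object document;
    DocIndex(Object document) { this.document = document; }
}
```

Because collection timing cannot be asserted reliably, the example only demonstrates the map behaviour and the value-to-key back-reference that defeats the weak key.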

RE: discard-document and key function - Added by Anonymous over 16 years ago

Legacy ID: #5297873 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Sorry to raise this question again. It is due to the increasing complexity of managing keys. In my applications I create many intermediate trees in xsl:variable and use the key() function on them, which leads to leaks. I have to use the techniques described in this thread to manage this, which makes the code less portable, and ugly. Is it possible to consider the following:

a) extend DocumentInfo (probably through a DocumentInfo2?) with "document-data": a map from id to some data;
b) define an application-wide unique id for each index: a number, string, QName, ...;
c) use "document-data" rather than the controller's "user-data" to store indices.

As a result, index data would be collected automatically along with the document, since the document refers to its document-data, which refers to index-id: index.

At present: the controller's user-data stores the index; the index refers to a document (even a temporary document); the document cache refers to a document (but not a temporary one).

Thanks.
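A minimal sketch of the suggested layout, in which each document owns its indexes so that a temporary tree and its index become unreachable together. `TreeDocument`, `KeyDefinition`, and `documentData` are hypothetical names for this illustration, not Saxon's `DocumentInfo` API; the "document" is reduced to a list of element values, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document node that carries its own "document-data" map.
final class TreeDocument {
    final List<String> elements = new ArrayList<>();
    // application-wide key id (e.g. a QName string) -> data such as an index
    private final Map<Object, Object> documentData = new HashMap<>();

    Object getDocumentData(Object id) { return documentData.get(id); }
    void setDocumentData(Object id, Object data) { documentData.put(id, data); }
}

// Hypothetical key definition: indexes element values to their positions.
final class KeyDefinition {
    final String id; // application-wide unique id for this key
    int builds = 0;

    KeyDefinition(String id) { this.id = id; }

    @SuppressWarnings("unchecked")
    Map<String, List<Integer>> indexFor(TreeDocument doc) {
        Object cached = doc.getDocumentData(id);
        if (cached != null) return (Map<String, List<Integer>>) cached;
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < doc.elements.size(); i++) {
            index.computeIfAbsent(doc.elements.get(i), k -> new ArrayList<>()).add(i);
        }
        builds++;
        // Stored on the document itself: no controller-level table refers to
        // the document, so the tree and its index are collected together.
        doc.setDocumentData(id, index);
        return index;
    }

    // Rough analogue of key('name', value, $doc).
    List<Integer> key(TreeDocument doc, String value) {
        return indexFor(doc).getOrDefault(value, List.of());
    }
}
```

The trade-off is that the index now lives exactly as long as the document, rather than as long as both the document and the compiled stylesheet.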

RE: discard-document and key function - Added by Anonymous over 16 years ago

Legacy ID: #5298324 Legacy Poster: Michael Kay (mhkay)

(I wish I could work out how this forum decides to display the structure of a thread. Seems completely chaotic to me...)

I think the current strategy is correct, and carefully thought out, in respect of documents read using the doc() (or document()) function. It's designed to ensure that an index remains in memory and usable so long as both the document and the compiled stylesheet are in memory. This is useful where there are many concurrent transformations using the same stylesheet and the same lookup document. Since the code is quite delicate and difficult to test, I'm reluctant to make any changes that will impact this use case.

You've raised two cases that the strategy doesn't handle so well: documents that are discarded using discard-document(), and transient documents (temporary trees held in a variable). For these cases it would make more sense for the index to be referenced via the document node, so it disappears when the document disappears, rather as you describe. This would also have the benefit that the operation of index construction is no longer synchronized. (Looking at the code here, I think there's actually a bug: for the use case above with two transformations using the same lookup document concurrently, I think one of them can fail saying the index is under construction.)

Ideally I'd like to do this in such a way that the same indexing code is used both for explicit keys and for local keys introduced by the Saxon-SA optimizer. That's possibly a bigger challenge.

I have to say that while there are clearly opportunities here, I have some doubts about "return on investment". I think your use case may be a little unusual.
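The concurrency problem mentioned here (a second transformation failing because the index is "under construction") can be avoided by letting concurrent callers wait for a single build. The sketch below is a hypothetical illustration using `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key and blocks other callers for that key until it finishes; it is not the fix Saxon actually adopted. `SharedIndexTable` is an invented name, and this table holds documents strongly, so on its own it does not address the memory question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared index table for a lookup document used by many
// concurrent transformations of the same compiled stylesheet.
final class SharedIndexTable {
    private final ConcurrentHashMap<Object, Map<String, Integer>> indexes = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger();

    Map<String, Integer> indexFor(Object doc, List<String> values) {
        // The builder runs at most once per document; other threads asking for
        // the same document wait for it instead of seeing a half-built index.
        return indexes.computeIfAbsent(doc, d -> build(values));
    }

    private Map<String, Integer> build(List<String> values) {
        builds.incrementAndGet();
        Map<String, Integer> index = new HashMap<>();
        // Map each value to its first position.
        for (int i = 0; i < values.size(); i++) index.putIfAbsent(values.get(i), i);
        return Map.copyOf(index); // immutable, safe to share across threads
    }
}
```

Combining this with per-document storage, as proposed earlier in the thread, would tie the index's lifetime to the document while keeping construction safe under concurrency.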

RE: discard-document and key function - Added by Anonymous over 16 years ago

Legacy ID: #5300229 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

> It's designed to ensure that an index remains in memory
> and usable so long as both the document and the compiled
> stylesheet are in memory.

It will, as long as the controller's document cache refers to the document. On the other hand, the ids used to support the id() function are already stored in the DocumentInfo.

> I think your use case may be a little unusual.

I have to use key() over temporary trees because I'm performing multistage processing. At a higher level, the process is controlled by a driver XSLT, which calls a transformation for multiple input documents. No extension functions are used except saxon:discard-document() for external documents, and the tricks to release keys. Not very unusual, I think.

RE: discard-document and key function - Added by Anonymous about 16 years ago

Legacy ID: #5387651 Legacy Poster: Michael Kay (mhkay)

The multithreading problem that I spotted when reading the code was confirmed by testing, and has now been fixed - see bug 2152858
