key() performance
Added by Anonymous over 18 years ago
Legacy ID: #3677723 Legacy Poster: Guillaume VIRANTIN (virantin)
Hello,

In order to improve the performance of a lookup into a fairly big XML document, I have replaced the following:

<xsl:variable name="set" select="doc($file)//root"/>
<xsl:variable name="buddy" select="$set//row[stlmtdate=$stlmtdate and CYN_id=$CYN_id]"/>

with:

<xsl:variable name="set" select="doc($file)//root"/>
<xsl:key name="cynKey" match="row" use="concat(stlmtdate,CYN_id)"/>
<xsl:variable name="buddy" select="doc($file)//root//key('cynKey',concat($stlmtdate,$CYN_id))"/>

I can't say for sure whether it improved performance, but it is still extremely slow, whereas a similar approach worked extremely well for a smaller "lookup" document.

Is there a better way to perform lookups? How much does the size of the lookup document affect lookup performance? I understand that the index takes longer to build, but once it is built I would expect the lookup to be extremely fast.

Any ideas? Thank you for your help.
Replies (3)
RE: key() performance - Added by Anonymous over 18 years ago
Legacy ID: #3677730 Legacy Poster: Guillaume VIRANTIN (virantin)
Actually, the performance is far worse!

NB: the lookup should read as follows:

<xsl:variable name="buddy" select="$set//key('cynKey',concat($stlmtdate,$CYN_id))"/>
RE: key() performance - Added by Anonymous over 18 years ago
Legacy ID: #3677939 Legacy Poster: Michael Kay (mhkay)
You don't say how many "root" elements there are within the document. The name suggests only one, but then why would you be using "//" rather than "/"? If there's more than one, then the queries aren't equivalent, because the first one searches only rows that are descendants of a root element, whereas the second (despite the way you've written it) searches for rows anywhere in the document. If you want to confine the search to part of the document, use:

select="key('cynKey', concat(...), doc($file)//root)"

You seem to be using the "//" operator very casually, so that's the first thing to look at. Specifically, look at this expression: doc(x)//root//key(XXX). That's saying: search the whole document to find all the elements named root. For each such element found, locate all its descendant nodes. For each descendant node, make a call on the key() function. Then combine the results of all these calls on the key() function. Since each call actually returns the same results, you're getting exactly the same answer as if you simply did doc(x)/key(XXX), but it's taking a lot longer.

In principle, the performance is O(n^2): if you double the size of the document, you'll make four times as many calls on key(), because of the two "//" operators. In practice, Saxon optimizes a //root expression starting from the document node, so it won't be quite as bad as that, but it's still bad.

You might find it helpful to use the profiling tool: see http://www.saxonica.com/documentation/using-xsl/performanceanalysis.html
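To make the advice above concrete, here is a minimal sketch of the lookup with the "//" steps removed. It assumes XSLT 2.0 (the original post already uses doc()) and a lookup document with a single root element. The parameter defaults and the result wrapper element are hypothetical and only there to keep the stylesheet self-contained; the key definition is copied from the original post.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical defaults; the real values come from the poster's stylesheet -->
  <xsl:param name="file" select="'lookup.xml'"/>
  <xsl:param name="stlmtdate" select="'2006-04-12'"/>
  <xsl:param name="CYN_id" select="'42'"/>

  <!-- xsl:key is a top-level declaration; the index is built once per document -->
  <xsl:key name="cynKey" match="row" use="concat(stlmtdate, CYN_id)"/>

  <xsl:template match="/">
    <!-- One call on key(), scoped to the lookup document by the third argument.
         To search only part of the document, pass an element there instead,
         e.g. doc($file)/root. -->
    <xsl:variable name="buddy"
        select="key('cynKey', concat($stlmtdate, $CYN_id), doc($file))"/>
    <result>
      <xsl:copy-of select="$buddy"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is that key() is evaluated once against the whole lookup document, rather than once per descendant node as in doc($file)//root//key(...).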
RE: key() performance - Added by Anonymous over 18 years ago
Legacy ID: #3678221 Legacy Poster: Guillaume VIRANTIN (virantin)
Wow! Such an impact from such a bad habit! Yes, there is only one root element in my doc. I have modified my cynSet variable and my lookup through key() to use a single '/', and the lookup now executes extremely well! I'll definitely have a look at the profiling tool. Thank you for your help!