Project

Profile

Help

Indexes in Query Processing

Added by Anonymous about 17 years ago

Legacy ID: #4281530 Legacy Poster: Sudarshan Murthy (smurthy)

Hello, What kind of indexes, if any, does Saxon (either the free or the commercial version) use when executing queries? Is it possible to configure Saxon to build an index? Could it be asked to persist the index? If Saxon can/does build indexes, are these structural indexes, or are they value indexes? Thanks, Sudarshan Murthy.


Replies (3)

Please register to reply

RE: Indexes in Query Processing - Added by Anonymous about 17 years ago

Legacy ID: #4281610 Legacy Poster: Michael Kay (mhkay)

  1. Saxon always builds an ID index for every document to support the id() function, whether you use it or not. 2. Saxon builds an IDREF index to support the idref() function the first time you use idref() on a given document tree. 3. In XSLT, Saxon builds an index to support xsl:key and the key() function the first time you use the key() function for that key on a given document tree. This index is reused if you use the same stylesheet repeatedly on the same source document. 4. The first time you use //x on a given document tree, Saxon keeps the result, linked to the document node. So effectively there is an index of elements in the document by name, but it is built incrementally one name at a time. This is used only to support expressions starting //x, it isn't used for example if you write //y/x. 5. In Saxon-SA, any filter expression such as /a/b/c[@x=$y] is in effect translated into a key definition with match="/a/b/c" use="@x", together with a call on key('k', $y) - that is, a value index. There's of course a complex set of rules determining exactly when this is possible. The important thing is that the expression being filtered must be a path expression that can be statically determined to be rooted at a document node. This index is associated with the document node, and lasts as long as the document. 6. In Saxon-SA, a filter expression that isn't rooted at a document node, for example $abc[@x=$y], will result in creation of a temporary index for the variable $abc provided that the expression occurs in a loop relative to the creation of $abc - that is, if it appears the index will be used more than once. Such an index lasts only as long as the variable. (In many cases the variable won't be a user-written variable, but a system-created variable caused by extracting an expression from a loop). This often arises as a result of evaluating a join. 7. There's also a temporary index or hash-table used to support general comparisons A=B if both A and B are multi-valued. Apart from the //x case, Saxon doesn't use structure indexes. Instead it relies on the fact that searching for elements with a known name in the TinyTree structure is extremely fast. (One researcher doing benchmarks concluded from his measurements that Saxon must be using indexes, but he was wrong.)

RE: Indexes in Query Processing - Added by Anonymous about 17 years ago

Legacy ID: #4286121 Legacy Poster: Sudarshan Murthy (smurthy)

Thanks for the response. I have a few follow-up questions: * Because you did not mention saving indexes for later use and reading saved indexes, I assume Saxon does not support these operations. Am I correct? * Does Saxon exploit schema to optimize query execution? I have read your article on "Schema-aware Queries and Stylesheets" (http://www.stylusstudio.com/schema_aware.html). That article talks about using schema for validation, but not for optimization (for example, as an "index" for what elements might not be down a path). Do you know any query processor that uses schema for query optimization? Thanks, Sudarshan.

RE: Indexes in Query Processing - Added by Anonymous about 17 years ago

Legacy ID: #4286181 Legacy Poster: Michael Kay (mhkay)

>saving indexes for later use No, Saxon never saves indexes in persistent storage (not even with the PTree, where it would be quite possible in theory). Too many difficulties determining whether the document and/or the stylesheet has changes since the index was written - once you're into that, you've started building a database product. >Does Saxon exploit schema to optimize query execution Yes for inferring atomic types and cardinality, e.g. book/title will stop searching after the first title (and if used where a singleton is required, there's no run-time check generated because the cardinality is known statically). Knowing types at compile time can also enable many rewrites that wouldn't otherwise be safe. But schema information isn't currently used to reduce the search space for a path expression. Michael Kay

    (1-3/3)

    Please register to reply