Lazy-read Source for XPATH, XQUERY

Added by Anonymous almost 16 years ago

Legacy ID: #6066473 Legacy Poster: David Lee (daldei)

I've dug into the code and I think my answer is "no", but I could have missed something obvious.

Ideally using s9api, though I could use the lower layer: is there a way to run an XPath or XQuery expression that doesn't read anything from the context unless it's needed? For example, given the XPath or XQuery expression "'foo'", is there a way to set the context (say, an InputStream) such that it is never read? It would be read only when some part of the expression requires access to the context.

Another way to tackle this problem: is there a way to "pre-parse" an XPath or XQuery expression, prior to executing it, to find out whether it references the context? That way I would know whether I had to set the context or not.

This problem crops up in implementing XProc's "with-option" tag. Its value is an XPath expression that can reference the standard input stream, but in practice hardly ever does. In the general case, though, I am stuck having to fork/duplicate the input data just to pass it to the with-option tag on the off chance it might read it.

If there were a way of telling Saxon to defer building the document from the source until it is actually needed, this could be solved by using a special source that copies only on demand.

Any suggestions welcome.
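To make the "special source that copies only on demand" idea concrete, here is a minimal sketch in plain Java. It assumes nothing about Saxon: `LazyInputStream` and `wasOpened()` are hypothetical names, and the `Supplier` stands in for whatever fork/copy step would produce the real stream. It only defers opening the underlying data; it does not by itself stop a parser from reading as soon as it is handed the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Hypothetical helper: defers opening (or copying) the real stream until the first read. */
final class LazyInputStream extends InputStream {
    private final Supplier<InputStream> opener; // assumed to perform the expensive fork/copy
    private InputStream delegate;               // stays null until something actually reads

    LazyInputStream(Supplier<InputStream> opener) {
        this.opener = opener;
    }

    /** True once a consumer has actually touched the data. */
    boolean wasOpened() {
        return delegate != null;
    }

    private InputStream delegate() {
        if (delegate == null) {
            delegate = opener.get();
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return delegate().read(buf, off, len);
    }

    @Override
    public int available() throws IOException {
        // Answer without forcing the underlying stream open.
        return delegate == null ? 0 : delegate.available();
    }

    @Override
    public void close() throws IOException {
        // Nothing to close if nobody ever read from the stream.
        if (delegate != null) {
            delegate.close();
        }
    }
}

final class LazyInputStreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] xml = "<doc/>".getBytes(StandardCharsets.UTF_8);
        LazyInputStream in = new LazyInputStream(() -> new ByteArrayInputStream(xml));
        System.out.println("opened before read: " + in.wasOpened());
        int first = in.read();
        System.out.println("opened after read: " + in.wasOpened() + ", first char: " + (char) first);
    }
}
```

If the expression never touches the context, the supplier is never called, so the copy is never made. That benefit holds only if whatever consumes the stream is itself lazy about reading it.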


Replies (3)

RE: Lazy-read Source for XPATH, XQUERY - Added by Anonymous almost 16 years ago

Legacy ID: #6072130 Legacy Poster: Michael Kay (mhkay)

Can't think of any obvious way to set a "lazy context item" other than writing your own implementation of NodeInfo that materializes the node on first access. Possible, but not particularly easy.

Determining whether an XPath expression accesses the context is fairly easy (via getStaticProperties()). XQuery has a method XQueryExpression.usesContextItem() that should do the trick; for XSLT it's a bit harder because the dependency could be in any global variable. You might have to call explain(), supplying your own listener that analyzes the explain output.
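As a rough illustration of "materializes the node on first access", the sketch below applies the idea to a standard org.w3c.dom.Document using a JDK dynamic proxy. This is deliberately not Saxon's NodeInfo: `LazyDom`, `asDocument()` and `isMaterialized()` are hypothetical names, and whether Saxon would accept a wrapper like this everywhere it expects a node is not something the sketch establishes.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical lazy document: the XML is parsed only when a Document method is first called. */
final class LazyDom implements InvocationHandler {
    private final byte[] xml;
    private Document real; // built on first access

    LazyDom(byte[] xml) {
        this.xml = xml;
    }

    /** True once the underlying document has been built. */
    synchronized boolean isMaterialized() {
        return real != null;
    }

    /** Returns a Document view; creating the view does not parse anything. */
    Document asDocument() {
        return (Document) Proxy.newProxyInstance(
                LazyDom.class.getClassLoader(),
                new Class<?>[] { Document.class },
                this);
    }

    @Override
    public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Any call on the view, including toString/hashCode/equals, triggers the parse.
        if (real == null) {
            real = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        }
        try {
            return method.invoke(real, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real document threw
        }
    }
}

final class LazyDomDemo {
    public static void main(String[] args) {
        LazyDom lazy = new LazyDom("<a><b/></a>".getBytes(StandardCharsets.UTF_8));
        Document doc = lazy.asDocument();
        System.out.println("materialized before access: " + lazy.isMaterialized());
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("materialized after access: " + lazy.isMaterialized());
    }
}
```

Only the document object itself is lazy here; the child nodes it hands back are ordinary DOM nodes. A real node implementation has a much larger surface (identity, ordering, axes), which is presumably why this is described as possible but not particularly easy.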

RE: Lazy-read Source for XPATH, XQUERY - Added by Anonymous almost 16 years ago

Legacy ID: #6124827 Legacy Poster: Michael Kay (mhkay)

Given this remark:

> I am stuck having to fork/duplicate the input data just to pass it to the with-option tag on the off chance it might read it.

What form does the "input data" actually take? Why do you need to duplicate it, rather than passing a reference?

RE: Lazy-read Source for XPATH, XQUERY - Added by Anonymous almost 16 years ago

Legacy ID: #6125519 Legacy Poster: David Lee (daldei)

Very good question. And maybe I'm just doing things "wrong". But here's the scoop.

Input data originates and ends up as a text stream. I do want to change this to be something else, but for now it's text. The reason I started with text is that not all commands are pure XML commands, so I can't arbitrarily decide that all data is XML and parse it to a document.

Now, in this particular case, where I'm implementing XProc as a subcomponent of xmlsh, I can in fact assume the input is XML. (For context, I am taking XProc input and translating it to an xmlsh script.) So conceptually I could parse the input up front in the generated script. I'm reluctant to do that for two reasons.

First, it makes the translated script inelegant. This isn't a huge point, because in theory the end user doesn't see the translated script, but I would really like to make use of the shell syntax for pipes at the script level instead of reading the input into a variable and passing the variable around.

Second, I really want to avoid hard-coding things that make it absolutely impossible to implement as a streaming process. I know streaming is improbable in the long run, because almost every XML technology to date doesn't actually stream, and theoretically many XML operations can't ever stream, but maybe over time that will change. Saxon has already made some headway on this! But if I read the input into a document at the beginning, I'll have admitted defeat, and I can guarantee nothing will ever stream.

Perhaps my goals are overly high, but I do have a dream that the shell can someday implement a pipeline like a | b | c without forcing a full copy of the data and building a complete document at each |.

Right now I'm considering hiding the "forking" inside the shell so that the syntax can still be pipe based, but under the hood I can implement forking or copying of data, or, as you mention, creating a reference and passing it around. But as I mention in my first point, it's a bit tricky, as input data is not universally XML. I'm thinking of a technique where both sides of the pipe "negotiate" which representation of the data they want. That way I can avoid having to translate formats in some cases (say, from SAX to DOM) and also allow multiple readers to get a reference instead of a copy in some cases, such as when two readers ask for the same format and that format has already been cached.

I actually just wrote a blog entry on this yesterday: http://blog.xmlsh.org/

Anyway, I apologize if this is not at all Saxon specific! But you did ask :)

-David
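The negotiation-plus-cache idea can be sketched in a few lines of plain Java. This is not how xmlsh does it: `NegotiatedPort`, `register` and `read` are hypothetical names, and the two formats (String and DOM Document) are chosen only because they ship with the JDK. Each reader asks for the representation it wants, the port converts from the raw bytes at most once per format, and later readers asking for the same format get a reference to the cached object.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical pipe endpoint: holds raw bytes and caches each representation readers ask for. */
final class NegotiatedPort {
    private final byte[] raw;
    private final Map<Class<?>, Function<byte[], ?>> converters = new HashMap<>();
    private final Map<Class<?>, Object> cache = new HashMap<>();
    private int conversions = 0;

    NegotiatedPort(byte[] raw) {
        this.raw = raw;
    }

    /** Declares that this port can offer the data in the given format. */
    <T> void register(Class<T> format, Function<byte[], T> converter) {
        converters.put(format, converter);
    }

    /** Returns the data in the requested format, converting only on the first request. */
    <T> T read(Class<T> format) {
        Object cached = cache.get(format);
        if (cached == null) {
            Function<byte[], ?> converter = converters.get(format);
            if (converter == null) {
                throw new IllegalArgumentException("No converter for " + format.getName());
            }
            cached = converter.apply(raw);
            cache.put(format, cached);
            conversions++;
        }
        return format.cast(cached);
    }

    /** Number of conversions actually performed so far. */
    int conversions() {
        return conversions;
    }
}

final class NegotiatedPortDemo {
    static Document parse(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        NegotiatedPort port = new NegotiatedPort("<a><b/></a>".getBytes(StandardCharsets.UTF_8));
        port.register(String.class, bytes -> new String(bytes, StandardCharsets.UTF_8));
        port.register(Document.class, NegotiatedPortDemo::parse);

        Document first = port.read(Document.class);
        Document second = port.read(Document.class); // served from the cache
        System.out.println("same instance: " + (first == second));
        System.out.println("conversions: " + port.conversions());
    }
}
```

Handing two readers the same object is only safe if neither mutates it, which a DOM tree does not enforce; an immutable representation would suit sharing better. The sketch also still holds the whole input in memory, so it addresses the copying concern but not the streaming one.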
