Suggestion of a document stability

Added by Vladimir Nesterovsky almost 7 years ago

As far as I know, Saxon today stores documents and texts in memory to support deterministic behavior.

I would like to suggest that you introduce an API to control this behavior (at least internally). For example, I can see three alternative ways to implement document stability:

  1. Create a temp folder during the transformation, and mirror all documents there to keep them in their original state.
  2. Use a single temporary memory-mapped file during the transformation to save all referenced documents.
  3. Use a key-value storage API such as Berkeley DB, SQLite, or similar.

If you kept documents in parsed form in such a store, you could completely eliminate memory footprint limits, and could potentially break through the limitations of current XSLT streaming.
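
A minimal sketch of the first alternative (mirroring documents into a temp folder), written in Python purely for illustration. It is not Saxon code: the `SnapshotStore` class and its `read` method are hypothetical names, and it assumes documents are plain local files.

```python
import shutil
import tempfile
from pathlib import Path


class SnapshotStore:
    """Mirrors each source file into a private temp folder on first access."""

    def __init__(self):
        self._dir = tempfile.TemporaryDirectory(prefix="stable-docs-")
        self._copies = {}  # resolved source path -> snapshot path

    def read(self, source):
        key = Path(source).resolve()
        if key not in self._copies:
            # First access: copy the file so later edits to the original stay invisible.
            target = Path(self._dir.name) / f"{len(self._copies)}{key.suffix}"
            shutil.copy2(key, target)
            self._copies[key] = target
        # Every access, first or repeated, is served from the snapshot.
        return self._copies[key].read_bytes()

    def close(self):
        # Removes the temp folder and all snapshots.
        self._dir.cleanup()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Within one `with SnapshotStore() as store:` block, repeated calls to `store.read(path)` return the same bytes even if the original file changes in between, and memory use is limited to whatever the caller is currently reading.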


Replies (2)

RE: Suggestion of a document stability - Added by Michael Kay over 6 years ago

Thanks for the suggestions and sorry for not replying sooner.

I'm a bit reluctant to introduce extra complexity here: I'm not sure how many users would benefit, and it's the kind of feature that inevitably adds a problem for every problem that it solves; but I can see that some use cases would benefit from better support here.

I have wondered in the past whether the cheapest way to support deterministic behaviour of the collection() function might be to rename the input files to a name that only Saxon knows, as a way of effectively locking the content from concurrent modification (and then renaming them back on completion). But I don't think users would appreciate finding that after a crashed transformation all their input files have disappeared. I think in practical use cases determinism of the collection() function is very rarely needed, and the current solution of allowing non-deterministic access meets most practical needs.
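
A rough Python sketch of the renaming idea, only to make the trade-off concrete. It is not something Saxon does, and `hidden_inputs` is a hypothetical helper name.

```python
import os
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def hidden_inputs(paths):
    """Rename each input to a private name for the duration of the block, then restore it."""
    renamed = []  # (original, hidden) pairs, in the order they were renamed
    try:
        for original in map(Path, paths):
            hidden = original.with_name(f".{uuid.uuid4().hex}-{original.name}")
            os.rename(original, hidden)
            renamed.append((original, hidden))
        # The caller reads from the hidden names while the block runs.
        yield [hidden for _, hidden in renamed]
    finally:
        # Runs on normal exit and on exceptions, but not if the process is killed.
        for original, hidden in reversed(renamed):
            os.rename(hidden, original)
```

The `finally` clause covers ordinary failures, but a hard crash or a killed process skips it, which leaves the inputs under their hidden names. That is exactly the disappearing-files problem described above.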

RE: Suggestion of a document stability - Added by Vladimir Nesterovsky over 6 years ago

Initially I was thinking about caching unparsed inputs, but then my ideas drifted toward spooled implementations of NodeInfo backed by index structures. Such an implementation is pluggable and offers a trade-off between memory and performance.
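
To illustrate what I mean by a spooled node store, here is a small Python analogy using SQLite as the index structure. It is not an implementation of Saxon's NodeInfo: the `SpooledTree` class and its methods are hypothetical, it handles element nodes only, and for brevity it parses the whole input with ElementTree first, whereas a real implementation would write rows during parsing.

```python
import sqlite3
import xml.etree.ElementTree as ET


class SpooledTree:
    """Stores element nodes as rows in SQLite; navigation queries the index, not an in-memory tree."""

    def __init__(self, xml_text, db_path=":memory:"):
        # Pass a file path as db_path to keep the nodes on disk.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, text TEXT)"
        )
        self.db.execute("CREATE INDEX node_parent ON node (parent, id)")
        self._spool(ET.fromstring(xml_text), None)
        self.db.commit()

    def _spool(self, element, parent_id):
        # Pre-order insertion, so ascending ids follow document order.
        cur = self.db.execute(
            "INSERT INTO node (parent, name, text) VALUES (?, ?, ?)",
            (parent_id, element.tag, element.text),
        )
        node_id = cur.lastrowid
        for child in element:
            self._spool(child, node_id)

    def root(self):
        return self.db.execute("SELECT id FROM node WHERE parent IS NULL").fetchone()[0]

    def children(self, node_id):
        rows = self.db.execute("SELECT id FROM node WHERE parent = ? ORDER BY id", (node_id,))
        return [row[0] for row in rows]

    def name(self, node_id):
        return self.db.execute("SELECT name FROM node WHERE id = ?", (node_id,)).fetchone()[0]
```

Each navigation step costs a query instead of a pointer dereference, which is the memory/performance compromise: the store can live on disk and grow past available RAM, at the price of slower axis traversal.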
