Suggestion of a document stability
Added by Vladimir Nesterovsky over 7 years ago
As far as I know, Saxon currently keeps documents and texts in memory to guarantee deterministic behavior.
I would like to suggest introducing an API to control this behavior (at least internally). For example, I can see three alternatives to the in-memory approach for implementing document stability:
- create a temporary folder for the duration of the transformation and mirror all documents there, so that they stay in their original state;
- use a single temporary memory-mapped file for the duration of the transformation to hold all referenced documents;
- use a key-value storage API such as Berkeley DB, SQLite, or similar.
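A minimal sketch of the first alternative (mirroring documents into a temporary folder), written as a standard JAXP `javax.xml.transform.URIResolver`. The class name `SnapshotResolver` and its copy-on-first-read policy are assumptions for illustration, not an existing Saxon API; it also only affects resources that the processor actually fetches through the resolver.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/**
 * Hypothetical resolver: the first time a URI is resolved, its bytes are copied
 * into a per-transformation temporary folder; every later resolution of the same
 * URI reads that copy, so edits to the original file are not seen.
 */
final class SnapshotResolver implements URIResolver, AutoCloseable {
    // Folder holding the mirrored copies for one transformation.
    private final Path snapshotDir;
    // Absolute URI of each original resource -> its frozen copy.
    private final Map<URI, Path> snapshots = new HashMap<>();

    SnapshotResolver() throws IOException {
        snapshotDir = Files.createTempDirectory("xslt-snapshots");
    }

    @Override
    public synchronized Source resolve(String href, String base) throws TransformerException {
        try {
            URI uri = (base == null || base.isEmpty()) ? new URI(href) : new URI(base).resolve(href);
            Path copy = snapshots.get(uri);
            if (copy == null) {
                // First access: capture the bytes as they are right now.
                copy = snapshotDir.resolve("doc" + snapshots.size());
                try (InputStream in = uri.toURL().openStream()) {
                    Files.copy(in, copy, StandardCopyOption.REPLACE_EXISTING);
                }
                snapshots.put(uri, copy);
            }
            // Read from the copy, but report the original URI as the system ID
            // so relative references inside the document still resolve against it.
            return new StreamSource(Files.newInputStream(copy), uri.toString());
        } catch (IOException | URISyntaxException | IllegalArgumentException e) {
            throw new TransformerException(e);
        }
    }

    /** Deletes the mirrored copies once the transformation has finished. */
    @Override
    public synchronized void close() throws IOException {
        for (Path copy : snapshots.values()) {
            Files.deleteIfExists(copy);
        }
        Files.deleteIfExists(snapshotDir);
        snapshots.clear();
    }
}
```

The copy is taken lazily on first access, so documents that are never read cost nothing; the price is that stability starts from the first read rather than from the start of the transformation.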
If such a store kept documents in parsed form, you could eliminate memory-footprint limits entirely, and potentially break through the limitations of current XSLT streaming.
RE: Suggestion of a document stability - Added by Michael Kay over 7 years ago
Thanks for the suggestions and sorry for not replying sooner.
I'm a bit reluctant to introduce extra complexity here: I'm not sure how many users would benefit, and it's the kind of feature that inevitably adds a problem for every problem that it solves; but I can see that some use cases would benefit from better support here.
I have wondered in the past whether the cheapest way to support deterministic behaviour of the collection() function might be to rename the input files to a name that only Saxon knows, as a way of effectively locking the content from concurrent modification (and then renaming them back on completion). But I don't think users would appreciate finding that after a crashed transformation all their input files have disappeared. I think in practical use cases determinism of the collection() function is very rarely needed, and the current solution of allowing non-deterministic access meets most practical needs.
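The rename-to-lock idea, and the crash problem it carries, can be sketched in a few lines of Java. `RenameLock` and its hidden-name scheme are hypothetical. Note also that a rename only prevents new opens by the original name; on POSIX systems a process that already holds the file open can still modify it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: each input is moved to a name only this process knows,
 * the work runs against the hidden names, and the files are moved back afterwards.
 * The finally block restores names after an ordinary exception, but not if the
 * JVM itself dies, which leaves the inputs "missing" under their original names.
 */
final class RenameLock {
    private RenameLock() {}

    static void withLockedFiles(List<Path> inputs, Consumer<List<Path>> work) throws IOException {
        List<Path[]> moved = new ArrayList<>(); // pairs of {original, hidden}
        try {
            for (Path p : inputs) {
                // Hypothetical naming scheme: a dot-prefixed name tagged with our PID.
                Path hidden = p.resolveSibling("." + p.getFileName() + ".locked-" + ProcessHandle.current().pid());
                Files.move(p, hidden);
                moved.add(new Path[] {p, hidden});
            }
            List<Path> locked = new ArrayList<>();
            for (Path[] pair : moved) {
                locked.add(pair[1]);
            }
            work.accept(locked);
        } finally {
            // Restore in reverse order; this never runs if the process crashes.
            for (int i = moved.size() - 1; i >= 0; i--) {
                Files.move(moved.get(i)[1], moved.get(i)[0]);
            }
        }
    }
}
```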
RE: Suggestion of a document stability - Added by Vladimir Nesterovsky over 7 years ago
Initially I was thinking about caching unparsed inputs, but then my ideas drifted towards spooled implementations of NodeInfo backed by index structures. Such an approach is pluggable and allows a trade-off between memory and performance.
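One way to read "pluggable" here: hide node storage behind a small interface, so that the same tree can live either on the heap or in a spooled file, trading memory for I/O. The `NodeTable` interface, its fixed-width record layout (kind, parent, name code) and both implementations are hypothetical illustrations; they are not Saxon's `NodeInfo` API or its internal tree models.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical storage abstraction: node properties addressed by node number. */
interface NodeTable extends AutoCloseable {
    int add(int kind, int parent, int nameCode); // returns the new node's number
    int kind(int node);
    int parent(int node);
    int nameCode(int node);
    int size();

    @Override
    default void close() throws IOException {}
}

/** Keeps every record in one growable int array on the Java heap. */
final class HeapNodeTable implements NodeTable {
    private int[] data = new int[3 * 16];
    private int size;

    public int add(int kind, int parent, int nameCode) {
        if (3 * size + 3 > data.length) data = Arrays.copyOf(data, data.length * 2);
        data[3 * size] = kind;
        data[3 * size + 1] = parent;
        data[3 * size + 2] = nameCode;
        return size++;
    }
    public int kind(int node) { return data[3 * Objects.checkIndex(node, size)]; }
    public int parent(int node) { return data[3 * Objects.checkIndex(node, size) + 1]; }
    public int nameCode(int node) { return data[3 * Objects.checkIndex(node, size) + 2]; }
    public int size() { return size; }
}

/** Spools fixed-width records to a temporary file instead of the heap. */
final class SpooledNodeTable implements NodeTable {
    private static final int RECORD_BYTES = 3 * Integer.BYTES; // kind, parent, nameCode
    private final FileChannel channel;
    private int size;

    SpooledNodeTable() throws IOException {
        Path file = Files.createTempFile("nodes", ".bin");
        channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.DELETE_ON_CLOSE);
    }

    public int add(int kind, int parent, int nameCode) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_BYTES).putInt(kind).putInt(parent).putInt(nameCode);
        buf.flip();
        long pos = (long) size * RECORD_BYTES;
        try {
            while (buf.hasRemaining()) pos += channel.write(buf, pos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return size++;
    }

    /** Reads field 0 (kind), 1 (parent) or 2 (nameCode) of a record. */
    private int field(int node, int index) {
        long pos = (long) Objects.checkIndex(node, size) * RECORD_BYTES + (long) index * Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        try {
            while (buf.hasRemaining()) {
                if (channel.read(buf, pos + buf.position()) < 0) throw new IOException("truncated spool file");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.getInt(0);
    }
    public int kind(int node) { return field(node, 0); }
    public int parent(int node) { return field(node, 1); }
    public int nameCode(int node) { return field(node, 2); }
    public int size() { return size; }

    @Override
    public void close() throws IOException { channel.close(); }
}
```

A realistic spooled implementation would also need child and sibling links, text content, and a page cache in front of the channel; the per-field positional read used here keeps the sketch short at the cost of one system call per access.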