Maps, arrays and others

Added by Vladimir Nesterovsky over 2 years ago

Hello,

While looking at the issue 5699 I decided to inspect my codebase and observed that in most cases there is no access to a map or an array values after update.

This means that in most cases engine internally can use mutable containers to implement those immutable items.

In particular I have found following code patterns allowing use of mutable implementation:

Create map, and then read out of it.
Create and populate iteratively, e.g. with xsl:iterate. Map updated in current iteration is never used anymore. New map is passed into next iteration.
Create a map or an array and pass it into a function or in a template without next use.
Return a map or an array not derived from static parameter or variable from a function or a template.
Receive map or array as parameter and use for read only.
Small map or array, for which copy is justified.

I think it is possible to implement a logic at compile that tracks last access point to item to understand whether it is not used any more, so no copy or immutable structure is required.

In addition it might be needed a "shared" indicator for item implementation to track whether mutable structure can be used, e.g. for parameter passed into a function.

I think this technique can be used even to regular sequences, allowing to reduce copies or some other lazy techniques.

Replies (2)

RE: Maps, arrays and others - Added by Michael Kay over 2 years ago

You make some good points. It would certainly be possible in principle to get some optimisations by doing data flow analysis or escape analysis to see whether an update can be made "in situ" because the old value is never going to be needed. The main difficulty is it would need very thorough testing. And I'm not sure how big the benefits would be: using functional ("persistent") data structures for maps and arrays doesn't impose a very large overhead, and we only use them sparingly. Our general strategy with maps and arrays is that when they're first created, we use a data structure that assumes they are never going to be "updated", and if they are updated, we copy the data into a persistent structure on the assumption that there are likely to be further updates. So we have two implementations - a read-only implementation, and a persistent implementation, and I think that works reasonably well.

For maps we also have a custom read-only implementation for the "json" situation (also used for option maps on calls to system functions such as fn:serialize) where all the keys are xs:string values - this allows us to avoid retaining the type annotation with each key value. While such a map is under construction we do use in-situ modification. In the uncommon scenario that a map created using parse-json() is updated after initial construction is complete, we copy the map to a persistent HashTrie.

I'm wary of doing any more complex analysis at compile time because there are many workloads where compile time performance dominates over run time performance. If we did anything like this, it would have to be as part of a move towards more just-in-time optimisation based on execution metrics.

At one time we did have a simple scheme for doing in-situ updates to a map while it was under construction, for example within an xsl:map instruction.

RE: Maps, arrays and others - Added by Michael Kay over 2 years ago

More generally, we are starting to make more optimisation decisions based on observing and learning from run-time behaviour rather than doing compile-time analysis. So, for example, if we observe that a variable is always read to completion, then we know that incremental evaluation is giving no benefits, so there's no point in doing it. We could use similar ideas here: choose the initial map implementation for a map-creation instruction based on past history of what happened to the map after the initial construction, I think that's more likely to be informative than compile-time data flow analysis.

(1-2/2)

Please register to reply

Project

Profile

Help

Saxon