Project

Profile

Help

Feature #4059

closed

Simple maps with string keys

Added by Michael Kay over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
2018-12-02
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.9, trunk
Fix Committed on Branch:
9.9, trunk
Fixed in Maintenance Release:
Platforms:

Description

Studying the performance of the XSLT-in-XSLT compiler (nearly complete but not yet released) I think there is an opportunity for an improved map implementation that will deliver better performance in many cases. In particular (a) where the keys are all strings, (b) where the map is constructed "in one go", and (c) where calls to put() and remove() are unlikely to happen.

In this case we could implement the map as a simple Java HashMap<String, GroundedValue> (and on Javascript, simply as an object). In the event that put() or remove() is called, the structure could be bulk-copied to the existing HashTrieMap implementation.

We could use this structure:

(a) if requested by a call on a new extension function, say saxon:dictionary(keys, values)

(b) for a map{} literal if the keys are all strings

(c) for a constant map loaded from a SEF file (this case is common in the XSLT-in-XSLT compiler.

(d) for maps constructed using parse-json().

With a bit of work in the optimizer we might also detect some cases where the map is constructed using xsl:map or map:merge(), provided static analysis of the operand is possible.

On the Javascript side there are probably even greater savings by using a Javascript "object" directly, rather than our HashTrieMap implementation.

Actions #1

Updated by Michael Kay over 5 years ago

I have created an implementation of MapItem called Dictionary; it simply wraps a HashMap<String, GroundedValue>; if put() or remove() are called it is bulk-converted to a HashTrieMap.

Constructing a simple 1m-entry map and then calling map:size() is measured at 1127ms with the conventional structure.

If we replace the call to map:size() with a single call to map:get() the cost comes down to 776ms.

Re-implementing map:entry() to use a new SingletonMap implementation, rather than the general-purpose HashTrieMap(), brings these numbers down to 1003ms and 691ms respectively.

Creating an equivalent instance of the new Dictionary structure (using an extension function that takes two arguments, a sequence of keys and a sequence of values) takes 132ms if followed by a call on size(), 135ms if followed by a call on get().

Retrieval: making 1m calls on map:get on a 1000-entry map takes 73ms with the conventional structure, 57ms with the new.

So there's a clear gain using the new structure where possible. (And in retaining the new implementation of singleton maps.)

I'm experimenting with various designs of extension function for creating an instance of the structure. It can also be used internally as the result of parsing JSON.

Actions #2

Updated by Michael Kay over 5 years ago

One possible API would be to add two implementation defined options to the options for map:merge:

  • key-type="string" constrains the keys in the map to be strings (error if not)
  • final="true()" optimises the map for the situation where no put() or remove() calls will occur (if they do, they are allowed but will be inefficient)

If both the above options are present, Saxon uses the Dictionary implementation method.

This design has the advantage of embracing other optimisations in the future.

Actions #3

Updated by Michael Kay over 5 years ago

I've implemented this.

Needs documenting.

We should put an extension attribute on the xsl:map instruction to achieve a similar effect.

The PackageLoader should be given a clue about how to reconstruct an exported map, and should also be able to reconstruct the options on map:merge.

Actions #4

Updated by Michael Kay over 5 years ago

  • Status changed from New to In Progress

Code committed. The new map implementation is used internally in quite a variety of places so it can be considered reasonably well tested.

Leaving open for additions e.g. for xsl:key.

Actions #5

Updated by Michael Kay about 5 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 9.9, trunk added
  • Fix Committed on Branch 9.9, trunk added

This has been Implemented on 9.9 and on the trunk, under the name DictionaryMap

Actions #6

Updated by Debbie Lockett about 5 years ago

9.9.1 release included the changes for map:merge(), as documented in changes-9.9.1. The documentation has (belatedly) now been updated, to also include details under functions/map/merge.

Committed on 9.9 and 10.0 dev branches. 9.9 documentation updated online (XML and HTML versions).

Actions #7

Updated by O'Neil Delpratt about 5 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.1.2 added

Bug issue fixed in the Saxon 9.9.1.2 maintenance release.

Please register to edit this issue

Also available in: Atom PDF