The default collation is ignored for value comparisons and general comparisons. The documentation doesn't make this clear.
#1 Updated by Debbie Lockett over 2 years ago
- Status changed from New to In Progress
- Found in version set to 1.0.0
The Saxon-JS code for handling collations has had a lot of work since the 1.0.0 release. This includes addressing the problems raised in this bug:
The "comparer" objects (which have compare and equals methods) used for value comparisons and general comparisons now check for a specified collation, using the codepoint collation as default.
The codepointCollation.equals() method has been updated as suggested.
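A minimal sketch of the "comparer" behaviour described above: use the specified collation if there is one, otherwise fall back to the codepoint collation. It also shows how a general comparison (`=`) differs from a value comparison (`eq`): it is existential over both sequences. The names `makeComparer`, `valueEq` and `generalEq` are hypothetical and do not come from Saxon-JS.

```javascript
// Hypothetical sketch: makeComparer, valueEq and generalEq are illustrative
// names only, not Saxon-JS internals.

// Stand-in for the default Unicode codepoint collation.
const codepointCollation = {
  equals: (a, b) => a === b,
  compare: (a, b) => {
    const x = Array.from(a, (ch) => ch.codePointAt(0));
    const y = Array.from(b, (ch) => ch.codePointAt(0));
    for (let i = 0; i < Math.min(x.length, y.length); i++) {
      if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
    }
    return Math.sign(x.length - y.length);
  },
};

// Build a comparer: use the specified collation if there is one,
// otherwise fall back to the codepoint collation.
function makeComparer(collation) {
  const c = collation ?? codepointCollation;
  return {
    equals: (a, b) => c.equals(a, b),
    compare: (a, b) => {
      if (typeof c.compare !== "function") {
        throw new Error("collation does not support ordering");
      }
      return c.compare(a, b);
    },
  };
}

// Value comparison (eq): exactly one string on each side.
function valueEq(a, b, collation) {
  return makeComparer(collation).equals(a, b);
}

// General comparison (=): true if some item of seqA equals some item of seqB.
function generalEq(seqA, seqB, collation) {
  const cmp = makeComparer(collation);
  return seqA.some((a) => seqB.some((b) => cmp.equals(a, b)));
}

// Usage with a hypothetical case-insensitive collation that supplies only equals.
const caseInsensitive = { equals: (a, b) => a.toLowerCase() === b.toLowerCase() };
console.log(generalEq(["Apple", "pear"], ["APPLE"]));                  // false
console.log(generalEq(["Apple", "pear"], ["APPLE"], caseInsensitive)); // true
```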
Also, "collation" objects now have a standard format, and can be supplied to SaxonJS.transform() using the "collations" option. This option takes a map (a JS object) from collation URIs to collation objects. A collation object has the following methods, whose arguments are JS strings: equals (mandatory), and optionally compare, collationKey, contains, startsWith, endsWith, and indexOf.
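A sketch of a user-supplied collation in the format just described, assuming `compare` returns a negative, zero or positive number and `indexOf` returns a position or -1 (neither contract is spelled out above). The example collation compares strings after NFC normalization; the URI `http://example.com/collation/nfc-codepoint` is hypothetical. The map is what would be passed as the "collations" option; the transform call itself is not run here.

```javascript
// Hypothetical URI chosen for illustration only.
const NFC_URI = "http://example.com/collation/nfc-codepoint";

const nfc = (s) => s.normalize("NFC");

// A user-supplied collation: strings are compared after NFC normalization,
// so precomposed "é" (U+00E9) equals "e" + combining acute (U+0065 U+0301).
const nfcCollation = {
  equals: (a, b) => nfc(a) === nfc(b), // mandatory
  // Orders by UTF-16 code units of the normalized strings (a simplification).
  compare: (a, b) => {
    const x = nfc(a), y = nfc(b);
    return x < y ? -1 : x > y ? 1 : 0;
  },
  collationKey: (s) => nfc(s),
  contains: (s, sub) => nfc(s).includes(nfc(sub)),
  startsWith: (s, sub) => nfc(s).startsWith(nfc(sub)),
  endsWith: (s, sub) => nfc(s).endsWith(nfc(sub)),
  // Assumption: returns a position in the normalized string, or -1.
  indexOf: (s, sub) => nfc(s).indexOf(nfc(sub)),
};

// Map from collation URI to collation object, as taken by the "collations" option.
const collations = { [NFC_URI]: nfcCollation };

// e.g. SaxonJS.transform({ /* stylesheet, source, ... */ collations }); not run here.
console.log(nfcCollation.equals("\u00E9", "e\u0301")); // true
```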
The following collations are now implemented in Saxon-JS: the Unicode codepoint collation (http://www.w3.org/2005/xpath-functions/collation/codepoint) and the HTML ASCII case-insensitive collation (http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive).
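A sketch of how the two built-in collations might behave; it is not the Saxon-JS source. The point worth illustrating is that a codepoint collation cannot simply use JavaScript's `<` on strings, which compares UTF-16 code units, and that the HTML ASCII case-insensitive collation folds only A-Z.

```javascript
// Sketch only; not the Saxon-JS implementation.

// Codepoint collation: order by Unicode code point. JavaScript's own "<" on
// strings compares UTF-16 code units, so a supplementary character (stored
// as a surrogate pair starting 0xD800-0xDBFF) sorts before BMP characters
// such as U+FFFD.
function codepointCompare(a, b) {
  const x = Array.from(a, (ch) => ch.codePointAt(0));
  const y = Array.from(b, (ch) => ch.codePointAt(0));
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
  }
  return Math.sign(x.length - y.length);
}

// HTML ASCII case-insensitive: fold only A-Z to a-z, leave every other
// character unchanged, then compare by code point.
const asciiFold = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

const collations = {
  "http://www.w3.org/2005/xpath-functions/collation/codepoint": {
    equals: (a, b) => a === b,
    compare: codepointCompare,
  },
  "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive": {
    equals: (a, b) => asciiFold(a) === asciiFold(b),
    compare: (a, b) => codepointCompare(asciiFold(a), asciiFold(b)),
  },
};

console.log("\u{1F600}" < "\uFFFD");                  // true (code-unit order)
console.log(codepointCompare("\uFFFD", "\u{1F600}")); // -1 (code-point order)
```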
The use of Unicode Collation Algorithm collations (those beginning http://www.w3.org/2013/collation/UCA) is not yet implemented. Currently if one of these collations is specified, then Saxon-JS always just uses the default codepoint collation.
The documentation needs to be updated with the above information for the next release.
#2 Updated by Michael Kay over 2 years ago
For UCA collations (and when lang="XX" is specified in xsl:sort), we should attempt to use Intl.Collator to the extent it is available in the browser.
Query parameters in the UCA collation URI should be handled as follows:
- fallback: if fallback=no, reject as "unsupported collation"
- lang: use as the first (locales) argument to Intl.Collator
- strength: use to set the sensitivity option
- caseFirst: use to set the caseFirst option (Unicode extension key kf)
- numeric: use to set the numeric option (Unicode extension key kn)
- alternate: interpret alternate=blanked as ignorePunctuation: true
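A sketch of the parameter mapping proposed above, turning a UCA collation URI into Intl.Collator constructor arguments. It assumes the XPath 3.1 URI syntax, where parameters are semicolon-separated (e.g. `?lang=de;numeric=yes`), and applies the fallback rule literally. The strength parameter is deliberately left out. The function name `ucaToIntlArgs` is hypothetical.

```javascript
// Hypothetical sketch: ucaToIntlArgs is an illustrative name, not a Saxon-JS function.
const UCA_PREFIX = "http://www.w3.org/2013/collation/UCA";

// XPath caseFirst values mapped to Intl.Collator caseFirst values.
const CASE_FIRST = new Map([["upper", "upper"], ["lower", "lower"], ["off", "false"]]);

function ucaToIntlArgs(uri) {
  const rest = uri.slice(UCA_PREFIX.length);
  if (!uri.startsWith(UCA_PREFIX) || (rest !== "" && !rest.startsWith("?"))) {
    throw new Error("not a UCA collation URI: " + uri);
  }
  // Parameters are keyword=value pairs separated by semicolons.
  const params = new Map();
  for (const pair of rest.slice(1).split(";")) {
    const eq = pair.indexOf("=");
    if (eq > 0) params.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  // fallback=no is rejected outright, as proposed.
  if (params.get("fallback") === "no") throw new Error("unsupported collation: " + uri);

  const options = {};
  if (CASE_FIRST.has(params.get("caseFirst"))) {
    options.caseFirst = CASE_FIRST.get(params.get("caseFirst"));
  }
  if (params.has("numeric")) options.numeric = params.get("numeric") === "yes";
  if (params.get("alternate") === "blanked") options.ignorePunctuation = true;
  // strength is not handled in this sketch.
  return { locales: params.get("lang"), options };
}

const { locales, options } = ucaToIntlArgs(UCA_PREFIX + "?lang=en;numeric=yes");
const collator = new Intl.Collator(locales, options);
console.log(collator.compare("item2", "item10") < 0); // true
```

Note that with no lang parameter, `locales` is undefined and Intl.Collator falls back to the runtime's default locale.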
The resulting Collator object supports compare() (and therefore equals()), but not contains, startsWith, etc.; and there is no mechanism for getting collation keys, so it can't be used in distinct-values() or grouping, unless perhaps we implement group-by as a sort followed by group-adjacent.
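The sort-then-group-adjacent idea can be sketched with nothing more than a compare() function: sort by the grouping key, then start a new group whenever the key compares unequal to the first key of the current group. `groupByViaSort` is a hypothetical name. One caveat for a real implementation: groups come out in collation order, whereas xsl:for-each-group group-by orders groups by first appearance, so that order would have to be restored afterwards.

```javascript
// Hypothetical sketch: grouping with a collation that offers compare() but no
// collation keys. Sort first, then group adjacent items whose keys compare equal.
function groupByViaSort(items, keyOf, compare) {
  // Array.prototype.sort is stable, so equal keys keep their input order.
  const sorted = [...items].sort((a, b) => compare(keyOf(a), keyOf(b)));
  const groups = [];
  for (const item of sorted) {
    const last = groups[groups.length - 1];
    if (last && compare(keyOf(last[0]), keyOf(item)) === 0) {
      last.push(item);
    } else {
      groups.push([item]);
    }
  }
  return groups;
}

// Usage: primary-strength (base sensitivity) grouping ignores case and accents.
const coll = new Intl.Collator("en", { sensitivity: "base" });
const groups = groupByViaSort(["apple", "Apple", "banana", "\u00C1PPLE"], (s) => s, coll.compare);
console.log(groups.length); // 2
```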
#5 Updated by Debbie Lockett 8 months ago
- Description updated
- Fix Committed on JS Branch Trunk added
Committed changes on the 2.0 trunk branch to implement support for UCA collations as suggested above (including when lang="XX" is specified in xsl:sort, see change in …).
When the only query parameter used in a UCA collation URI is "strength=secondary" (or "strength=2"), a full case-blind collation (Compare.caseblind) is used, which supports contains, startsWith, etc. as well as equals and compare.
Otherwise, the collation object uses Intl.Collator, and so only supports equals and compare (and therefore has the restricted use described above).
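A sketch of the selection rule just described. The case-blind branch here lower-cases both strings, which is only a stand-in written for this example and not Saxon-JS's Compare.caseblind; the Intl branch takes its options from elsewhere (for instance the strength and other parameter mappings). `makeUcaCollation` is a hypothetical name.

```javascript
// Hypothetical sketch: makeUcaCollation and its lower-casing "case-blind"
// branch are illustrative, not Saxon-JS's Compare.caseblind.
function makeUcaCollation(params, locales, intlOptions) {
  const keys = Object.keys(params);
  const onlySecondary =
    keys.length === 1 && (params.strength === "secondary" || params.strength === "2");

  if (onlySecondary) {
    // Full case-blind collation: supports the substring methods too.
    const fold = (s) => s.toLowerCase();
    return {
      equals: (a, b) => fold(a) === fold(b),
      compare: (a, b) => {
        const x = fold(a), y = fold(b);
        return x < y ? -1 : x > y ? 1 : 0;
      },
      contains: (s, sub) => fold(s).includes(fold(sub)),
      startsWith: (s, sub) => fold(s).startsWith(fold(sub)),
      endsWith: (s, sub) => fold(s).endsWith(fold(sub)),
    };
  }

  // Anything else: Intl.Collator, so only equals and compare are available.
  const collator = new Intl.Collator(locales, intlOptions);
  return {
    equals: (a, b) => collator.compare(a, b) === 0,
    compare: (a, b) => collator.compare(a, b),
  };
}

const blind = makeUcaCollation({ strength: "secondary" });
console.log(blind.contains("Hello World", "WORLD")); // true

const viaIntl = makeUcaCollation({ lang: "en", strength: "primary" }, "en", { sensitivity: "base" });
console.log(viaIntl.equals("resume", "R\u00E9sum\u00E9")); // true
console.log(typeof viaIntl.contains);                     // "undefined"
```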
Details for handling the strength parameter:
- primary|1 => sensitivity: "base"
- secondary|2 => sensitivity: "accent"
- tertiary|3 => sensitivity: "variant"
- quaternary|4|identical|5 => sensitivity: "variant", ignorePunctuation: "false"
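A sketch of the strength mapping above, followed by a check of what each sensitivity level does with an accent difference and a case difference. `strengthToOptions` is a hypothetical name. One detail: ignorePunctuation is a boolean option of Intl.Collator, and a string "false" would be truthy, so the sketch passes the boolean false.

```javascript
// Hypothetical sketch of the strength -> Intl.Collator options mapping.
function strengthToOptions(strength) {
  switch (strength) {
    case "primary": case "1":
      return { sensitivity: "base" };
    case "secondary": case "2":
      return { sensitivity: "accent" };
    case "tertiary": case "3":
      return { sensitivity: "variant" };
    case "quaternary": case "4": case "identical": case "5":
      return { sensitivity: "variant", ignorePunctuation: false }; // boolean, not "false"
    default:
      return {}; // absent or unrecognised: keep Intl.Collator defaults
  }
}

// For each strength: is "a" equal to "á" (accent)? is "a" equal to "A" (case)?
for (const s of ["primary", "secondary", "tertiary"]) {
  const c = new Intl.Collator("en", strengthToOptions(s));
  console.log(s, c.compare("a", "\u00E1") === 0, c.compare("a", "A") === 0);
}
// primary true true
// secondary false true
// tertiary false false
```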
The following query parameters are ignored: version, maxVariable, backwards, normalization, caseLevel, reorder; and alternate is only supported if alternate=blanked.