Support #6322
open
Is the old collation URI in EE supposed to rely on the JDK collation support or on ICU?
Description
I notice differences in collation-dependent sorts depending on whether I use the UCA URI or the legacy Saxon URI.
The documentation at https://www.saxonica.com/html/documentation12/localization/sorting-and-collations.html says about collations:
For backwards compatibility reasons the standard collation resolver in Saxon also accepts URIs in the form http://saxon.sf.net/collation followed by query parameters; the query parameters that are recognized are the same as those defined by W3C UCA collation URIs.
It appears to me, however, that with EE, the use of the legacy collation URI means ICU is not used, only the default JDK collation support, as I get results similar to using HE.
Can anyone confirm that?
Details below:
An XQuery program using the UCA collation URI, run with Saxon 12.4 EE, gives the same result for all sorts, e.g. the following program outputs true:
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ' ';
declare variable $strings as array(xs:string*) external := [
  ('abc', 'abc def', 'abcdef'),
  ('abc', 'abcdef', 'abc def'),
  ('abc def', 'abcdef', 'abc'),
  ('abc def', 'abc', 'abcdef'),
  ('abcdef', 'abc', 'abc def'),
  ('abcdef', 'abc def', 'abc')
];
let $sorted :=
  array:for-each(
    $strings,
    function($seq) {
      sort($seq, 'http://www.w3.org/2013/collation/UCA?strength=primary;lang=en')
    })
return
  every $pos in (1 to array:size($sorted))
  satisfies
    deep-equal($sorted($pos), $strings?1)
The same program using the legacy Saxon collation URI, however, gives a different sort result, much like Saxon HE; the program outputs false:
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ' ';
declare variable $strings as array(xs:string*) external := [
  ('abc', 'abc def', 'abcdef'),
  ('abc', 'abcdef', 'abc def'),
  ('abc def', 'abcdef', 'abc'),
  ('abc def', 'abc', 'abcdef'),
  ('abcdef', 'abc', 'abc def'),
  ('abcdef', 'abc def', 'abc')
];
let $sorted :=
  array:for-each(
    $strings,
    function($seq) {
      sort($seq, 'http://saxon.sf.net/collation?strength=primary;lang=en')
    })
return
  every $pos in (1 to array:size($sorted))
  satisfies
    deep-equal($sorted($pos), $strings?1)
Updated by Michael Kay 11 months ago
Looking at the code, the legacy Saxon collation URIs give you a Java RuleBasedCollator rather than an ICU collator.
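To see what the plain JDK collator does with the strings from the report, independently of Saxon, something like the following can be run. It is only a sketch: it assumes that `strength=primary;lang=en` on the legacy URI corresponds roughly to `java.text.Collator.getInstance(Locale.ENGLISH)` with `PRIMARY` strength, which is an assumption and is not taken from the Saxon source. The class name `JdkCollationSketch` and its helper methods are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JdkCollationSketch {

    // Assumption: roughly what "strength=primary;lang=en" would select
    // when only the JDK collation support is used (not Saxon's actual code).
    public static Collator primaryEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns a sorted copy; List.sort is stable, so strings that the
    // collator treats as equal keep their relative input order.
    public static List<String> sorted(List<String> input, Collator collator) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        Collator collator = primaryEnglish();
        // Shows the concrete collator implementation the JDK hands out.
        System.out.println(collator.getClass().getName());

        // A result of 0 here would mean the space is ignored at primary strength.
        System.out.println(collator.compare("abc def", "abcdef"));

        // Two of the input permutations from the report.
        System.out.println(sorted(List.of("abc", "abc def", "abcdef"), collator));
        System.out.println(sorted(List.of("abcdef", "abc def", "abc"), collator));
    }
}
```

If the two sorted lists differ, the JDK collator treats 'abc def' and 'abcdef' as equal at primary strength and the stable sort merely preserves input order, which would explain why the legacy-URI query returns false.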
Updated by Martin Honnen 11 months ago
Thanks, you might want to document that. I think Ken Holmann on Slack has been trying for a day or so to get some "good" collation-based sorting from PE, but I think his attempts all failed because he used the legacy Saxon collation URI, and the JDK-based sorting doesn't seem to do what he wants.