Support #6322

open

Is the old collation URI in EE supposed to rely on the JDK collation support or on ICU?

Added by Martin Honnen 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Documentation
Sprint/Milestone: -
Start date: 2024-01-17
% Done: 0%

Description

I notice differences in collation-dependent sorts depending on whether I use the UCA URI or the legacy Saxon URI.

The documentation at https://www.saxonica.com/html/documentation12/localization/sorting-and-collations.html says the following about collations:

For backwards compatibility reasons the standard collation resolver in Saxon also accepts URIs in the form http://saxon.sf.net/collation followed by query parameters; the query parameters that are recognized are the same as those defined by W3C UCA collation URIs.

It appears to me, however, that with EE, the use of the legacy collation URI means ICU is not used, only the default JDK collation support, as I get results similar to using HE.

Can anyone confirm that?

Details below:

An XQuery program using the UCA collation URI, run with Saxon 12.4 EE, sorts every permutation of the input into the same order; for example, the following program outputs true:

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';

declare option output:item-separator '
';

declare variable $strings as array(xs:string*) external := [
  ('abc', 'abc def', 'abcdef'),
  ('abc', 'abcdef', 'abc def'),
  ('abc def', 'abcdef', 'abc'),
  ('abc def', 'abc', 'abcdef'),
  ('abcdef', 'abc', 'abc def'),
  ('abcdef', 'abc def', 'abc')
];

let $sorted := 
  array:for-each(
    $strings, 
    function($seq) { 
      sort($seq, 'http://www.w3.org/2013/collation/UCA?strength=primary;lang=en')
    })
return
  every $pos in (1 to array:size($sorted))
  satisfies 
    deep-equal($sorted($pos), $strings?1)

The same program using the legacy Saxon collation URI, however, gives different sort results, much like Saxon HE; for example, the following program outputs false:

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'text';

declare option output:item-separator '
';

declare variable $strings as array(xs:string*) external := [
  ('abc', 'abc def', 'abcdef'),
  ('abc', 'abcdef', 'abc def'),
  ('abc def', 'abcdef', 'abc'),
  ('abc def', 'abc', 'abcdef'),
  ('abcdef', 'abc', 'abc def'),
  ('abcdef', 'abc def', 'abc')
];

let $sorted := 
  array:for-each(
    $strings, 
    function($seq) { 
      sort($seq, 'http://saxon.sf.net/collation?strength=primary;lang=en')
    })
return
  every $pos in (1 to array:size($sorted))
  satisfies 
    deep-equal($sorted($pos), $strings?1)

#1

Updated by Michael Kay 3 months ago

Looking at the code, the legacy Saxon collation URIs give you a Java RuleBasedCollator rather than an ICU collator.
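
For anyone who wants to see the JDK side of this without Saxon, here is a minimal sketch. It is not Saxon's code: it only assumes that `strength=primary;lang=en` on the legacy URI corresponds roughly to `Collator.getInstance(Locale.ENGLISH)` at `PRIMARY` strength. The class name `JdkCollationCheck` and its helper methods are hypothetical, made up for this illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Saxon.
public class JdkCollationCheck {

    // Assumption: roughly mirrors "strength=primary;lang=en" on the JDK side.
    static Collator primaryEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Sorts a copy of the input. List.sort is stable, so strings the
    // collator treats as equal keep their relative input order.
    static List<String> sortWith(Collator collator, List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        Collator collator = primaryEnglish();
        // Typically a java.text.RuleBasedCollator on standard JDKs.
        System.out.println("Collator class: " + collator.getClass().getName());
        // A result of 0 means the collator sees the two strings as equal.
        System.out.println("compare: " + collator.compare("abc def", "abcdef"));

        // Three of the permutations from the query in the description.
        List<List<String>> inputs = Arrays.asList(
            Arrays.asList("abc", "abc def", "abcdef"),
            Arrays.asList("abc", "abcdef", "abc def"),
            Arrays.asList("abcdef", "abc def", "abc"));
        for (List<String> input : inputs) {
            System.out.println(input + " -> " + sortWith(collator, input));
        }
    }
}
```

If `compare` prints 0 and the sorted orders differ between inputs, that would be consistent with the `false` result reported in the description. To compare against ICU, `com.ibm.icu.text.Collator` from ICU4J could be swapped in, assuming that library is on the classpath.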

#2

Updated by Martin Honnen 3 months ago

Thanks. You might want to document that. I think Ken Holman has been trying on Slack for a day or so to get some "good" collation-based sorting from PE, but I think his attempts all failed because he used the legacy Saxon collation URI, and the JDK-based sorting doesn't seem to do what he wants.

#3

Updated by Martin Honnen 3 months ago

This can be closed as resolved.
