Support #4992
Closed
xsl:sort using @lang locales
Description
Hello,
I have a question regarding the sort support in Saxon 9.9 and Saxon 10. One of the Oxygen XML Editor users is trying to sort German and French terms according to locale-specific sort rules (de-DE, de-AT, fr-FR, fr-CA), but it seems that the sort only responds to the main/basic language rules (de-AT = de = de-DE and fr-CA = fr = fr-FR). In the attached input file you can see the desired result for de-AT and fr-CA in the comments.
Updated by Michael Kay over 3 years ago
If the lang attribute is present on xsl:sort, and if no other relevant attributes (such as collation) are present, then Saxon will obtain a Java Locale using the logic in JavaCollationFactory.getLocale() (which splits the supplied lang value into language and country parts), and then gets a Java Collator using Collator.getInstance(locale).
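As a rough illustration of the lang-to-Collator path described above, here is a minimal sketch using only the JDK. It is not Saxon's code: the class name and the toLocale and sortWith helpers are hypothetical, and the splitting rule (on the first "-" or "_") is an assumption about what "language and country parts" means, not a copy of JavaCollationFactory.getLocale().

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LangCollationSketch {

    // Hypothetical helper (not Saxon's implementation): split a lang value
    // such as "de-AT" into a language part and an optional country part.
    static Locale toLocale(String lang) {
        String[] parts = lang.split("[-_]", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length > 1) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    // Sort a copy of the terms with the JDK collator for the given lang value.
    static List<String> sortWith(String lang, List<String> terms) {
        Collator collator = Collator.getInstance(toLocale(lang));
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C4rger" is "Ärger", written with an escape to avoid encoding issues.
        List<String> terms = List.of("Zebra", "\u00C4rger", "Apfel");
        for (String lang : new String[] {"de-DE", "de-AT"}) {
            System.out.println(lang + " -> " + sortWith(lang, terms));
        }
    }
}
```

Running this with two lang values that share a language is a quick way to see whether the JDK collators on a given installation actually order the terms differently.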
The set of locales available depends on the JVM installation.
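To check what a particular JVM reports, something like the following sketch may help. It uses the standard Collator.getAvailableLocales() call; the class and method names are hypothetical. Note that a locale missing from this list does not make Collator.getInstance(locale) fail: the JDK still returns a collator, typically one based on a more general locale.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CollatorAvailabilitySketch {

    // For each language tag, report whether this JVM lists an exact match
    // among the locales for which it advertises localized Collator instances.
    static Map<String, Boolean> listedCollatorLocales(List<String> tags) {
        List<Locale> available = Arrays.asList(Collator.getAvailableLocales());
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String tag : tags) {
            result.put(tag, available.contains(Locale.forLanguageTag(tag)));
        }
        return result;
    }

    public static void main(String[] args) {
        // The four locales mentioned in the original question.
        List<String> tags = List.of("de-DE", "de-AT", "fr-FR", "fr-CA");
        listedCollatorLocales(tags).forEach(
                (tag, listed) -> System.out.println(tag + " listed: " + listed));
    }
}
```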
If you're concerned about accurate collation, then I'd advise setting the collation attribute to a UCA collation URI, and making sure the ICU implementation is used (which means you need Saxon-PE or higher, and ICU must be on the classpath).
Updated by Michael Kay over 3 years ago
I've confirmed that if you use a collation attribute rather than a lang attribute, for example
collation="http://www.w3.org/2013/collation/UCA?lang=de-DE"
then it now uses ICU collations rather than Java collations. This leads to a difference between fr-FR and fr-CA, though the results for Germany and Austria are still the same.
(I've always doubted whether the traditional differences noted by collation experts still exist in the 21st century - I think collation standards nowadays are much more likely to vary from one publisher to another, rather than from one country to another. Austrians surely read the same books that Germans do, and the indexes at the back of the book aren't going to be re-sorted for the Austrian market. But I'm happy to leave that question to the ICU experts).
Perhaps in the case where xsl:sort is used with a lang attribute and no collation attribute, Saxon-PE and -EE should now be using the ICU/UCA collation rather than the Java collation.
Updated by Octavian Nadolu over 3 years ago
Thank you very much for analyzing this issue and for clarifying it.
Updated by Michael Kay over 2 years ago
- Status changed from New to Closed
Closing this because the question seems to have been answered.