Support #4992

xsl:sort using @lang locales

Added by Octavian Nadolu about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2021-05-17
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 10, 9.9
Fix Committed on Branch:
Fixed in Maintenance Release:

Description

Hello,

I have a question regarding the sort support in Saxon 9.9 and Saxon 10. One of the Oxygen XML Editor users is trying to sort German and French terms according to locale-specific sort rules (de-DE, de-AT, fr-FR, fr-CA), but it seems that xsl:sort only responds to the main/basic language rules (de-AT = de = de-DE and fr-CA = fr = fr-FR). In the attached input file you can see the desired result for de-AT and fr-CA in the comments.

sort.xsl (1.41 KB), Octavian Nadolu, 2021-05-17 14:11
sort.xml (981 Bytes), Octavian Nadolu, 2021-05-17 14:11

History

#1 Updated by Michael Kay about 1 month ago

If the lang attribute is present on xsl:sort, and if no other relevant attributes (such as collation) are present, then Saxon will obtain a Java Locale using the logic in JavaCollationFactory.getLocale() (which splits the supplied lang value into language and country parts), and then get a Java Collator using Collator.getInstance(locale).

The set of locales available depends on the JVM installation.
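
To make the lookup described above concrete, here is a minimal Java sketch of the same idea. It is not Saxon's actual code: the class LangCollationSketch and the helpers localeFromLang, hasCollatorLocale and sortWithLang are hypothetical names invented for illustration, and the sample terms are made up. It only assumes the standard java.text.Collator and java.util.Locale APIs.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LangCollationSketch {

    // Hypothetical helper: split a lang value such as "de-AT" into a
    // language part ("de") and a country part ("AT"). This imitates the
    // behaviour described in the thread, not Saxon's real implementation.
    public static Locale localeFromLang(String lang) {
        String[] parts = lang.split("-", 2);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        return new Locale.Builder().setLanguage(language).setRegion(country).build();
    }

    // Reports whether this JVM lists the locale among those for which
    // Collator.getInstance can return a localized instance.
    public static boolean hasCollatorLocale(Locale locale) {
        return Arrays.asList(Collator.getAvailableLocales()).contains(locale);
    }

    // Sorts a copy of the terms with the Java collator for the given lang.
    public static List<String> sortWithLang(List<String> terms, String lang) {
        Collator collator = Collator.getInstance(localeFromLang(lang));
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample terms; replace with the terms from your input file.
        List<String> terms = Arrays.asList("Zebra", "Apfel", "Ärger", "Bär");
        for (String lang : new String[] {"de-DE", "de-AT"}) {
            System.out.println(lang
                + " available=" + hasCollatorLocale(localeFromLang(lang))
                + " sorted=" + sortWithLang(terms, lang));
        }
    }
}
```

Running this for two regional variants of the same language on your own JVM is a quick way to check whether the Java collators actually differ between them, independently of Saxon.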

If you're concerned about accurate collation, then I'd advise setting the collation attribute to a UCA collation URI, and making sure the ICU implementation is used (which means you need Saxon-PE or higher, and ICU must be on the classpath).

#2 Updated by Michael Kay about 1 month ago

I've confirmed that if you use a collation attribute rather than a lang attribute, for example

collation="http://www.w3.org/2013/collation/UCA?lang=de-DE"

then it now uses ICU collations rather than Java collations. This leads to a difference between fr-FR and fr-CA, though the results for Germany and Austria are still the same.

(I've always doubted whether the traditional differences noted by collation experts still exist in the 21st century - I think collation standards nowadays are much more likely to vary from one publisher to another, rather than from one country to another. Austrians surely read the same books that Germans do, and the indexes at the back of the book aren't going to be re-sorted for the Austrian market. But I'm happy to leave that question to the ICU experts).

Perhaps in the case where xsl:sort is used with a lang attribute and no collation attribute, Saxon-PE and -EE should now be using the ICU/UCA collation rather than the Java collation.

#3 Updated by Octavian Nadolu about 1 month ago

Thank you very much for analyzing this issue and for clarifying it.
