"numeric" parameter in UCA collation
Added by Martin Honnen almost 7 years ago
https://www.w3.org/TR/xpath-functions-31/#uca-collations says about the @numeric@ parameter:
When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12
There is also a test case, @compare-034@, in the XQuery test suite that evaluates
compare("Chap2", "Chap10", "http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes")
and expects the result @-1@, because @Chap2@ should sort before @Chap10@.
However, when I run that query with Saxon 9.8.0.8 EE from the command line, I get the result @1@.
More complex attempts to use the @numeric@ parameter also fail to give the results I would expect from "a sequence of consecutive digits is interpreted as a number". For instance, when I use
sort(('chap12', 'chap2', 'chap101', 'chap20'), 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')
with Saxon I get @chap101 chap12 chap2 chap20@.
With Altova XMLSpy 2018 I get @chap2 chap12 chap20 chap101@ which is the order I would expect.
Replies (4)
RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago
Yes, I too observed this when investigating
https://stackoverflow.com/questions/48823316/xslt-condition-and-decimal-values/48823982#48823982
The query works correctly when running Saxon-EE with a license file. In Saxon-PE and EE, we use the ICU library for handling collations. In HE mode, we fall back to what's available from the JDK (unless you specify fallback=no). And I don't think the JDK collation machinery has any way of implementing numeric=yes.
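To illustrate the JDK fallback described above, here is a minimal Java sketch that sorts the strings from the original post with a plain @java.text.Collator@ for English. It is not Saxon's code; it only assumes that the fallback behaves like an ordinary JDK collator, which has no numeric option. Under that assumption, digits are compared one character at a time, which reproduces both results reported in the original post.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: shows what a plain JDK collator (no numeric option) does with
// the example strings. It does not call Saxon.
class JdkCollatorDemo {

    // Sort with an English JDK collator; digit runs are compared character by
    // character, so "chap101" sorts before "chap12".
    static List<String> sortWithJdkCollator(String... items) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String[] copy = items.clone();
        Arrays.sort(copy, collator);
        return Arrays.asList(copy);
    }

    // Mirror fn:compare by normalising the collator result to -1, 0 or 1.
    static int compareWithJdkCollator(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // prints [chap101, chap12, chap2, chap20]
        System.out.println(sortWithJdkCollator("chap12", "chap2", "chap101", "chap20"));
        // prints 1, i.e. "Chap2" sorts after "Chap10"
        System.out.println(compareWithJdkCollator("Chap2", "Chap10"));
    }
}
```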
However, with some investigation, it appears to be possible to combine Saxon's AlphanumericCollator class (which is used for Saxon collation URIs) with a base RuleBasedCollator supplied by the JDK for sorting the alphabetic parts of the sort key, so I shall add this capability.
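The combination described above can be sketched roughly as follows. The class @NumericAwareCollator@ is hypothetical and is not Saxon's @AlphanumericCollator@: it splits each string into alternating runs of ASCII digits and non-digits, compares two digit runs by numeric value, and hands every other pair of runs to a base JDK @Collator@.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Saxon's AlphanumericCollator: runs of ASCII digits
// are compared by numeric value, everything else by a JDK base collator.
class NumericAwareCollator implements Comparator<String> {
    private final Collator base;

    NumericAwareCollator(Collator base) {
        this.base = base;
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Split "chap12x" into ["chap", "12", "x"]: alternating digit / non-digit runs.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            boolean digit = isAsciiDigit(s.charAt(i));
            int j = i;
            while (j < s.length() && isAsciiDigit(s.charAt(j)) == digit) {
                j++;
            }
            tokens.add(s.substring(i, j));
            i = j;
        }
        return tokens;
    }

    // Compare two digit runs by value: strip leading zeros (keeping one digit),
    // then the longer run is larger; equal lengths compare char by char.
    static int compareNumbers(String x, String y) {
        String a = x.replaceFirst("^0+(?=.)", "");
        String b = y.replaceFirst("^0+(?=.)", "");
        if (a.length() != b.length()) {
            return Integer.compare(a.length(), b.length());
        }
        return Integer.signum(a.compareTo(b));
    }

    @Override
    public int compare(String s1, String s2) {
        List<String> t1 = tokenize(s1);
        List<String> t2 = tokenize(s2);
        int n = Math.min(t1.size(), t2.size());
        for (int k = 0; k < n; k++) {
            String a = t1.get(k);
            String b = t2.get(k);
            boolean bothNumeric = isAsciiDigit(a.charAt(0)) && isAsciiDigit(b.charAt(0));
            int c = bothNumeric ? compareNumbers(a, b) : base.compare(a, b);
            if (c != 0) {
                return Integer.signum(c);
            }
        }
        if (t1.size() != t2.size()) {
            return Integer.compare(t1.size(), t2.size());
        }
        // Tie-break (e.g. "chap02" vs "chap2") with the base collator on whole strings.
        return Integer.signum(base.compare(s1, s2));
    }

    public static void main(String[] args) {
        NumericAwareCollator c = new NumericAwareCollator(Collator.getInstance(Locale.ENGLISH));
        List<String> items = new ArrayList<>(Arrays.asList("chap12", "chap2", "chap101", "chap20"));
        items.sort(c);
        System.out.println(items);                        // [chap2, chap12, chap20, chap101]
        System.out.println(c.compare("Chap2", "Chap10")); // -1
    }
}
```

One caveat of this design: because the base collator only ever sees individual runs, a case or accent difference in an earlier run takes precedence over a numeric difference in a later one, whereas a full UCA implementation such as ICU compares each strength level across the whole string first.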
RE: "numeric" parameter in UCA collation - Added by Martin Honnen almost 7 years ago
I run Saxon 9.8 EE from the command line with the @-t@ option, and it indicates that it is using a license. Nevertheless, I get the results mentioned above, namely @compare("Chap2", "Chap10", "http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes")@ evaluating to @1@ and @sort(('chap12', 'chap2', 'chap101', 'chap20'), 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')@ evaluating to @chap101 chap12 chap2 chap20@.
Does the command-line code path somehow not pick up the ICU library, or is there another reason I get the wrong results?
RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago
I'm somewhat mystified by this because I'm getting the incorrect answer when I run in the terminal:
Saxon-EE 9.8.0.7J from Saxonica
Java version 1.7.0_55
Using license serial number K006480
Analyzing query from {compare('Chap2', 'Chap10', 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')}
Analysis time: 351.96 milliseconds
1
Execution time: 25.566ms
Memory used: 44695432
and the correct answer when I run under IntelliJ:
Saxon-EE 9.8.0.9J from Saxonica
Java version 1.7.0_55
Using license serial number K006480
Analyzing query from {compare('Chap2', 'Chap10', 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')}
Analysis time: 490.972 milliseconds
Processing file:/Users/mike/Desktop/temp/test.xml
Source document ignored - query does not access the context item
-1
Execution time: 18.012ms
I know that's a different Saxon version, but I've been experimenting with variations. I've also been trying different Java versions.
One possibility, which I will now investigate, is that there is something materially different between the built JAR files and the code we're running in the IDE.
RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago
Simple and infuriating. It works correctly if you put icu4j-59_1.jar on the classpath.
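Since the outcome depended on whether @icu4j-59_1.jar@ was visible at run time, a quick way to check a given setup is a tiny class that tries to load ICU4J's @com.ibm.icu.text.Collator@. This is a hypothetical diagnostic, not how Saxon itself decides between ICU and the JDK fallback; run it with the same classpath you pass to Saxon.

```java
// Hypothetical diagnostic: reports whether a class (here ICU4J's collator)
// can be loaded from the current classpath. It does not reproduce Saxon's
// own logic for choosing between ICU and the JDK collator.
class IcuCheck {

    static boolean isClassVisible(String className) {
        try {
            // initialize=false: only check that the class can be found and linked
            Class.forName(className, false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean icu = isClassVisible("com.ibm.icu.text.Collator");
        System.out.println("ICU4J on classpath: " + icu);
    }
}
```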