"numeric" parameter in UCA collation

Added by Martin Honnen almost 7 years ago

https://www.w3.org/TR/xpath-functions-31/#uca-collations says about the @numeric@ parameter:

When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12

and it appears there is also the test case @compare-034@ in the XQuery test suite that does

compare("Chap2", "Chap10", "http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes")

and is supposed to return @-1@ as @Chap2@ is meant to sort before @Chap10@.

However, when I run that query with Saxon 9.8.0.8 HE or EE from the command line I get the result @1@.

Some more complex attempts to use the @numeric@ parameter also fail to give the result I would expect from "a sequence of consecutive digits is interpreted as a number". For instance, when I use

sort(('chap12', 'chap2', 'chap101', 'chap20'), 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')

with Saxon I get @chap101 chap12 chap2 chap20@.

With Altova XMLSpy 2018 I get @chap2 chap12 chap20 chap101@ which is the order I would expect.


Replies (4)

RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago

Yes, I too observed this when investigating

https://stackoverflow.com/questions/48823316/xslt-condition-and-decimal-values/48823982#48823982

The query works correctly when running Saxon-EE with a license file. In Saxon-PE and EE, we use the ICU library for handling collations. In HE mode, we fall back to what's available from the JDK (unless you specify fallback=no). And I don't think the JDK collation machinery has any way of implementing numeric=yes.
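To illustrate why a JDK-only fallback would give @1@ here, the following is a minimal sketch using @java.text.Collator@ directly. It is not Saxon's code path; the class name @JdkCollationDemo@ and the helper @compareWithJdk@ are made up for this example, and it assumes that the fallback behaves like the stock English collator.

```java
import java.text.Collator;
import java.util.Locale;

final class JdkCollationDemo {

    // Compares two strings with the JDK's English collator and normalizes
    // the result to -1/0/1, the value range that fn:compare returns.
    static int compareWithJdk(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // The collator compares character by character: after "Chap",
        // '2' sorts after '1', so "Chap2" is greater than "Chap10".
        System.out.println(compareWithJdk("Chap2", "Chap10")); // prints 1
    }
}
```

The JDK collator has no notion of digit runs, so the outcome is decided by the first differing digit rather than by the numeric value.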

However, with some investigation, it appears to be possible to combine Saxon's AlphanumericCollator class (which is used for Saxon collation URIs) with a base RuleBasedCollator supplied by the JDK for sorting the alphabetic parts of the sort key, so I shall add this capability.
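The general idea of layering numeric comparison on top of a JDK collator can be sketched roughly as follows. This is an illustration only, not Saxon's @AlphanumericCollator@: the class @NumericAwareComparator@ and its @split@ helper are hypothetical, only ASCII digits are treated as numbers, and comparing run by run is a simplification that will not match a true UCA implementation in every case.

```java
import java.math.BigInteger;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical comparator: digit runs compare as numbers, everything else
// is delegated to a base collator supplied by the caller.
final class NumericAwareComparator implements Comparator<String> {
    private final Collator base;

    NumericAwareComparator(Collator base) {
        this.base = base;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9'; // assumption: ASCII digits only
    }

    // Splits a string into alternating runs of digits and non-digits,
    // e.g. "chap12" -> ["chap", "12"]. Every run is non-empty.
    static List<String> split(String s) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            boolean digit = isDigit(s.charAt(i));
            int j = i;
            while (j < s.length() && isDigit(s.charAt(j)) == digit) {
                j++;
            }
            runs.add(s.substring(i, j));
            i = j;
        }
        return runs;
    }

    @Override
    public int compare(String a, String b) {
        List<String> ra = split(a);
        List<String> rb = split(b);
        int n = Math.min(ra.size(), rb.size());
        for (int k = 0; k < n; k++) {
            String x = ra.get(k);
            String y = rb.get(k);
            int c;
            if (isDigit(x.charAt(0)) && isDigit(y.charAt(0))) {
                // Both runs are numbers: compare by value, so 2 < 10.
                c = new BigInteger(x).compareTo(new BigInteger(y));
            } else {
                // Otherwise let the base collator decide.
                c = base.compare(x, y);
            }
            if (c != 0) {
                return Integer.signum(c);
            }
        }
        // All shared runs are equal: the string with fewer runs sorts first.
        return Integer.compare(ra.size(), rb.size());
    }

    public static void main(String[] args) {
        List<String> items =
                new ArrayList<>(Arrays.asList("chap12", "chap2", "chap101", "chap20"));
        items.sort(new NumericAwareComparator(Collator.getInstance(Locale.ENGLISH)));
        System.out.println(items); // prints [chap2, chap12, chap20, chap101]
    }
}
```

With this sketch, comparing @Chap2@ and @Chap10@ yields @-1@, the result the test case expects.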

RE: "numeric" parameter in UCA collation - Added by Martin Honnen almost 7 years ago

I run Saxon 9.8 EE from the command line with the @-t@ option and it indicates that it is using a license. Nevertheless I get the results mentioned, namely @compare("Chap2", "Chap10", "http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes")@ evaluating to @1@ and @sort(('chap12', 'chap2', 'chap101', 'chap20'), 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')@ to @chap101 chap12 chap2 chap20@.

Does the command-line code path somehow fail to pick up the ICU library, or is there another reason why I get the wrong results?

RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago

I'm somewhat mystified by this because I'm getting the incorrect answer when I run in the terminal:

Saxon-EE 9.8.0.7J from Saxonica
Java version 1.7.0_55
Using license serial number K006480
Analyzing query from {compare('Chap2', 'Chap10', 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')}
Analysis time: 351.96 milliseconds
1
Execution time: 25.566ms
Memory used: 44695432

and the correct answer when I run under IntelliJ:

Saxon-EE 9.8.0.9J from Saxonica
Java version 1.7.0_55
Using license serial number K006480
Analyzing query from {compare('Chap2', 'Chap10', 'http://www.w3.org/2013/collation/UCA?lang=en;numeric=yes')}
Analysis time: 490.972 milliseconds
Processing file:/Users/mike/Desktop/temp/test.xml
Source document ignored - query does not access the context item
-1
Execution time: 18.012ms

I know that's a different Saxon version but I've been experimenting with variations. I've also been playing with different Java versions.

A possibility which I will now investigate is that there's something materially different between the built JAR files and the version of the code that we're running in the IDE.

RE: "numeric" parameter in UCA collation - Added by Michael Kay almost 7 years ago

Simple and infuriating. It works correctly if you put icu4j-59_1.jar on the classpath.
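A quick way to check whether ICU4J is visible to a given JVM is to try loading one of its classes. The sketch below assumes that the presence of @com.ibm.icu.text.Collator@ (a real ICU4J class) is a reasonable proxy for "ICU is on the classpath"; how Saxon itself detects ICU is not shown in this thread, and the class @IcuClasspathCheck@ is a made-up name.

```java
final class IcuClasspathCheck {

    // Returns true if the named class can be found by this class's
    // class loader; the class is located but not initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean icu = isClassAvailable("com.ibm.icu.text.Collator");
        System.out.println("ICU4J collator available: " + icu);
    }
}
```

Running this with the same classpath as the failing command line should show whether the ICU JAR is being picked up.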
