Bug #5753
closed

Localisation: numbering as words
Description
What should be the output of format-integer(101, "w", "en"), and should it vary between en-GB and en-US?
At present, for both en-GB and en-US, we are returning the cardinal "one hundred and one" and the ordinal "one hundred and first". But a unit test that calls ICU directly, not via XPath, outputs the cardinal "one hundred and one" and the ordinal "one hundred first", again for both locales. Which is correct, and why do they vary?
I think that what is happening here is that ICU offers two cardinal numbering schemes, spellout-cardinal and spellout-cardinal-verbose, and similarly two ordinal schemes, spellout-ordinal and spellout-ordinal-verbose; and we are choosing between them essentially at random. The verbose option includes "and" in the result; the non-verbose option omits it.
Updated by Michael Kay about 1 year ago
- Subject changed from One hundred and one (dalmatians) to One hundred [and] one (dalmatians)
- Description updated (diff)
Updated by Michael Kay about 1 year ago
Well, I think it's working as designed.
It's driven by a static variable in ICUNumbererPE:
preferences = {"-verbose", "", "-native", "-neuter", "-feminine", "-masculine"};
which has the effect that spellout-cardinal-verbose is preferred over spellout-cardinal, and that's why we get "one hundred and twenty" rather than "one hundred twenty". Any difference between UK and US usage doesn't enter into it. This is documented at
https://www.saxonica.com/documentation11/index.html#!localization/ICU-numbers-and-dates
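The effect of that preference order can be sketched roughly as follows. This is only an illustration: the choose helper and the sets of available rule-set names are hypothetical, and the actual lookup in ICUNumbererPE may well be organised differently. Only the order of the suffixes is taken from the array quoted above.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

class SpelloutPreferenceSketch {
    // Suffix order as quoted from ICUNumbererPE; "" stands for the plain rule-set name.
    static final List<String> PREFERENCES =
            List.of("-verbose", "", "-native", "-neuter", "-feminine", "-masculine");

    // Hypothetical helper (not Saxon's API): return the first base + suffix
    // combination that the locale actually offers, trying suffixes in order.
    static Optional<String> choose(String base, Set<String> available) {
        for (String suffix : PREFERENCES) {
            String candidate = base + suffix;
            if (available.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative set of names for English, as described in this issue.
        Set<String> english = Set.of(
                "spellout-cardinal", "spellout-cardinal-verbose",
                "spellout-ordinal", "spellout-ordinal-verbose");
        // Prints "spellout-cardinal-verbose": "-verbose" is tried before "".
        System.out.println(choose("spellout-cardinal", english).orElse("none"));

        // Illustrative set for a language that offers only gendered variants.
        Set<String> gendered = Set.of(
                "spellout-cardinal-masculine", "spellout-cardinal-feminine",
                "spellout-cardinal-neuter");
        // Prints "spellout-cardinal-neuter": the first suffix in the list that matches.
        System.out.println(choose("spellout-cardinal", gendered).orElse("none"));
    }
}
```

With a list ordered this way, the "and" form wins whenever a verbose rule set exists, regardless of whether the locale is en-GB or en-US.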
The non-verbose form ("One hundred twenty") can be obtained using language code en-GB-x-so or en-US-x-so.
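For anyone unfamiliar with the -x-so suffix: syntactically it is a BCP 47 private-use subtag, and standard Java can separate it from the rest of the tag. The sketch below uses only java.util.Locale; the privateUse helper is a hypothetical name, and nothing here is meant to describe how Saxon itself interprets the tag.

```java
import java.util.Locale;

class PrivateUseSubtagSketch {
    // Hypothetical helper: returns the private-use part of a language tag
    // (the text after "-x-"), or null when the tag has none.
    static String privateUse(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return locale.getExtension(Locale.PRIVATE_USE_EXTENSION);
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("en-GB-x-so");
        System.out.println(locale.getLanguage());      // prints "en"
        System.out.println(locale.getCountry());       // prints "GB"
        System.out.println(privateUse("en-GB-x-so"));  // prints "so"
        System.out.println(privateUse("en-GB"));       // prints "null"
    }
}
```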
Updated by Michael Kay about 1 year ago
Moving on through the failing Numberer tests, it seems that (with the Saxon default JavaLocalizerFactory), using lang="de" on format-integer doesn't pick up the German numberer; it has to be lang="de-DE".
Updated by Michael Kay about 1 year ago
- Subject changed from One hundred [and] one (dalmatians) to Localisation: numbering as words
Next issue: for format-integer(1, "w", "de") Saxon's local numberer outputs "eins" whereas ICU outputs "ein". Which is correct?
It depends on context. "Chapter One" would be "Kapitel Eins". But "One Chapter" would be "Ein Kapitel". Both are cardinal numbers, but they clearly have different grammatical roles and the localisation gurus don't seem to recognise this. In German, as far as I can tell, "one" is the only number where it makes a difference - until you get to 101, 201, etc. Simplest solution is to change the local numberer to match ICU.
Updated by Michael Kay 12 months ago
- Status changed from New to Won't fix
Decided not to pursue the task of making the local numberer classes deliver exactly the same results as the ICU classes. We'll probably drop the local numberer classes at some stage - though not quite yet.
I have implemented the suggestion in the W3C spec that it should be possible to refer to an ICU spellout name (a) in xsl:number/@ordinal (for example <xsl:number value="101" ordinal="%spellout-ordinal-masculine"/>), or (b) in the parenthesised qualifier in the picture string of format-integer, for example format-integer(101, "w;o(%spellout-ordinal-masculine)", "en-GB").