Bug #5753

closed

Localisation: numbering as words

Added by Michael Kay 2 months ago. Updated about 2 months ago.

Status: Won't fix
Priority: Low
Start date: 2022-12-01
% Done: 0%
Description

What should be the output of

format-integer(101, "w", "en")

and should it vary between en-GB and en-US?

At present, for both en-GB and en-US, we are returning the cardinal "one hundred and one" and the ordinal "one hundred and first". But a unit test that calls ICU directly, not via XPath, is outputting the cardinal "one hundred and one" and the ordinal "one hundred first" - again, for both locales. Which is correct, and why do they vary?

I think that what is happening here is that ICU offers two cardinal numbering schemes, spellout-cardinal and spellout-cardinal-verbose, and similarly two ordinal schemes, spellout-ordinal and spellout-ordinal-verbose, and we are choosing between them essentially at random. The verbose option includes "and" in the result; the non-verbose option omits it.

Actions #1

Updated by Michael Kay 2 months ago

  • Subject changed from One hundred and one (dalmatians) to One hundred [and] one (dalmatians)
  • Description updated (diff)
Actions #2

Updated by Michael Kay 2 months ago

Well, I think it's working as designed.

It's driven by a static variable in ICUNumbererPE:

preferences = {"-verbose", "", "-native", "-neuter", "-feminine", "-masculine"};

which has the effect that spellout-cardinal-verbose is preferred over spellout-cardinal, and that's why we get "one hundred and twenty" rather than "one hundred twenty". Any difference between UK and US usage doesn't enter into it. This is documented at

https://www.saxonica.com/documentation11/index.html#!localization/ICU-numbers-and-dates

The non-verbose form ("One hundred twenty") can be obtained using language code en-GB-x-so or en-US-x-so.
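
The effect of that preference order can be sketched as follows. This is a minimal illustration, not Saxon's code: the class RuleSetChooser, its choose method, and the sets of available rule-set names are hypothetical; only the suffix order is taken from the preferences array quoted above.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: NOT the actual ICUNumbererPE implementation.
class RuleSetChooser {

    // Suffix order copied from the preferences array quoted in this issue.
    static final List<String> PREFERENCES =
            List.of("-verbose", "", "-native", "-neuter", "-feminine", "-masculine");

    // Returns the first rule set named base + suffix that the locale offers,
    // trying the suffixes in preference order.
    static Optional<String> choose(String base, Set<String> available) {
        for (String suffix : PREFERENCES) {
            String candidate = base + suffix;
            if (available.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative set of rule-set names for an English locale (assumption).
        Set<String> english = Set.of(
                "%spellout-cardinal", "%spellout-cardinal-verbose",
                "%spellout-ordinal", "%spellout-ordinal-verbose");
        // "-verbose" is tried before the empty suffix, so the verbose form wins.
        System.out.println(choose("%spellout-cardinal", english).orElse("none"));
    }
}
```

Because "-verbose" comes first in the list, a locale that offers both forms always resolves to the verbose one, which matches the "one hundred and twenty" behaviour described above.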

Actions #3

Updated by Michael Kay 2 months ago

Moving on through the failing Numberer tests, it seems that (with the Saxon default JavaLocalizerFactory), using lang="de" on format-integer doesn't pick up the German numberer; it has to be lang="de-DE".

Actions #4

Updated by Michael Kay 2 months ago

  • Subject changed from One hundred [and] one (dalmatians) to Localisation: numbering as words

Next issue: for format-integer(1, "w", "de") Saxon's local numberer outputs eins whereas ICU outputs ein. Which is correct?

It depends on context. "Chapter One" would be "Kapitel Eins". But "One Chapter" would be "Ein Kapitel". Both are cardinal numbers, but they clearly have different grammatical roles and the localisation gurus don't seem to recognise this. In German, as far as I can tell, "one" is the only number where it makes a difference - until you get to 101, 201, etc. Simplest solution is to change the local numberer to match ICU.

Actions #5

Updated by Michael Kay about 2 months ago

  • Status changed from New to Won't fix

Decided not to pursue the task of making the local numberer classes deliver exactly the same results as the ICU classes. We'll probably drop the local numberer classes at some stage - though not quite yet.

I have implemented the suggestion in the W3C spec that it should be possible to refer to an ICU spellout name either (a) in xsl:number/@ordinal (for example <xsl:number value="101" ordinal="%spellout-ordinal-masculine"/>), or (b) in the parenthesised qualifier in the picture string of format-integer (for example format-integer(101, "w;o(%spellout-ordinal-masculine)", "en-GB")).
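
To show where the parenthesised qualifier sits in such a picture string, here is a rough sketch of splitting it into its parts. It assumes my reading of the format-integer rules: everything before the last semicolon is the primary format token, and the modifier is an optional c or o with an optional parenthesised qualifier, then an optional a or t. The class IntegerPicture is hypothetical and is not how Saxon parses pictures.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for illustration only; not part of Saxon.
class IntegerPicture {
    final String primaryToken; // e.g. "w"
    final boolean ordinal;     // true when the modifier starts with "o"
    final String qualifier;    // text inside (...), or null if absent

    // Assumed modifier grammar: ([co](\(.+\))?)?[at]?
    private static final Pattern MODIFIER =
            Pattern.compile("(?:([co])(?:\\((.+)\\))?)?([at])?");

    private IntegerPicture(String primaryToken, boolean ordinal, String qualifier) {
        this.primaryToken = primaryToken;
        this.ordinal = ordinal;
        this.qualifier = qualifier;
    }

    static IntegerPicture parse(String picture) {
        int semi = picture.lastIndexOf(';');
        if (semi < 0) {
            // No modifier: the whole picture is the primary format token.
            return new IntegerPicture(picture, false, null);
        }
        String primary = picture.substring(0, semi);
        Matcher m = MODIFIER.matcher(picture.substring(semi + 1));
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid format modifier in: " + picture);
        }
        return new IntegerPicture(primary, "o".equals(m.group(1)), m.group(2));
    }

    public static void main(String[] args) {
        IntegerPicture p = parse("w;o(%spellout-ordinal-masculine)");
        System.out.println(p.primaryToken + " / " + p.ordinal + " / " + p.qualifier);
    }
}
```

For the picture in the example above, the sketch yields the primary token w, an ordinal flag, and the qualifier %spellout-ordinal-masculine, which is the name that would then be handed to ICU.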
