format-integer for Spanish
Added by Martin Honnen almost 3 years ago
I hope everyone is having a quiet evening. As Christian started playing with format-integer
and languages, I couldn't resist using it as well. I tried it for Spanish and found that the code
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'xml';
declare option output:indent 'yes';
<test>
{
for $year in (1492, 1976, 1984)
return <format-integer number="{$year}">{format-integer($year, 'w', 'es')}</format-integer>
}
</test>
gives the output shown below
C:\Users\Martin Honnen\OneDrive\Documents\xslt\blog-xslt-3-by-example\format-integer>java -cp "C:\Program Files\Saxonica\SaxonEE11-1J\saxon-ee-11.1.jar" net.sf.saxon.Query -t format-integer-es-test1.xq
SaxonJ-EE 11.1 from Saxonica
Java version 1.8.0_272
Using license serial number xxxxx
Analyzing query from format-integer-es-test1.xq
Analysis time: 515.1488 milliseconds
<?xml version="1.0" encoding="UTF-8"?>
<test>
<format-integer number="1492">mil cuatrocientas noventa y dos</format-integer>
<format-integer number="1976">mil novecientas setenta y seis</format-integer>
<format-integer number="1984">mil novecientas ochenta y cuatro</format-integer>
</test>
When I save the output to a file, though, the editor shows a soft hyphen between e.g. nove and cientas, which copy/pasting into the textarea here swallows.
Aside from the soft hyphen, what astonishes me is the -as in cientas. The masculine form (and, I think, the normal, regular form) would be cientos, and I just can't imagine that feminism has invaded ICU or Saxon far enough to make it cientas instead of cientos :).
So I tried running ICU4J, but it seems to give e.g. novecientos. Just out of curiosity, what makes Saxon EE 11.1 output cientas?
format-integer-test-result-saxonee11.1J.xml (306 Bytes): format-integer test with language es and SaxonJ 11.1 EE
Replies (4)
RE: format-integer for Spanish - Added by Michael Kay almost 3 years ago
We look at all the spellout options available for the chosen locale, and apply some heuristics to choose among them. Part of the algorithm is to apply a preference list, which reads
private static final String[] preferences = {"-verbose", "", "-native", "-neuter", "-feminine", "-masculine"};
and it's the order of this list that's causing us to prefer feminine over masculine.
I've no idea why it was written that way - have to consult the original author (John Lumley).
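To illustrate how the order of that list can lead to the feminine form, here is a minimal Java sketch. It is not the actual Saxon code: the class name RuleSetChooser, the method choose, and the rule-set names passed in for Spanish are assumptions made for illustration, and the other heuristics mentioned above are not modelled. It only shows that, when neither an unsuffixed nor a "-verbose" scheme is available, walking the list in order reaches "-feminine" before "-masculine".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RuleSetChooser {
    // Preference order as quoted above.
    private static final String[] PREFERENCES =
            {"-verbose", "", "-native", "-neuter", "-feminine", "-masculine"};

    /**
     * Hypothetical helper: tries base + suffix for each suffix in preference
     * order and returns the first name that is in the list of available
     * rule sets, or an empty Optional if none matches.
     */
    public static Optional<String> choose(String base, List<String> available) {
        for (String suffix : PREFERENCES) {
            String candidate = base + suffix;
            if (available.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed cardinal rule-set names for Spanish (illustrative input only).
        List<String> spanish = Arrays.asList(
                "spellout-cardinal-masculine",
                "spellout-cardinal-feminine");
        // Prints "spellout-cardinal-feminine": "-feminine" precedes "-masculine".
        System.out.println(choose("spellout-cardinal", spanish).orElse("none"));
    }
}
```

In this sketch, swapping the last two entries of the list would make the masculine scheme win for a locale that offers only the two gendered forms.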
The localisation theory (in both F+O and ICU) recognizes that there are cardinal numbers and ordinal numbers, but it doesn't recognize that cardinal numbers can be used both adjectivally ("39 steps") and nominatively ("Step 39"). Dates (like 1984) are a nominative context - and have the additional complication that the spellout form is "nineteen eighty-four", not "one thousand nine hundred [and] eighty-four" (the [and] being present in en-GB but not en-US).
F+O only recognises that gender might be relevant for ordinal numbers, but ICU also allows gendered forms for cardinals. I've no idea whether, in a language like Spanish that offers both forms, they are relevant in both adjectival and nominative contexts.
I seem to recall an idea that you should be able to specify the ICU name for the numbering scheme as a modifier in the pattern, for example "wc($spellout-cardinal-masculine)". But I can't see any evidence of this being implemented in the code - perhaps it was just an idea.
RE: format-integer for Spanish - Added by Michael Kay almost 3 years ago
In fact there is a trick that I'd forgotten: you can specify o(2=dos) to select the numbering sequence in which 2 is represented by "dos". Unfortunately this isn't good enough for Spanish, where the gender distinction only kicks in for a few numbers like 1 and 31 and 200. Perhaps we should allow you to specify any value that can be used as a discriminant, e.g. o(1=una).
RE: format-integer for Spanish - Added by Michael Kay almost 3 years ago
John Lumley has reminded me of another feature documented at
https://www.saxonica.com/documentation11/index.html#!localization/ICU-numbers-and-dates
By using the language code es-x-scm you should be able to get the numbering scheme spellout-cardinal-masculine (scm being an initialism for spellout-cardinal-masculine).
RE: format-integer for Spanish - Added by Martin Honnen almost 3 years ago
Yes, that works, ¡mil gracias!