Possible Bug in NumberFormatter

Added by Anonymous over 16 years ago

Legacy ID: #4819083 Legacy Poster: Scott A. Colcord (sacolcor)

I may have found a bug in the NumberFormatter class. In the Saxon source code, at <http://saxon.svn.sourceforge.net/viewvc/saxon/latest9.0/bj/net/sf/saxon/number/NumberFormatter.java?view=markup>, Saxon uses the Java API call Character.isLetterOrDigit() to determine whether a character is a letter or digit. According to the Java API spec, that call forwards the number check to Character.isDigit() (at <http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isDigit(char)>), which returns true if and only if Character.getType() reports the Unicode category for the character as DECIMAL_DIGIT_NUMBER (category "Nd").

However, the XSLT spec, at <http://www.w3.org/TR/xslt20/#number>, says "Alphanumeric means any character that has a Unicode category of Nd, Nl, No, Lu, Ll, Lt, Lm or Lo." As a result, any codepoint in the "Nl" or "No" categories is treated as non-alphanumeric and used as a prefix/suffix instead of as a Format Token. An example of this is U+2460 (CIRCLED DIGIT ONE).

Arguably, the Java API should provide a simple call to test whether a codepoint is a Number (Nd, Nl, or No), not just a Digit. Lacking that, Saxon's NumberFormatter.isLetterOrDigit() code may need to call Character.getType() in order to correctly identify non-Digit Numbers as Format Tokens.
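For illustration, here is a rough sketch of the kind of Character.getType() check I have in mind. The class and method names (XsltAlphanumeric, isXsltAlphanumeric) are made up for this example and are not taken from Saxon; it only uses the standard java.lang.Character constants and assumes a JDK with the int-codepoint overloads (1.5 or later).

```java
// Sketch only: XsltAlphanumeric and isXsltAlphanumeric are hypothetical names,
// not Saxon's actual code.
final class XsltAlphanumeric {

    private XsltAlphanumeric() {}

    // Returns true for the eight Unicode categories that the XSLT 2.0 spec
    // lists as alphanumeric: Nd, Nl, No, Lu, Ll, Lt, Lm, Lo.
    static boolean isXsltAlphanumeric(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.DECIMAL_DIGIT_NUMBER: // Nd
            case Character.LETTER_NUMBER:        // Nl
            case Character.OTHER_NUMBER:         // No
            case Character.UPPERCASE_LETTER:     // Lu
            case Character.LOWERCASE_LETTER:     // Ll
            case Character.TITLECASE_LETTER:     // Lt
            case Character.MODIFIER_LETTER:      // Lm
            case Character.OTHER_LETTER:         // Lo
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        int circledOne = 0x2460; // CIRCLED DIGIT ONE, category No
        // Compare the stock Java test with the category-based test.
        System.out.println(Character.isLetterOrDigit(circledOne));
        System.out.println(isXsltAlphanumeric(circledOne));
    }
}
```

With a check along these lines, U+2460 would be classified as alphanumeric and so could be picked up as a Format Token, whereas Character.isLetterOrDigit() rejects it.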

Replies (1)

RE: Possible Bug in NumberFormatter - Added by Anonymous over 16 years ago

Legacy ID: #4819838 Legacy Poster: Michael Kay (mhkay)

Yes, you're right. Thanks for bringing it to my attention. I shall fix this in the next release. Given that the code paths already differ between JDK 1.4 and JDK 1.5, I decided the best fix was to bring the logic in-house and not rely on Java's classifications.

Michael Kay
