Missing alpha ranges, circled-decimal numbers

Added by Anonymous over 15 years ago

Legacy ID: #5519305 Legacy Poster: Scott A. Colcord (sacolcor)

In <http://saxon.svn.sourceforge.net/viewvc/saxon/latest9.1/bj/net/sf/saxon/number/Alphanumeric.java>, it looks like two ranges from UnicodeData.txt are missing:

- CJK Ideograph (U+4E00...U+9FC3)
- Hangul Syllable (U+AC00...U+D7A3)

Also, as a feature request, would it be possible to add support for circled-decimal numbers up to 20, starting at codepoint U+2460? It's a common numbering style in many Asian countries, and should be fairly easy: just map 1 to 20 onto those 20 sequential codepoints, and fall back to decimal after that.

Thanks!
Scott Colcord (PTC/Arbortext)
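
A minimal sketch of the requested mapping, written in Java to match Alphanumeric.java. The class name `CircledNumbers` and its `format` method are hypothetical illustrations, not part of Saxon's API. It assumes the contiguous run U+2460 (CIRCLED DIGIT ONE) through U+2473 (CIRCLED NUMBER TWENTY) and falls back to plain decimal outside 1..20, as the post suggests.

```java
/**
 * Hypothetical sketch (not Saxon code): format an integer as a
 * circled-decimal number, falling back to plain decimal outside 1..20.
 */
public final class CircledNumbers {

    // U+2460 is CIRCLED DIGIT ONE; the run continues to U+2473 (twenty).
    private static final int CIRCLED_ONE = 0x2460;
    private static final int MAX_CIRCLED = 20;

    private CircledNumbers() {}

    public static String format(long n) {
        if (n >= 1 && n <= MAX_CIRCLED) {
            // Offset into the run: 1 -> U+2460, 20 -> U+2473.
            return new String(Character.toChars(CIRCLED_ONE + (int) (n - 1)));
        }
        // Fallback for 0, negative values and anything above 20.
        return Long.toString(n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 10, 20, 21}) {
            System.out.println(n + " -> " + format(n));
        }
    }
}
```

Because the 20 codepoints are sequential, a single offset replaces any lookup table; only the bounds check decides when to fall back.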


Replies (2)

RE: Missing alpha ranges, circled-decimal numbers - Added by Anonymous over 15 years ago

Legacy ID: #5522834 Legacy Poster: Michael Kay (mhkay)

Interesting. The list actually includes the ranges

- 4E00 ... 4E00
- 9FBB ... 9FBB
- AC00 ... AC00
- D7A3 ... D7A3

It looks as if my stylesheet wasn't good enough at analysing the convention used in the Unicode database for these ranges:

- `4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;`
- `9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;`
- `AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;`
- `D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;`

This is from the 4.0 database, presumably 9FC3 comes from Unicode 4.1? Indeed, the stylesheet has a comment:

> Note this doesn't handle the CJK Extended Ideograph ranges A and B, 3400-4DB5 and 20000-2A6D6, which have to be edited in by hand

As far as I can see, the two ranges that you identify are the only ones I missed in this process. I'll add these ranges to the development source and also by patch to 9.1. I'll add a TODO note to the code to regenerate the table from Unicode 4.1 some time in the future.

I'll look at adding some additional ranges for the circled and parenthesized numbers for a future release.
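
A minimal sketch of how a generator could honour the `<Name, First>` / `<Name, Last>` convention shown above, so that each pair becomes one range (e.g. 4E00..9FBB) instead of two single-codepoint entries. This is a hypothetical Java illustration, not the actual stylesheet or Saxon code; `UnicodeRanges`, `Range` and `ranges` are invented names, and it assumes the standard semicolon-separated UnicodeData.txt layout (codepoint in field 0, name in field 1).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: collapse UnicodeData.txt "<Name, First>" /
 * "<Name, Last>" line pairs into single inclusive code point ranges.
 */
public final class UnicodeRanges {

    /** An inclusive code point range, e.g. 4E00..9FBB. */
    public record Range(int first, int last, String name) {}

    private static final String FIRST_SUFFIX = ", First>";
    private static final String LAST_SUFFIX = ", Last>";

    public static List<Range> ranges(List<String> lines) {
        List<Range> result = new ArrayList<>();
        Integer pendingFirst = null; // start of an open <..., First> entry
        for (String line : lines) {
            String[] fields = line.split(";", -1);
            if (fields.length < 2) continue; // skip blank or malformed lines
            int cp = Integer.parseInt(fields[0], 16);
            String name = fields[1];
            if (name.endsWith(FIRST_SUFFIX)) {
                // Remember the start; the matching Last line closes the range.
                pendingFirst = cp;
            } else if (name.endsWith(LAST_SUFFIX) && pendingFirst != null) {
                // "<CJK Ideograph, Last>" -> "CJK Ideograph"
                String base = name.substring(1, name.length() - LAST_SUFFIX.length());
                result.add(new Range(pendingFirst, cp, base));
                pendingFirst = null;
            } else {
                // Ordinary entry: a single code point.
                result.add(new Range(cp, cp, name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
                "9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
                "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
                "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;");
        for (Range r : ranges(sample)) {
            System.out.printf("%04X..%04X %s%n", r.first(), r.last(), r.name());
        }
    }
}
```

The end point picked up this way depends on which UnicodeData.txt version is fed in, so regenerating from a newer database would move the CJK end point without any code change.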

RE: Missing alpha ranges, circled-decimal numbers - Added by Anonymous over 15 years ago

Legacy ID: #5524779 Legacy Poster: Scott A. Colcord (sacolcor)

> This is from the 4.0 database, presumably 9FC3 comes from Unicode 4.1?

The XSLT spec didn't appear to specify a normative Unicode version, so I used the latest one in my checks; it looks like the end point was changed from 9FBB in the 5.1 database: <http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt>.

> I'll add these ranges to the development source and also by patch to 9.1.
>
> I'll look at adding some additional ranges for the circled and parenthesized numbers for a future release.

Thanks!
----Scott
