Bug #1842: Greek perispomeni and normalize-unicode - Saxon - Saxonica Developer Community

Bug #1842

closed

Greek perispomeni and normalize-unicode

Added by Michael Kay over 11 years ago. Updated over 11 years ago.

Status: Closed
Priority: Normal
Assignee: Michael Kay
Category: XPath conformance
Start date: 2013-07-14
% Done: 100%

Description

Raised by Ryan Baumann on the SourceForge saxon-help list.

Various forms of characters with perispomeni seem to be handled incorrectly with normalize-unicode (running as XSLT 2.0 in Saxon HE 9.5.1.1).

normalize-unicode('ῇ̓','NFC') (U+03B7 U+0342 U+0313 U+0345) is ῇ̓ (U+1FC6 U+0313 U+0345)
correct NFC: ῇ̓ (U+1FC7 U+0313)

normalize-unicode('ῇ̓','NFD') (U+1FC7 U+0313) is ῇ̓ (U+03B7 U+0342 U+0345 U+0313)

normalize-unicode('ῇ̓','NFD') (U+1FC6 U+0313 U+0345) is ῇ̓ (U+03B7 U+0342 U+0313 U+0345)

Other instances of incorrect NFC normalization (normalize-unicode on these characters is idempotent):

ῇ̔ ‎(U+1FC6 U+0314 U+0345) should be ῇ̔ (U+1FC7 U+0314)

ῷ̔ (U+1FF6 U+0314 U+0345) should be ῷ̔ (U+1FF7 U+0314)

Ὧ (U+1F69 U+0342) should be Ὧ (U+1F6F)

Ἆ (U+1F08 U+0342) should be Ἆ (‎U+1F0E)

Checked against both Java's java.text.Normalizer and Perl's Unicode::Normalize as my references for "correct" NFC normalization.
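
The code points quoted above can be re-checked against java.text.Normalizer with a few lines of Java. This is only a minimal sketch: the class name NfcReferenceCheck and the helpers hex, nfc and nfd are made up for illustration, and the inputs are written as escapes so that the source file's encoding cannot change the code points under test.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

public class NfcReferenceCheck {
    // Format a string as space-separated code points, e.g. "U+1FC7 U+0313".
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Reference NFC result from the JDK.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Reference NFD result from the JDK.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // The five NFC inputs listed in this report.
        String[] inputs = {
            "\u03B7\u0342\u0313\u0345",
            "\u1FC6\u0314\u0345",
            "\u1FF6\u0314\u0345",
            "\u1F69\u0342",
            "\u1F08\u0342",
        };
        for (String in : inputs) {
            System.out.println(hex(in) + " -> NFC " + hex(nfc(in)));
        }
        // The NFD input from this report.
        String composed = "\u1FC7\u0313";
        System.out.println(hex(composed) + " -> NFD " + hex(nfd(composed)));
    }
}
```

Comparing the printed code points with the output of normalize-unicode on the same inputs shows where the two disagree.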

The problem seems to be fairly general for any character which has a pre-combined perispomeni form. There are probably others beyond those listed here; you can see the results of running java.text.Normalizer against a large corpus of Ancient Greek that has already been passed through normalize-unicode in this commit:

https://github.com/ryanfb/idp.data/commit/bcb7dd6223fb50c48f62027761e8deced2574ed7
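
A corpus check of this kind could look roughly like the sketch below, which flags lines that java.text.Normalizer does not consider to be in NFC. It is not the script used for the commit above; the class name NfcCorpusScan, the helper nonNfcLines and the command-line file argument are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class NfcCorpusScan {
    // Return the 1-based numbers of the lines that are not already in NFC.
    static List<Integer> nonNfcLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!Normalizer.isNormalized(lines.get(i), Normalizer.Form.NFC)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of one UTF-8 file from the corpus.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int n : nonNfcLines(lines)) {
            System.out.println(args[0] + ":" + n + " is not in NFC");
        }
    }
}
```

Any line reported for a file that has already been through normalize-unicode with 'NFC' is a candidate instance of the problem described here.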

-Ryan

Updated by Michael Kay over 11 years ago

Updated by O'Neil Delpratt over 11 years ago